
Proceedings of National Conference on

VLSI for Communication, Computation and

Control

VCCC’ 08

15th March

Editors Mrs. C. Kezi Selva Vijila

(Head, Department of ECE) Mrs. G. Josemin Bala

(Assistant Professor, Department of ECE )

Organized by

Department of Electronics and Communication Engineering

KARUNYA UNIVERSITY (Declared as Deemed to be University under Sec. 3 of the UGC Act,1956)

Coimbatore, Tamilnadu.


NATIONAL CONFERENCE ON VLSI FOR

COMMUNICATION, COMPUTATION AND CONTROL (VCCC’ 08)

PATRON
Dr. Paul Dhinakaran, Chancellor, Karunya University, Coimbatore

ADVISORY COMMITTEE
Dr. S. Arumugam, Additional Director, DOTE, Chennai
Dr. V. Palaniswami, Principal, GCT, Coimbatore
Dr. A. Ebenezer Jeyakumar, Principal, GCE, Salem
Dr. E. Kirubakaran, BHEL, Tiruchirappalli

ORGANISING COMMITTEE
Chairman: Dr. Paul P. Appasamy, Vice Chancellor, Karunya University, Coimbatore
Vice Chairman: Dr. Anne Mary Fernandez, Registrar, Karunya University, Coimbatore
Convenor: Mrs. C. Kezi Selva Vijila, HOD, Department of ECE, Karunya University, Coimbatore
Co-Convenor: Mrs. G. Josemin Bala, Asst. Professor, Department of ECE, Karunya University, Coimbatore


MEMBERS: Prof. K. Palaniswami, Dr. Easter Selvan, Ms. Shanthini Pandiaraj, Mr. Albert Rajan, Mr. Shanty Chacko, Mr. Abraham Chandy, Mr. Karthigai Kumar, Mrs. Nesasudha, Ms. Rahimunnisa, Mr. Jude Hemanth

LOCAL ORGANISING MEMBERS:

Mrs. D. Jackuline Moni, Mrs. D. Synthia, Mrs. T. Anita Jones Mary, Mr. D. Sugumar, Mrs. S. Sridevi Sathyapriya, Mrs. J. Anitha, Mr. N. Satheesh Kumar, Mrs. D. S. Shylu, Mrs. Jennifer S. Raj, Mr. S. Immanuel Alex, Ms. S. Sherine, Ms. K. Prescilla, Ms. J. Grace Jency Gananammal, Ms. G. Christina, Ms. F. Agi Lydia Prizzi, Mr. J. Samuel Manoharan, Mr. S. Smys, Mr. D. Narain Ponraj, Mr. D. Nirmal, Ms. B. Manjurathi, Mrs. G. Shine Let, Ms. Linda Paul, Ms. Cynthia Hubert, Mr. A. Satheesh, Mr. P. Muthukrishnan, Ms. Anu Merya Philip, Mr. Jaganath, M. Reeba Rex, Mr. T. Retnam, Mr. B. Jeyachandran, Mr. Arul Rajkumar, Mr. C. R. Jeyaseelan, Mr. Wilson Christopher, Mr. Manohar Livingston, Mr. J. Jebavaram


EDITORIAL TEAM
Editors: Mrs. C. Kezi Selva Vijila (Head, Department of ECE), Mrs. G. Josemin Bala (Assistant Professor, Department of ECE)
Staff Co-ordinators: Mrs. K. Rahimunnisa (Senior Lecturer, Department of ECE), Mr. A. Amir Anton Jone (Lecturer, Department of ECE)
Student Co-ordinators: II ME - Tintu Mol; IV Yr - Lisbin; II Yr - Nixon

S.Arock Roy

I.Kingsly Jeba

J.John Christo

M.Muthu Kannan

D.Arun Premkumar

R.PrawynJebakumar


PROFILE OF KARUNYA UNIVERSITY

Karunya University (Declared as Deemed to be University under Sec. 3 of the UGC Act, 1956) is located 25 km away from Coimbatore in the very bosom of Mother Nature. The University is surrounded by an array of green-clad, sky-scraping mountains of the Western Ghats. The Siruvani river, with its crystal clear water, has its origin here and it is nature's boon to Coimbatore.

KITS is thus set in a natural environment ideal for a residential institution. During leisure time, one can feast on the mountains, the skies with their rainbow colours and the horizon. One with an aesthetic sense will not miss the waters trickling down the hills, the birds that sing sweetly on the trees, the cool breeze and the drizzles. One cannot but wonder at the amazing craftsmanship of God Almighty.

HISTORY

The origin of the Institution is still amazing. In the year 1981, Dr. D. G. S. Dhinakaran, God's servant, received the divine commission to start a Technical University which could turn out outstanding engineers with leadership qualities at the national and global level. Building up such a great Institution was no easy task. The Dhinakarans had to face innumerable trials and tribulations, including the tragic death of their dear daughter, during the course of this great endeavor. But nothing could stop them from reaching the goal.


THE VISION

In response to the divine command Dr. D. G. S. Dhinakaran received from the Lord Almighty, the Institute was established with the vision of turning out of its portals engineers excelling both in academics and values. They will be total persons with the right combination of academic excellence, personality development and spiritual values.

THE MISSION

To provide the youth with the best opportunities and environment for higher education and research in Sciences and Technology, and to enable them to attain very high levels of academic excellence as well as scientific, technical and professional competency.

To train and develop students to be good and able citizens, capable of making significant contributions towards meeting the developmental needs and priorities of the nation.

To inculcate in students moral values and leadership qualities, to make them appreciate the need for high ethical standards in personal, social and public life, and to reach the highest level of humanism, such that they shall always uphold and promote a high social order and be ready and willing to work for the emancipation of the poor, the needy and the under-privileged.


PROFILE OF THE ECE DEPARTMENT

The Department of Electronics and Communication Engineering was established in the year 1986. It is very well equipped with highly commendable facilities and is effectively guided by a set of devoted and diligent staff members. The department offers both Under Graduate and Post Graduate programmes (Applied Electronics and VLSI Design). The department has 39 teaching faculty, 6 technical assistants and an office assistant. It has 527 students in the UG programme and 64 students in the PG programme. The department has been awarded an 'A' grade by the National Board of Accreditation.

THE MISSION

The mission of the department is to raise engineers and researchers with technical expertise on par with international standards, professional attitudes and ethical values, with the ability to apply acquired knowledge to have a productive career, and empowered spiritually to serve humanity.

KEY RESULT AREAS

To undertake research in telemedicine and signal processing, thereby opening new avenues for mass-funded projects.


To meet the diverse needs of the student community and to contribute to society through placement-oriented training and technical activities.

Inculcating moral, social and spiritual values through charity and outreach programs.

SPECIAL FEATURES
The department has fully furnished classrooms with e-learning facility, and a conference hall with video conferencing and the latest teaching aids. The department laboratories are equipped with highly sophisticated equipment such as digital storage oscilloscopes, the Lattice ISP Expert System, a SPARTAN FPGA trainer, an ADSP 2105 trainer, an antenna training system, a transmission line trainer and analyzer, a spectrum analyzer (HAMEG), and fibre-optic transmitting and receiving units.

The department laboratories utilize the latest advanced software such as Mentor Graphics, MATLAB, LabVIEW, Tanner Tools, FPGA Advantage 6.3 LS, MicroSim 8.0 and the VI 5416 Debugger.

STRENGTH OF THE DEPARTMENT
Research-oriented teaching with highly qualified faculty and experts from the industries.


Excellent placement for both UG and PG students in various reputed companies like VSNL, HAL, DRDO, BSNL, WIPRO, SATHYAM, INFOSYS, BELL, etc.

Hands-on practice for the students in laboratories equipped with sophisticated equipment and advanced software.

Centers of Excellence in signal processing, medical image processing and VLSI for faculty and students.

Funded projects from AICTE in the VLSI systems and communication fields.
Effective research forums to work on current research areas.
Industrial training in industry during vacations for all students.
Advanced software facilities to design, develop and implement electronic systems.

SLOGAN OF THE DEPARTMENT: MANIFESTO OF FUTURE


CONTENTS

Messages
Organizing Committee
Advisory Committee
Profile of the University
Profile of Department of ECE

SESSION A: VERY LARGE SCALE INTEGRATION (VLSI)

SUBSESSION A.1 VL 01. An FPGA-Based Single-Phase Electrical Energy Meter

Binoy B. Nair, P. Supriya Amrita Vishwa Vidhya Peetham, Coimbatore

1

VL 02. A Multilingual, Low Cost FPGA Based Digital Storage Oscilloscope Binoy B Nair, Sreeram, Srikanth, Srivignesh Amrita Vishwa Vidhya Peetham, Coimbatore

7

VL 03. Design Of Asynchronous NULL Convention Logic FPGA R.Suguna, S.Vasanthi M.E., (Ph.D) K.S.Rangasamy College of technology, Tiruchengode

10

VL 04. Development of ASIC Cell Library for RF Applications K.Edet Bijoy, Mr.V.Vaithianathan SSN College of Engineering, Chennai

16

VL 05. A High-Speed Clustering VLSI Processor Based on the Histogram Peak-Climbing Algorithm I.Poornima Thangam, M.Thangavel K.S.Rangasamy College of technology, Tiruchengode

22

VL 06. Reconfigurable CAM- Improving The Effectiveness Of Data Access In ATM Networks C. Sam Alex , B.Dinesh, S. Dinesh kumar JAYA Engineering College,Thiruninravur , Near Avadi, Chennai

28

VL 07. Design of Multistage High Speed Pipelined RISC Architecture Manikandan Raju, Prof.S.Sudha Sona College of Technology, Salem

35


VL 08. Monitoring of An Electronic System Using Embedded Technology N.Sudha , Suresh, R.Norman, SSN College of Engineering, Chennai

39

VL 09. The Design of a Rapid Prototype Platform for ARM Based Embedded Systems A.Antony Judice1 IInd M.E. (Applied Electronics), SSN College of Engineering, Chennai Mr.Suresh R Norman,Asst.Prof., SSN College of Engineering, Chennai

42

VL 10. Implementation of High Throughput and Low Power FIR Filter In FPGA V.Dyana Christilda B.E*, R.Solomon Roach Francis Xavier Engineering College, Tirunelveli

49

VL 11. n x Scalable Stacked MOSFET for Low Voltage CMOS Technologies M.Jeyaprakash, T.Loganayagi Sona College of Technology, Salem

54

VL 12. Test Pattern Selection Algorithms Using Output Deviations S. Malliga Devi, Lyla B. Das, S. Krishna Kumar NIT, Calicut, Student IEEE member

60

VL 13. Fault Classification Using Back Propagation Neural Network For Digital To Analog Converter B.Mohan*,R. Sundararajan * J.Ramesh** and Dr.K.Gunavathi PSG College of Technology Coimbatore

64

VL 14. Testing Path Delays in LUT based FPGAs R.Usha, Mrs.M.Selvi Francis Xavier Engineering College, Tirunelveli

70

VL 15. VLSI Realisation of SIMPPL Controller SOC for Design Reuse Tressa Mary Baby John Karunya University, Coimbatore

76

VL 16. Clock Period Minimization of Edge Triggered Circuit Anitha.A, D.Jackuline Moni, S.Arumugam Karunya University, Coimbatore

82

VL 17. VLSI Floor Planning Based on Hybrid Particle Swarm Optimization (HPSO) D.Jackuline Moni, Karunya University, Coimbatore Dr.S.Arumugam

Bannariamman educational trust…

87

VL 18. Development Of An EDA Tool For Configuration Management Of FPGA Designs Anju M I , F. Agi Lydia Prizzi, K.T. Oommen Karunya University, Coimbatore

91


VL 19. A BIST for Low Power Dissipation

Rohit Lorenzo, A. Amir Anton Jone Karunya University, Coimbatore

95

VL 20. Test Pattern Generation for Power Reduction using BIST Architecture Anu Merya Philip, Karunya University, Coimbatore

99

VL 21. Test Pattern Generation For Microprocessors Using Satisfiability Format Automatically And Testing It Using Design For Testability Cynthia Hubert Karunya University, Coimbatore

104

VL 22. DFT Techniques for Detecting Resistive Opens in CMOS Latches and Flip-Flops Reeba Rex.S Karunya University, Coimbatore

109

VL 23. 2-D Fractal Array Design for 4-D Ultrasound Imaging Mrs.C Kezi Selva Vijila,Ms Alice John Karunya University ,Coimbatore

113

SESSION B: SIGNAL PROCESSING AND COMMUNICATION(SPC)

SUB SESSION B.1: SPC 01. Secured Digital Image Transmission Over Network Using

Efficient Watermarking Techniques On Proxy Server Jose Anand, M. Biju, U. Arun Kumar JAYA Engineering College, Thiruninravur, Chennai 602024

118

SPC 02. Significance of Digital Signature & Implementation through RSA Algorithm R.Vijaya Arjunan ISTE: LM -51366 Aarupadai Veedu Institute of Technology, Chennai

123

SPC 03. A Survey On Pattern Recognition Algorithms For Face Recognition N.Hema, C.Lakshmi Deepika PSG College of Technology, Coimbatore

128

SPC 04. Performance Analysis Of Impulse Noise Removal Algorithms For Digital Images K.Uma, V.R.Vijaya Kumar PSG College of Technology, Coimbatore

132


SPC 05. Confidentiality in Composition of Clutter Images

G.Ignisha Rajathi, M.E -II Year ,Ms.S.Jeya Francis Xavier Engg College , Tirunelveli

135

SPC 06. VHDL Implementation Of Lifting Based Discrete Wavelet Transform M.Arun Kumar, C.Thiruvenkatesan SSN College of Engineering, Chennai

139

SPC 07. VLSI Design Of Impulse Based Ultra Wideband Receiver For Commercial Applications G.Srinivasa Raja, V.Vaithianathan SSN College of Engineering, Chennai

142

SPC 08. Distributed Algorithms for Energy Efficient Routing in Wireless Sensor Networks T.Jingo M.S.Godwin Premi, K.S.Shaji Sathyabama university, Chennai

147

SUBSESSION B.2:

SPC 09. Decomposition Of EEG Signal Using Source Separation

Algorithms Kiran Samuel Karunya University, Coimbatore

152

SPC 10. Segmentation of Multispectral Brain MRI Using Source Separation Algorithm Krishnendu K Karunya University, Coimbatore

157

SPC 11. MR Brain Tumor Image Segmentation Using Clustering Algorithm Lincy Annet Abraham,D. Jude Hemanth Karunya University, Coimbatore

162

SPC 12. MRI Image Classification Using Orientation Pyramid and Multiresolution Method R.Catharine Joy, Anita Jones Mary Karunya University, Coimbatore

166

SPC 13. Dimensionality reduction for Retrieving Medical Images Using PCA and GPCA J.W Soumya Karunya University, Coimbatore

170


SPC 14. Efficient Whirlpool Hash Function

J.Piriyadharshini , D.S.Shylu Karunya University, Coimbatore

175

SPC 15. 2-D Fractal Array Design For 4-D Ultrasound Imaging Alice John, Mrs.C Kezi Selva Vijila Karunya University, Coimbatore

181

SPC 16. PC Screen Compression for Real Time Remote Desktop Access Jagannath.D.J,Shanthini Pandiaraj Karunya University, Coimbatore

186

SPC 17. Medical Image Classification using Hopfield Network and Principal Components G.L Priya Karunya University, Coimbatore

191

SPC 18. Delay Minimization Of Sequential Circuits Through Weight Replacement S.Nireekshan kumar,Grace jency Karunya University, Coimbatore

195

SPC 19. Analysis of MAC Protocol for Wireless Sensor Network Jeeba P.Thomas, Mrs.M.Nesasudha, Karunya University, Coimbatore

200

SPC 20. Improving Security and Efficiency in WSN Using Pattern Codes Anu jyothy,Mrs.M.Nesasudha Karunya University, Coimbatore

204

SESSION C: CONTROL AND COMPUTATION(CC) SUBSESSION C.1: CC 01. Automatic Hybrid Genetic Algorithm Based Printed Circuit

Board Inspection Mridula, Kavitha, Priscilla Adhiyamaan college of Engineering, Hosur-635 109

208

CC 02. Implementation Of Neural Network Algorithm Using VLSI Design B.Vasumathi, Prof.K.R.Valluvan Kongu Engineering College, Perundurai

212

CC 03. A Modified Genetic Algorithm For Evolution Of Neural Network in Designing an Evolutionary Neuro-Hardware N.Mohankumar, B.Bhuvan, M.Nirmala Devi, Dr.S.Arumugam NIT Calicut, Kerala

217


CC 04. Design and FPGA Implementation of Distorted Template Based

Time-of-Arrival Estimator for Local Positioning Application Sanjana T S., Mr. Selva Kumar R, Mr. Cyril Prasanna Raj P VLSI System Design Centre, M S Ramaiah

School of Advanced Studies, Bangalore

221

CC 05. Design and Simulation of Microstrip Patch Antenna for Various Substrate T.Jayanthy, A.S.A.Nisha, Mohemed Ismail, Beulah Jackson Sathyabama university, Chennai.

224

SUBSESSION C.2: CC 06. Motion Estimation Of The Vehicle Detection and Tracking

System

A.Yogesh Karunya University, Coimbatore

229

CC07. Architecture for ICT (10,9,6,2,3,1) Processor Mrs.D.Shylu,Miss.V.C.Tintumol Karunya University, Coimbatore

234

CC08. Row Column Decomposition Algorithm For 2d Discrete Cosine Transform Caroline Priya.M. Karunya University, Coimbatore

240

CC09. VLSI Architecture for Progressive Image Encoder

Resmi E,K.Rahimunnisa Karunya University, Coimbatore

246

CC10. Reed Solomon Encoders and Decoders using Concurrent Error Detection Schemes Rani Deepika.B.J, K.Rahimunnisa Karunya University, Coimbatore

252

CC11. Design of High Speed Architectures for MAP Turbo Decoders Lakshmi .S.Kumar ,Mrs.D.Jackuline Moni Karunya University, Coimbatore

258

CC12. Technology Mapping Using Ant Colony Optimization M.SajanDeepak, Jackuline Moni, Karunya University, Coimbatore S.Arumugam, Bannariamman educational trust…

264


An FPGA-Based Single-Phase Electrical Energy Meter
Binoy B. Nair, P. Supriya

Abstract- This paper presents the design and development of a novel FPGA based single phase energy meter which can measure the power contained in the harmonics accurately up to the 33rd harmonic. The design presented in this paper has an implementation of the Booth multiplication algorithm, which provides a very fast means of calculating the instantaneous power consumption. The energy consumed is displayed using seven segment displays, and a serial communication interface for transmission of the energy consumption to a PC is also implemented, the drivers for which are implemented inside the FPGA itself. The readings are displayed on the PC through an interface developed using the Visual Basic programming language.

Index Terms— FPGA, Energy meter

I. INTRODUCTION

The main types of electrical energy meters available in the market are the Ferraris meter, also referred to as an induction-type meter, and microcontroller based energy meters. However, a Ferraris meter has disadvantages such as creeping, limited voltage and current range, inaccuracies due to non-ideal voltage and current waveforms, and high wear and tear due to moving parts [1]. A wide variety of microcontroller based energy meters are available in the market and offer a significant improvement over an induction type energy meter [2]. However, a microcontroller based energy meter has the following disadvantages: 1. Power consumption is large when compared to FPGA based meters. 2. All the resources of the microcontroller may not be made use of, resulting in wastage of resources and money when large scale manufacture is concerned. An FPGA based energy meter not only provides all the advantages offered by the microcontroller based energy meter, but also offers additional advantages such as lower power consumption and lesser space requirements (as very little external circuitry is required). An FPGA based energy meter can also be reconfigured any number of times, at very short notice, thus making it ideal in cases where the user requirements and specifications vary with time [3]. The block diagram of the FPGA based energy meter developed is given in Fig. 1.

Fig.1 FPGA based energy meter block diagram

This paper is divided into four sections; section II describes the method used for computing the electrical energy consumed, section III gives the implementation details and the results are presented in section IV.

II. COMPUTATION OF ENERGY CONSUMED

Energy consumed is calculated by integrating the instantaneous power values over the period of consumption of energy.

Energy = Σ (Vn * In) * T, for n = 0 to N    (1)

where Vn is the instantaneous value of voltage, In is the instantaneous value of current and T is the sampling time. The instantaneously calculated power is accumulated, this accumulated value is compared with a constant that is equal to 0.01 kWh, and the display is updated once this constant is reached.

Energy of 0.01 kWh = 0.01 * 1000 * 3600 watt-seconds    (2)

so 0.01 * 1000 * 3600 W-s = Σ (Vn * In) * T, for n = 0 to N. When the sampling time T = 0.29 ms,

Σ (Vn * In) = (0.01 * 1000 * 3600) / (0.29 * 10^-3) = 124137931    (3)

The multiplication factor for the potential transformer (PT) is taken to be 382.36 and for the current transformer (CT) it is taken as 9.83. The conversion factor of the ADC for converting the 0-255 output into the actual scale of 0-5 V for voltage and current is 51 * 51.



Therefore, the constant value to be stored for the comparison should be equal to:

(124137931 * 51 * 51) / (382.36 * 9.83) ≈ 85905087

Thus, the constant 85905087 has been stored as the meter constant. After reaching this value, the energy reading displayed is incremented by 0.01 kWh.

III. IMPLEMENTATION

The FPGA forms the core part of the FPGA based energy meter. But in addition to the FPGA, various other hardware components were used to convert the voltage and current inputs to digital form for processing by the FPGA. The energy consumed must be displayed, and seven-segment displays were used for the purpose. The consumed energy was transmitted to a PC using an RS-232 interface, which required additional external circuitry. The hardware details of the FPGA based single phase energy meter are provided in this section. The working of each component is also presented in brief.

A. Sensing unit
The function of the sensing unit is to sense the voltage and current through the mains and to convert them into a 0-5 V signal which is then fed into the ADC. The sensing unit is composed of the current transformer, the potential transformer and the adder circuit. The potential transformer is used to step down the mains voltage to a fraction of its actual value, so that it can be safely fed into the adder circuit. The current transformer is used to detect the current flowing through the mains. A burden resistance is used on the secondary side to convert the current into an equivalent voltage signal, as current cannot be directly fed to the ADC. Two op-amps in the IC are used as voltage followers and the remaining two are configured as non-inverting amplifiers with a gain of 2; they also act as level shifters, adding a d.c. voltage of 2.5 V to the input a.c. signal, thus changing the a.c. signal range from -2.5 V...+2.5 V to 0 V...+5 V, as the A/D converter used can only operate in the 0 V to +5 V range [3].

B. Analog to Digital Conversion
The basic function of an analog to digital (A/D) converter is to convert an analog input to its binary equivalent. The ADC 0808, an 8-bit successive approximation A/D converter from National Semiconductor, is employed for converting the sampled voltage and current signals into equivalent 8-bit binary values [4]. A Sample and Hold (SAH) circuit is needed as the input voltage keeps varying during A/D conversion. If a Sample and Hold circuit is not used and the input signal changes during the A/D conversion, the output digital value will be unpredictable. To overcome this, the input voltage is sampled and held constant for the ADC during the conversion. Two LF 398 ICs from National Semiconductor have been used to sample and hold the sampled values of voltage and current during the A/D conversion. The working of a Sample and Hold (SAH) circuit is illustrated in Fig. 2.

Fig.2 Working of SAH

The sampling frequency used was 3.45kHz which helps the user to accurately measure the power contained in the harmonics up to the 33rd harmonic. This significantly increases the accuracy of the energy meter and the meter can be used in environments where the presence of harmonics in the supply is significant.

C. Field Programmable Gate Array (FPGA)
The FPGA is the key unit of the energy meter presented in this paper. It is programmed to perform the following functions:
1) Find the product of the instantaneous values of voltage and current to get the instantaneous power.
2) Accumulate the power and compare the accumulated value to the meter constant.
3) When the meter constant is exceeded, increment the energy consumed by 00.01 and display it.
4) Drive the seven-segment displays.
5) Send the energy reading to the PC via RS-232.

The instantaneous power consumption is calculated using an implementation of the Booth multiplier algorithm. The Booth multiplier algorithm provides a fast means of multiplying the 8-bit values of voltage and current obtained from the ADC. The resultant value is the instantaneous power, which can be of a maximum 17-bit length. These instantaneous power values are accumulated and the accumulated value is compared to the meter constant already stored in the FPGA. Once that meter constant is exceeded, the display is incremented by 00.01 kWh, the accumulator gets reset, and the amount by which the accumulator reading exceeded the meter constant is loaded into the accumulator. The meter constant is chosen to correspond to 00.01 kWh primarily due to limitations of the FPGA kit which is used for implementing the energy meter. Now the next set of digital values for voltage and current are available at the input and the process of power calculation and accumulation repeats. The FPGA used for implementing the energy meter is a Spartan 2 from Xilinx. The Hardware Description Language (HDL) used for the purpose is VHDL [5].
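As a rough behavioral sketch of the accumulate-and-compare step described above (not the authors' code: the signal names power_in, acc and inc_energy and the 30-bit widths are assumptions, and the signed polarity handling via det_v/det_i described later is omitted), the rollover logic could be written as:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity energy_accumulator is
  port (
    clk        : in  std_logic;
    rst        : in  std_logic;
    sample_en  : in  std_logic;              -- one pulse per new power sample
    power_in   : in  unsigned(16 downto 0);  -- 17-bit instantaneous power from the multiplier
    inc_energy : out std_logic               -- pulse: advance the display by 0.01 kWh
  );
end entity;

architecture behav of energy_accumulator is
  -- meter constant derived in Section II
  constant METER_CONST : unsigned(29 downto 0) := to_unsigned(85905087, 30);
  signal acc : unsigned(29 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      inc_energy <= '0';
      if rst = '1' then
        acc <= (others => '0');
      elsif sample_en = '1' then
        if acc + power_in >= METER_CONST then
          acc <= (acc + power_in) - METER_CONST;  -- keep the overshoot, as the text describes
          inc_energy <= '1';
        else
          acc <= acc + power_in;
        end if;
      end if;
    end if;
  end process;
end architecture;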

D. Seven segment display
To display the total energy consumed, four seven-segment displays are used, which can display energy from 00.00 to 99.99 kWh. Each of the displays needs a base drive signal for enabling it, and the seven segment equivalent of the digit it has to display. The base drive is provided by the FPGA at the rate of 0.25 MHz per display; at the same time it sends the seven segment equivalent of the digit to that display. Hence, all four displays appear to be displaying their digits simultaneously.
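The digit-scanning scheme can be pictured with a small multiplexer like the sketch below (a hypothetical fragment, not the paper's implementation; the scan clock, port names and one-hot base-drive polarity are assumptions):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity display_mux is
  port (
    scan_clk : in  std_logic;                      -- digit scan clock
    digit0   : in  std_logic_vector(6 downto 0);   -- pre-decoded seven-segment codes
    digit1   : in  std_logic_vector(6 downto 0);
    digit2   : in  std_logic_vector(6 downto 0);
    digit3   : in  std_logic_vector(6 downto 0);
    seg      : out std_logic_vector(6 downto 0);   -- segments of the digit currently driven
    base_drv : out std_logic_vector(3 downto 0)    -- one-hot base-drive (enable) lines
  );
end entity;

architecture behav of display_mux is
  signal sel : unsigned(1 downto 0) := (others => '0');
begin
  process(scan_clk)
  begin
    if rising_edge(scan_clk) then
      sel <= sel + 1;                              -- cycle through the four digits
    end if;
  end process;

  with sel select
    seg <= digit0 when "00",
           digit1 when "01",
           digit2 when "10",
           digit3 when others;

  with sel select
    base_drv <= "0001" when "00",
                "0010" when "01",
                "0100" when "10",
                "1000" when others;
end architecture;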

E. Serial Communication Interface
RS-232 (Recommended Standard 232) is a standard interface approved by the Electronic Industries Association (EIA) for connecting serial devices. Each byte of data is synchronized using its start bit and stop bit. A parity bit can also be included as a means of error checking. Fig. 3 shows the TTL/CMOS serial logic waveform when using the common 8N1 format. 8N1 signifies 8 data bits, no parity and 1 stop bit. The RS-232 line, when idle, is in the mark state (logic 1). A transmission starts with a start bit (logic 0). Then each bit is sent down the line, one at a time. The LSB (Least Significant Bit) is sent first. A stop bit (logic 1) is then appended to the signal to complete the transmission. The data sent using this method is said to be framed; that is, the data is framed between a start and a stop bit.

Fig.3 TTL/CMOS Serial Logic Waveform

The waveform in Fig. 3 is only relevant for the signal immediately at the output of the FPGA. RS-232 logic levels use +3 to +25 volts to signify a "space" (logic 0) and -3 to -25 volts for a "mark" (logic 1). Any voltage in between these regions (i.e., between +3 and -3 volts) is undefined. Therefore this signal is put through an RS-232 level converter. The signal present on the RS-232 port of a personal computer is shown in Fig. 4.

Fig.4 RS-232 Logic Waveform

The rate at which bits are transmitted (bits per second) is called baud. Each piece of equipment has its own baud rate requirement. A baud rate of 100 bits per second is used in the design presented. This baud rate is set both on the PC side and on the FPGA side. The RS-232 level converter used is the MAX-232, which generates +10 V and -10 V from a single 5 V supply. On the PC side, the Microsoft Comm control in Visual Basic is used to read and display the incoming data from the FPGA.
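A minimal 8N1 transmitter along these lines could look like the following behavioral sketch (hypothetical; the paper's actual driver and its 100-baud clock divider are not reproduced here). The byte is framed with a low start bit and a high stop bit and shifted out LSB first:

library ieee;
use ieee.std_logic_1164.all;

entity uart_tx_8n1 is
  port (
    baud_tick : in  std_logic;                     -- one clock edge per bit period (e.g. 100 Hz)
    start     : in  std_logic;                     -- request to send one byte
    data      : in  std_logic_vector(7 downto 0);
    tx        : out std_logic;                     -- serial line, idles at '1' (mark)
    busy      : out std_logic
  );
end entity;

architecture behav of uart_tx_8n1 is
  signal shreg : std_logic_vector(9 downto 0) := (others => '1');  -- stop & data & start
  signal count : integer range 0 to 10 := 0;
begin
  process(baud_tick)
  begin
    if rising_edge(baud_tick) then
      if count = 0 then
        if start = '1' then
          shreg <= '1' & data & '0';               -- frame: stop bit, data (MSB..LSB), start bit
          count <= 10;
        end if;
      else
        shreg <= '1' & shreg(9 downto 1);          -- shift out LSB first, refill with idle '1'
        count <= count - 1;
      end if;
    end if;
  end process;

  tx   <= shreg(0) when count /= 0 else '1';       -- idle mark between frames
  busy <= '1' when count /= 0 else '0';
end architecture;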

IV. RESULTS

The FPGA based single phase energy meter was designed, simulated and implemented on a Spartan 2 FPGA. The sensing circuit consisting of the op-amps and the sample and hold ICs was implemented on a printed circuit board. The results of simulation and the test results are presented in this section.

A. Simulation Results
Simulation results for the adder circuit: The aim of the adder circuit, implemented using LM 324 op-amps, is to shift the input a.c. voltage (maximum allowed is 2.5 Vmax) up by 2.5 V, so that the input is in the range 0-5 V. This is required as the ADC used is unipolar and can only convert signals in the range 0-5 V to their 8-bit binary equivalent. The results of simulating the adder circuit are presented in Fig. 5. The input signal provided was an a.c. signal of 2.5 Vmax, and the output obtained was the same as the input but shifted up by 2.5 V.

Fig. 5 Adder circuit simulation

Simulation results for the sample and hold: The ADC used for converting the voltage and current signals into digital form was the ADC 0808, which can do A/D conversion on only one channel at a time. Hence the sampled values must be held till the inputs on both channels (voltage and current signals are given to separate channels) are converted. An added requirement was that the input signal should not change during the A/D conversion process. Hence it was essential that sample and hold circuits be used. The result of simulating the sample and hold circuit is given in Fig. 7. The sampling frequency used was 3.45 kHz.


Fig. 7 Sample and hold output with the sampling pulses

Simulation results for the VHDL code: The VHDL code was simulated using ModelSim. Since it is not possible to present the whole process of energy computation in one figure, the individual operations are presented as separate figures, in the order in which they occur. Fig. 8 shows the multiplication of the voltage and current signals taking place. The process of multiplication takes place after every second end-of-conversion signal. As soon as the end_conv signal, indicating the end of conversion, goes high, the test data is read into the signal 'current' and '10000000' is subtracted from it to yield a signal 'i'. This is then multiplied with the signal 'v', obtained in the same manner as described for 'i', and the 16-bit product is prepended with 0s to make it 30 bits wide for addition with the 30-bit accumulator signal. After the end-of-conversion signal is received, the hold signal, indicated by samp_hold, is made low to start sampling again.

Fig.8 Multiplication

The next process after multiplication is accumulation. The process of accumulation is triggered after every third end-of-conversion signal. The product obtained has to be either added to or subtracted from the accumulated value, depending on whether the inputs were of the same polarity or of opposite polarity. When both 'voltage' and 'current' are positive (i.e., greater than '10000000') or both of them are negative (i.e., less than '10000000'), the product is positive and has to be added to the accumulator. Otherwise, the product is negative and should be subtracted from the accumulated value. Signals det_v and det_i check the polarity of the signals, and the addition and subtraction processes are triggered by these two signals. The process of accumulation is shown in Fig. 9.

Fig. 9 Accumulation

The process of updating the energy consumed is given in Fig. 10. Once the accumulator value exceeds the meter constant, a signal sumvi_gr_const goes high. This low-to-high transition triggers another process which increments the energy consumed by one unit, indicating a consumption of 0.01 kWh of energy on the seven-segment display. The total energy consumed is indicated by four signals: last_digit, third_digit, second_digit and first_digit. In Fig. 10, the energy consumed indicated initially is 13.77 kWh, which then increments to 13.78 kWh. The RS-232 interface transmits the ASCII equivalent of the four signals through the output 'bitout' at a baud rate of 100 bits/s.

Fig.10 Energy Updating
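The digit-update step just described can be sketched as a simple ripple of BCD carries (a hedged illustration with made-up port names, not the signals shown in Fig. 10): each 0.01 kWh pulse increments the lowest digit, and a 9 rolls over to 0 while carrying into the next digit.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity energy_digits is
  port (
    clk            : in  std_logic;
    inc_energy     : in  std_logic;               -- one pulse per 0.01 kWh (from the accumulator)
    d0, d1, d2, d3 : out unsigned(3 downto 0)     -- BCD digits, d0 = hundredths of a kWh
  );
end entity;

architecture behav of energy_digits is
  type bcd_array is array (0 to 3) of unsigned(3 downto 0);
  signal digits : bcd_array := (others => (others => '0'));
begin
  process(clk)
    variable carry : boolean;
  begin
    if rising_edge(clk) then
      if inc_energy = '1' then
        carry := true;                            -- add 1 to the lowest digit
        for i in 0 to 3 loop
          if carry then
            if digits(i) = 9 then
              digits(i) <= (others => '0');       -- 9 rolls over to 0 and carries
              carry := true;
            else
              digits(i) <= digits(i) + 1;
              carry := false;
            end if;
          end if;
        end loop;
      end if;
    end if;
  end process;

  d0 <= digits(0);
  d1 <= digits(1);
  d2 <= digits(2);
  d3 <= digits(3);
end architecture;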

B. Hardware Implementation Results
Design overview: The design overview was generated using Xilinx ISE. It gives an overview of the resources utilized on the FPGA board on which the design is implemented. Details such as the number of slice registers used as flip-flops, latches, etc. can be found from the design overview. Fig. 11 presents the design overview for the energy meter code implemented on the FPGA.

Fig.11 Design overview

RTL schematic: The RTL schematic presents the design of the implemented energy meter code after synthesis. The top-level schematic, showing only the inputs and the outputs obtained after synthesis using Leonardo Spectrum, is presented in Fig. 12.

Fig.12 top-level RTL schematic

Pin locking: The I/O pins for the FPGA were configured using Xilinx ISE. A total of twenty pins were configured as outputs, including those for control of the ADC, the base drive for the seven-segment displays, the data for the seven-segment displays and the serial communication. A pin was configured exclusively to give the sample/hold signal to the LF 398 Sample and Hold IC. Eleven pins were configured as inputs, including eight pins to detect the ADC output, the reset signal and the clock signal for the FPGA. Fig. 13 shows the pin locking and the location of the pins on the board.

Fig.13 Pin Locking

Serial communication interface: The GUI for serial communication on the PC was developed using Visual Basic. The data sent by the FPGA was received and displayed on the PC. The tariff chosen was 1 rupee per unit, but it can be changed by modifying the Visual Basic program. The GUI form is shown in Fig. 14.

Fig.14 GUI form for serial communication

The experimental setup used to implement and test the design is shown in Fig.15.


Fig.15 Experimental setup

REFERENCES

[1] Kerry Mcgraw, Donavan Mumm, Matt Roode, Shawn Yockey, The Theories and Modeling of the Kilowatt-hour Meter, [Online]. Available: http://ocw.mit.edu. [2] Anthony Collins, Solid State Solutions for Electricity Metrology, [Online]. Available: http://www.analog.com. [3] Ron Mancini, Op Amps for Everyone - Design Reference, Texas Instruments, Aug. 2002. [4] Nicholas Gray, ABCs of ADCs: Analog-to-Digital Converter Basics, National Semiconductor Corporation, Nov. 2003. [5] Douglas L. Perry, VHDL: Programming by Example, TMH Publishing Ltd., New Delhi, 4th Ed., 2003. [6] T. Riesgo, Y. Torroja, and E. Torre, "Design Methodologies Based on Hardware Description Languages", IEEE Transactions on Industrial Electronics, vol. 46, no. 1, pp. 3-12, Feb. 1999.


A Multilingual, Low Cost FPGA Based Digital Storage Oscilloscope Binoy B.Nair, L.Sreeram, S.Srikanth, S.Srivignesh

Amrita Vishwa Vidhyapeetham, Ettimadai, Coimbatore – 641105, Tamil Nadu, India.

Email addresses: [binoybnair, lsree87, srikanth1986, s.srivignesh]@gmail.com

Abstract--In a country like India, a Digital Storage Oscilloscope is too costly an instrument for most schools to use as a teaching aid. Another problem associated with commercially available Digital Storage Oscilloscopes is that the user interface is usually in English, which is not the medium of instruction in most of the schools in rural areas. In this paper, the design and implementation of an FPGA based Digital Storage Oscilloscope is presented, which overcomes the above difficulties. The oscilloscope not only costs a fraction of the commercially available Digital Storage Oscilloscopes, but has an extremely simple user interface based on regional Indian languages. The oscilloscope developed is based on a Cyclone II FPGA. The Analog to Digital Converter interface developed allows usage of ADCs depending on the consumer's choice, allowing customization of the oscilloscope. The VGA interface developed allows any VGA monitor to be used as the display.

Keywords - Digital Storage Oscilloscope, FPGA, VGA.

1 .INTRODUCTION

Oscilloscopes available today cost thousands of rupees [11]. Moreover, these oscilloscopes have fewer functionalities, smaller displays and a limited number of channels [1], [10]. The major disadvantage of PC based oscilloscopes is that they are tied to a PC: they are not portable, and they require specialized software packages to be installed on the PC [2]. These packages are usually expensive and may not produce optimum performance on low-end PCs. Additional hardware like data acquisition cards is also required [3]. But with this design, there are no such problems, as the PC is replaced with an FPGA; instead of a data acquisition card, a low cost op-amp based circuit is used for input signal conditioning and a commercially available A/D converter is used for digitizing the signals. This results in significant cost reduction with little difference in performance. The FPGA, A/D converter and signal conditioning circuit together form a single system, which is portable, and any VGA monitor can be used for display.

Functions like Fast Fourier Transform, convolution, integration, differentiation and mathematical operations like addition, subtraction and multiplication of signals are implemented [5], [6]. Since the oscilloscope is interfaced with a VGA monitor, it has a larger display. New functions can be easily added just by changing the VHDL code. The number of input channels available is also not restricted, as it depends on the A/D converter one uses. Here an 8-bit A/D converter with 8 input channels is used, and hence it is possible to view up to 8 input waveforms on the screen simultaneously. The different waveforms can be viewed in different colors, thereby reducing confusion.

II. SYSTEM DEVELOPMENT

For ease of understanding, the design presented can be considered to be made up of three modules. The analog signal conditioning, analog to digital conversion and its interface to the FPGA comprise the first module. The second module deals with the processing of the acquired digital signal, and the third module deals with presenting the output in user-understandable form on a VGA screen. The whole system is implemented using VHDL [8]. The flow of the process is shown in Fig. 1.

Fig 1. Flowchart of FPGA based DSO


A. Input Signal Conditioning
The input signal is scaled using op-amp based scaling circuits to bring it to the voltage range accepted by the A/D converter. After scaling, this signal is fed to the A/D converter, which converts the signal to its digital equivalent [2]. The control signals to the A/D converter are provided from the FPGA itself. The circuit designed draws very little power, thus minimizing loading effects. An additional advantage of having the A/D converter as a separate hardware unit is that any commercially available A/D converter can be used depending on user requirement, with little or no change in the interface code.

B. Signal Processing
The digital values obtained after A/D conversion are stored in a 640 x 8 bit RAM created inside the FPGA. The control signals are sent to the memory to allow data to be written to it. The clock is fed to a counter that generates the memory's sequential addresses. An option to store the captured wave information is also provided through a flash memory interface, so that the information can be stored for future reference. It also provides the user with the ability to log the waveform data for a duration limited only by the size of the flash memory used. Additional functions like integration, differentiation and Fast Fourier Transform and mathematical operations like addition, subtraction and multiplication of signals are also implemented. Integration is done using the rectangular-rule method (a rough sketch is given after subsection C below). Differentiation is done using the finite-difference method [9]. The Fast Fourier Transform is implemented using the CORDIC algorithm. Addition, subtraction and multiplication are done with basic operations like the ripple adder, 2's complement addition and the Booth algorithm, respectively [7].

C. VGA Interface & Display
A VGA monitor interface was also developed and the waveforms are displayed on the screen at 640x480 resolution, with all the necessary signals, such as horizontal and vertical synchronization along with the RGB color information sent to the VGA monitor, being generated by the FPGA [4]. The timing diagram of the VGA signals is shown in Fig. 2.
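As referenced in subsection B, the rectangular-rule integration over the stored samples amounts to a running sum, sketched below in behavioral VHDL (a hypothetical fragment with assumed signal names and widths, not the authors' code); finite-difference differentiation is the analogous per-sample difference x[n] - x[n-1].

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: rectangular-rule integration of the 640 stored samples.
-- y[n] = y[n-1] + x[n]; the constant sample period is applied as a display scale factor.
entity rect_integrator is
  port (
    clk       : in  std_logic;
    clear     : in  std_logic;                       -- restart the running sum
    sample_en : in  std_logic;                       -- one pulse per stored sample read out
    x         : in  signed(7 downto 0);              -- sample from the 640 x 8 RAM
    y         : out signed(17 downto 0)              -- wide enough for 640 summed samples
  );
end entity;

architecture behav of rect_integrator is
  signal acc : signed(17 downto 0) := (others => '0');
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if clear = '1' then
        acc <= (others => '0');
      elsif sample_en = '1' then
        acc <= acc + resize(x, acc'length);          -- add the current rectangle
      end if;
    end if;
  end process;
  y <= acc;
end architecture;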

Fig 2. Timing Diagram of Analog to Digital Convertor

The VGA display is divided into two parts. The upper part displays the wave and the lower part displays the menu and the wave parameters. One of the distinguishing features of the oscilloscope presented here is its ability to display the menu and the wave information in Indian languages like Tamil, Telugu, Malayalam or Hindi, in addition to English. Each character is generated by considering a grid of 8x8 pixels for Indian languages and 8x6 pixels for English characters and numbers. A sample grid for displaying the letter 'A' on the screen is given in Fig. 3 [4].

Fig 3. Matrix values of letter “A”
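For illustration, a glyph such as the 'A' grid of Fig. 3 could be stored as a small ROM of row bitmaps in the 8x6 format the paper describes for English characters; the pattern below is a made-up example, not the actual font used:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity char_rom_a is
  port (
    row    : in  unsigned(2 downto 0);          -- scan-line index within the character cell
    pixels : out std_logic_vector(5 downto 0)   -- '1' = pixel on for this row of the glyph
  );
end entity;

architecture rom of char_rom_a is
  type glyph is array (0 to 7) of std_logic_vector(5 downto 0);
  -- An 8x6 bitmap for the letter 'A' (illustrative values only).
  constant LETTER_A : glyph := (
    "001100",
    "010010",
    "100001",
    "100001",
    "111111",
    "100001",
    "100001",
    "000000");
begin
  pixels <= LETTER_A(to_integer(row));
end architecture;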

III. LABORATORY TESTS

A sample result of the laboratory tests is shown in Fig. 4. In this sample test, a sine wave of 2.5 kHz is displayed on the second channel. The interface options are displayed in the Tamil language. Additionally, the maximum and minimum values of both waves are also displayed. All the waveforms are in different colors, so that it is easy to differentiate between them. The waveforms are in the same color as that of the options available, to avoid confusion between waves.

Fig.4. Sample Laboratory Test Output

IV. CONCLUSIONS

The FPGA based Digital Storage Oscilloscope presented here has many advantages like low cost, portability, availability of channels, 4096 colors to differentiate waveforms, and a large display which helps in analyzing the waveforms clearly, with a multiple regional language interactive interface. The user specifications of the developed system have been set up in accordance with the requirements of common high school level science laboratories. The interface circuit hardware has been developed with a few affordable electronic components for conversion and processing of the analog signal in digital form before being acquired by the FPGA. The overall cost of the Digital Storage Oscilloscope presented here is approximately USD 184. The system was successfully tested with several forms of input waveforms, such as sinusoidal, square and triangular signals. The developed system has possible expansion capacities in the form of additional signal processing modules. This DSO can be used for real-time data acquisition in most common-purpose low-power- and low-frequency-range applications for high school laboratories. It can also be used as an instructional tool for undergraduate data acquisition courses, for illustrating complex concepts concerning parallel port programming, A/D conversion and detailed circuit development. The entire system is shown in Fig. 5.

Fig.5. Entire System of FPGA based DSO

REFERENCES

[1] J. Miguel Dias Pereira, ”The History and Technology of Oscilloscopes,”IEEE Instrumentation & Measurement Magazine

[2] Chandan Bhunia, Saikat Giri, Samrat Kar, Sudarshan Haldar, and Prithwiraj Purkait, “A low cost PC based Oscilloscope,” IEEE Transactions on Education, Vol. 47, No. 2, May 2004

[3] R.Lincke, I.Bijii, Ath.Trutia , V.Bogatu, B.Logofzitu, “PC based Oscilloscopes”

[4] A.N. Netravali P. Pirsch, “Character Display on CRT,”IEEE Transactions on Broadcasting, Vol. Bc-29, No. 3, September 1983

[5] IEEE Standard specification of general-purpose laboratory Cathode-Ray Oscilloscopes, IEEE Transactions On Instrumentation And Measurement, Vol. Im-19, No. 3, August 1970

[6] Oscilloscope Primer, XYZs of Oscilloscopes. [7] Morris Mano, "Digital Design". [8] Douglas Perry, "VHDL Programming by Example." [9] W. Cheney, D. Kincaid, "Numeric Methods and Computing." [10] Product Documentation of Tektronix and Aplab. [11] www.tektronix.com


Design of Asynchronous NULL Convention Logic FPGA
R. Suguna, S. Vasanthi, M.E., (Ph.D.)

II M.E. (VLSI Design), Senior Lecturer, ECE Department, K.S. Rangasamy College of Technology, Tiruchengode.

Abstract— NULL Convention Logic (NCL) is a self-timed logic approach in which the control is inherent in each datum. There are 27 fundamental NCL gates. The authors propose a logic element that can be configured as any one of the NCL gates. Two versions of the reconfigurable logic element are developed for implementing an asynchronous FPGA, one with extra embedded registration logic and the other without. Both versions can be configured as any one of the 27 fundamental NULL Convention Logic (NCL) gates, including resettable and inverting variations, and both can utilize embedded registration for gates with three or fewer inputs; only the version with extra embedded registration can utilize it for gates with four inputs. The two approaches are compared with an existing approach, showing that both versions developed herein yield more area-efficient NCL circuit implementations. Index Terms—Asynchronous logic design, delay-insensitive circuits, field-programmable gate array (FPGA), NULL convention logic (NCL), reconfigurable logic.

I.INTRODUCTION

Though synchronous circuit design presently dominates the semiconductor design industry, there are major limiting factors to this design approach, including clock distribution, increasing clock rates, decreasing feature size, and excessive power consumption[6]. As a result of the problems encountered with synchronous circuit design, asynchronous design techniques have received more attention. One such asynchronous approach is NULL Convention logic (NCL). NCL is a clock-free delay-insensitive logic design methodology for digital systems. The separation between data and control representations provides self-synchronization, without the use of a clock signal. NCL is a self-timed logic paradigm in which control is inherent in each datum. NCL follows the so-called weak conditions of Seitz’s delay-insensitive signaling scheme.

II. NCL OVERVIEW
NCL uses threshold gates as its basic logic elements [4]. The primary type of threshold gate, shown in Fig. 1, is the THmn gate, where 1 ≤ m ≤ n. THmn gates have single-wire inputs, where at least m of the n inputs must be asserted before the single-wire output will become asserted. In a THmn gate, each of the n inputs is connected to the rounded portion of the gate. The output emanates from the pointed end of the gate and the gate's threshold value m is written inside the gate. NCL circuits are designed using a threshold gate network for each output rail [3] (i.e., two threshold gate networks would be required for a dual-rail signal D, one for D0 and another for D1). Another type of threshold gate is referred to as a weighted threshold gate, denoted as THmnWw1w2...wR. Weighted threshold gates have an integer value m ≥ wR > 1 applied to inputR. Here 1 ≤ R < n, where n is the number of inputs, m is the gate's threshold and w1, w2, ..., wR, each > 1, are the integer weights of input1, input2, ..., inputR, respectively.

Fig. 1. THmn threshold gate.

Fig. 2. TH34w2 threshold gate: Z = AB + AC + AD + BCD.

For example, consider the TH34w2 gate shown in Fig. 2, whose n = 4 inputs are labeled A, B, C and D. The weight of input A, W(A), is therefore 2. Since the gate's threshold is 3, this implies that in order for the output to be asserted, either inputs B, C and D must all be asserted, or input A must be asserted along with any other input (B, C or D). NCL threshold gates are designed with hysteresis state-holding capability, such that all asserted inputs must be deasserted before the output is deasserted. Hysteresis ensures a complete transition of inputs back to NULL before asserting the output associated with the next wavefront of input data. NCL threshold gate variations include resettable THnn and inverting TH1n gates. Circuit diagrams designate resettable gates by either 'd' or 'n' appearing inside the gate, along with the gate's threshold, where 'd' denotes that the gate is reset to logic '1' and 'n' to logic '0'. Both resettable and inverting gates are used in the design of delay-insensitive registers [8].
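As a purely behavioral illustration (not the transistor-level realization discussed later in the paper), the TH34w2 gate of Fig. 2 can be modeled in VHDL with its hysteresis made explicit: the output is set when the threshold function Z = AB + AC + AD + BCD is satisfied, cleared only when every input has returned to NULL, and held otherwise.

library ieee;
use ieee.std_logic_1164.all;

-- Behavioral sketch of the TH34w2 threshold gate (input A has weight 2) with hysteresis.
entity th34w2 is
  port (
    a, b, c, d : in  std_logic;
    z          : out std_logic
  );
end entity;

architecture behav of th34w2 is
begin
  process(a, b, c, d)
  begin
    if (a = '1' and (b = '1' or c = '1' or d = '1')) or
       (b = '1' and c = '1' and d = '1') then
      z <= '1';                                     -- threshold of 3 reached: assert the output
    elsif a = '0' and b = '0' and c = '0' and d = '0' then
      z <= '0';                                     -- all inputs back to NULL: deassert
    end if;                                         -- otherwise hold the previous value (hysteresis)
  end process;
end architecture;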


DELAY INSENSITIVITY
NCL uses symbolic completeness of expression to achieve delay insensitive behavior [7]. A symbolically complete expression depends only on the relationships of the symbols present in the expression, without reference to their time of evaluation [8]. In particular, dual-rail and quad-rail signals, or other mutually exclusive assertion groups (MEAGs) [3], can incorporate data and control information into one mixed-signal path to eliminate the time reference. A dual-rail signal D consists of two mutually exclusive wires, D0 and D1, which may assume any value from the set {DATA0, DATA1, NULL}. Likewise, a quad-rail signal consists of four mutually exclusive wires that represent two bits. For NCL and other circuits to be delay insensitive, they must meet the input completeness and observability criteria [2].

A. Input completeness
In order for NCL combinational circuits to maintain delay-insensitivity, they must adhere to the completeness-of-input criterion [5], which requires that:
1. all the outputs of a combinational circuit may not transition from NULL to DATA until all inputs have transitioned from NULL to DATA, and
2. all the outputs of a combinational circuit may not transition from DATA to NULL until all inputs have transitioned from DATA to NULL.

Table I 27 NCL fundamental gates

B. Observability
This observability condition, also referred to as indicatability or stability, ensures that every gate transition is observable at the output; that is, every gate that transitions is necessary to transition at least one of the outputs [5].

III. DESIGN OF A RECONFIGURABLE NCL LOGIC ELEMENT

Fig. 4 shows a hardware realization [1] of a reconfigurable NCL LE, consisting of reconfigurable logic, reset logic and output inversion logic. There are 16 inputs used specifically for programming the gate: Rv, Inv and Dp(14:1). Five inputs are only used during gate operation: A, B, C, D and rst. P is used to select between programming and operational mode. Z is the gate output; Rv is the value that Z will be reset to when rst is asserted during operational mode; Inv determines whether the gate output is inverted or not. During programming mode, Dp(14:1) is used to program the LUT's 14 latches in order to configure the LE as a specific NCL gate. Addresses 15 and 0 are constant values and therefore do not need to be programmed.

A. Reconfigurable logic
The reconfigurable logic portion consists of a 16-address LUT [1], shown in Fig. 3, and a pull-up/pull-down (PUPD) function. The LUT contains 14 latches, shown in Fig. 4, and a pass-transistor multiplexer (MUX). When P is asserted (nP is deasserted), the Dp values are stored in their respective latches to configure the LUT output to one of the 27 equations in Table I. Thus, only 14 latches are required, because address 0 is always logic '0' and address 15 is always logic '1' according to the 27 NCL gate equations. The gate inputs A, B, C and D are connected to the MUX select signals to pass the selected latch output to the LUT output. The MUX consists of N-type transistors and a CMOS inverter to provide a full voltage swing at the output.


Fig. 3. Reconfigurable NCL LE without extra embedded registration.

The LUT output is then connected to the N-type transistor of the PUPD function, such that the output of this function will be logic ‘0’ only when F is logic 1. Since all gate inputs (i.e., A, B, C and D ) are connected to a series of P-type transistors, the PUPD function output[4] will be logic ‘1’ only when all gate inputs are logic ‘0.’

B. Reset Logic
The reset logic consists of a programmable latch and a transmission-gate MUX [1]. During the programming phase, when P is asserted (nP is deasserted), the latch stores the value Rv. The gate will be reset when rst is asserted. rst is the MUX select input, such that when it is logic '0', the output of the PUPD function passes through the MUX to be inverted and output on Z. When rst is logic '1', the inverse of Rv is passed through the MUX.

C. Output Inversion Logic
The output inversion logic also consists of a programmable latch and a transmission-gate MUX. The programmable latch stores Inv during the programming phase, which determines whether the gate is inverting or not; the stored value is used as the MUX select input. The input and output of the reconfigurable logic are both fed as data inputs to the MUX, so that either the inverted or the non-inverted value can be output.
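Ignoring the pass-transistor, PUPD, reset and inversion circuit details, the behavior of the reconfigurable logic described above can be sketched with a hypothetical 16-bit configuration register standing in for the 14 latches plus the two hard-wired addresses (the names, bit ordering and latch-style modeling are all assumptions made for illustration, not the paper's circuit):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioral sketch of the reconfigurable NCL LE's LUT plus output hysteresis.
-- cfg(0) is held at '0' and cfg(15) at '1', mirroring the two constant addresses.
entity ncl_le_lut is
  port (
    p          : in  std_logic;                      -- programming-mode enable
    dp         : in  std_logic_vector(14 downto 1);  -- configuration data for the 14 latches
    a, b, c, d : in  std_logic;                      -- gate inputs (also the LUT select lines)
    z          : out std_logic
  );
end entity;

architecture behav of ncl_le_lut is
  signal cfg : std_logic_vector(15 downto 0) := (others => '0');
  signal sel : std_logic_vector(3 downto 0);
  signal f   : std_logic;
begin
  -- Programming phase: capture the LUT contents (latch-style model).
  process(p, dp)
  begin
    if p = '1' then
      cfg(14 downto 1) <= dp;
      cfg(0)  <= '0';
      cfg(15) <= '1';
    end if;
  end process;

  -- Operational phase: the four gate inputs address the LUT.
  sel <= d & c & b & a;
  f   <= cfg(to_integer(unsigned(sel)));

  -- Hysteresis, as in the PUPD-plus-inverter path: assert when the selected
  -- function is true, deassert only when every input has returned to NULL,
  -- otherwise hold the previous value.
  process(a, b, c, d, f)
  begin
    if f = '1' then
      z <= '1';
    elsif a = '0' and b = '0' and c = '0' and d = '0' then
      z <= '0';
    end if;
  end process;
end architecture;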


Fig. 4. 16-bit LUT

IV. ALTERNATIVE RECONFIGURABLE NCL LOGIC ELEMENT WITH EXTRA EMBEDDED REGISTRATION CAPABILITY
An alternative to the reconfigurable NCL LE described above is shown in Fig. 5. This design is very similar to the previous version; however, it contains an additional latch and an input ER for selecting embedded registration. Additional embedded registration logic within the reconfigurable logic's PUPD logic, along with an additional registration request input, Ki, is used. The remaining portions of the design (reconfigurable logic, reset logic and output inversion logic) function the same.

A. Reconfigurable Logic
The reconfigurable logic portion consists of the same 16-address LUT used in the previous version and a revised PUPD function that includes additional embedded registration logic. When embedded registration is disabled (i.e., ER = logic '0' during the programming phase), Ki should be connected to logic '0', and the PUPD logic functions the same as explained above. However, when embedded registration is enabled, the output of the PUPD function will only be logic '0' when both F and Ki are logic '1', and will only be logic '1' when all gate inputs (i.e., A, B, C and D) and Ki are logic '0'.

B. Embedded Registration
Embedded registration [1] merges delay insensitive registers into the combinational logic, when possible. This increases circuit performance and substantially decreases the FPGA area required to implement most designs, especially high throughput circuits (i.e., circuits containing many registers). Fig. 6 shows an example of embedded registration applied to an NCL full-adder, where (a) shows the original design consisting of a full-adder and a 2-bit NCL register [2], [8], (b) shows the design utilizing embedded registration when implemented using the reconfigurable NCL LE without extra embedded registration capability, and (c) shows the design utilizing embedded registration when implemented using the reconfigurable NCL LE with extra embedded registration capability.


Fig. 5. Reconfigurable NCL LE with extra embedded registration.

Fig. 6. Embedded registration example: (a) original design; (b) implementation using the NCL reconfigurable LE in Fig. 3; (c) implementation using the NCL reconfigurable LE in Fig. 5.



V. CONCLUSION

RECONFIGURABLE LOGIC ELEMENT COMPARISON: Table II compares the low-to-high and high-to-low propagation delays for the two reconfigurable LEs developed herein, based on which input transition caused the output to transition, and shows the average propagation delay, TP, during normal operation (i.e., excluding reset). Comparing the two reconfigurable LEs developed herein shows that the version without extra embedded registration is 6% smaller and 20% faster. However, since fewer gates may be required when using the version with extra embedded registration, the extra embedded registration version may produce a smaller, faster circuit, depending on the amount of additional embedded registration that can be utilized.

TABLE II. Propagation delay comparison based on input transition

REFERENCES:
[1] Scott C. Smith, "Design of an FPGA Logic Element for Implementing Asynchronous NULL Convention Logic Circuits," IEEE Trans. on VLSI, vol. 15, no. 6, June 2007. [2] S. C. Smith, R. F. DeMara, J. S. Yuan, D. Ferguson, and D. Lamb, "Optimization of NULL Convention Self-Timed Circuits," Integr., VLSI J., vol. 37, no. 3, pp. 135-165, 2004. [3] S. C. Smith, R. F. DeMara, J. S. Yuan, M. Hagedorn, and D. Ferguson, "Delay-insensitive gate-level pipelining," Integr., VLSI J., vol. 30, no. 2, pp. 103-131, 2001. [4] G. E. Sobelman and K. M. Fant, "CMOS circuit design of threshold gates with hysteresis," in Proc. IEEE Int. Symp. Circuits Syst. (II), 1998, pp. 61-65. [5] S. A. Brandt and K. M. Fant, "Considerations of Completeness in the Expression of Combinational Processes," Theseus Research, Inc., 2524 Fairbrook Drive, Mountain View, CA 94040. [6] J. McCardle and D. Chester, "Measuring an asynchronous processor's power and noise," in Proc. Synopsys User Group Conf. (SNUG), 2001, pp. 66-70. [7] A. Kondratyev, L. Neukom, O. Roig, A. Taubin and K. Fant, "Checking delay-insensitivity: 10^4 gates and beyond," in Proc. 8th Int. Symp. Asynchronous Circuits Syst., 2002, pp. 137-145. [8] K. M. Fant and S. A. Brandt, "NULL convention logic: A complete and consistent logic for asynchronous digital circuit synthesis," in Proc. Int. Conf. Appl. Specific Syst., Arch., Process., 1996, pp. 261-273.


Development of ASIC Cell Library for RF Applications

K. Edet Bijoy1, V. Vaithianathan2
1 II M.E. (Applied Electronics), SSN College of Engineering, Kalavakkam, Chennai-603110. Email: [email protected]
2 Asst. Prof., SSN College of Engineering, Kalavakkam, Chennai-603110.

Abstract— The great interest in RF CMOS comes from the obvious advantages of CMOS technology in terms of production cost, high-level integration, and the ability to combine digital, analog and RF circuits on the same chip. This paper reviews the development of an ASIC cell library especially for RF applications. The developed cell library includes cells such as filters, oscillators, impedance matching circuits, low noise amplifiers, mixers, modulators and power amplifiers. All cells were developed using standard 0.25 µm and 0.18 µm CMOS technology. Circuit design criteria and measurement results are presented. Target applications are low-power, high-speed data-transfer RF systems.

Index Terms— ASIC Cell Library, RF VLSI

I. INTRODUCTION

The use of analog CMOS circuits at high frequencies has garnered much attention in the last several years. CMOS is especially attractive for many of the applications because it allows integration of both analog and digital functionality on the same die, increasing performance while keeping system sizes modest. The engineering literature has shown a marked increase in the number of papers published on the use of CMOS in high frequency applications, especially since 1997. These applications cover such diverse areas as GPS, micro-power circuits, GSM, and other wireless applications at frequencies from as low as 100 MHz for low-earth-orbit satellite systems to 1000 MHz and beyond. Many of the circuits designed are of high performance and have been designed with, and optimized for, a particular application in mind.

At the heart of rapid integrated system design is the use of cell libraries for various system functions. In digital design, these standard cells are both at the logic primitive level (NAND and NOR gates, for example) as well as higher levels of circuit functionality (ALUs, memory). For baseband analog systems, standard cell libraries are less frequently used, but libraries of operational amplifiers and other analog circuits are available. In the design of a CMOS RF cell library, the cells must be designed to be

flexible in terms of drive requirements, bandwidth and circuit loading. For RF applications, the most common drive requirements for off-chip loads are based on 50 Ω impedances. A factor governing the bandwidth of the RF cells is the nodal capacitance to ground, primarily the drain and source sidewall capacitances. Transistors making up the library elements are usually designed with multiple gate fingers to reduce the sidewall capacitance. Since these cells are to be used with digital and baseband analog systems, control by on-chip digital and analog signals is another factor in the design.

The choice of cells in such a cell library should be based on the generalized circuit layout of a wireless system front end. A typical RF front end will have both a receiver and transmitter connected to an antenna through some type of control device. For the receiver chain, the RF signal is switched to the upper arm and enters the low noise amplifier and then a down-converting mixer. For the transmit chain, the RF signal enters an upconverting mixer and is then sent to the output amplifier and through the control device to the antenna. A number of CMOS cells should be designed for the library. These cells include an RF switch for control of microwave and RF energy flow from the antenna to the transmitter or receiver, a transmitter output amplifier capable of driving a 50 Ω antenna directly and at low distortion, and a mixer that may be used in either circuit branch. An active load is included for use wherever a load may be required.

The cell library for RF applications presented here attempts to address many of these design factors. The library consists of cells designed using 0.18 µm and 0.25 µm CMOS processes. The cells described in this paper can be used separately or combined to construct more complex functions such as a complete RF front end. Each of the cells will be discussed separately for the sake of clarity of presentation and understanding of the operation of the circuit. There was no post-processing performed on any of the circuit topologies presented in this paper. The systems were designed to maintain 50 Ω system compatibility. The cells have been designed for flexibility in arrangement to meet the designer's specific application. The larger-geometry devices may also be used for education purposes since there are a number of low-cost fabrication options available for that technology. In the design of any cell library, a trade-off between



speed/frequency response and circuit complexity is always encountered. A portion of this work is to show the feasibility of the cell library approach in RF design. The approach taken in this work with the technologies listed above is directly applicable to small device geometries. These geometries will yield even better circuit performance than the cells discussed here.

II. LOW NOISE AMPLIFIER DESIGN

The most critical point for the realization of a highly integrated receiver is the RF input. The first stage of a receiver is a low noise amplifier (LNA), which dominates the noise figure of the whole receiver. Besides low noise, low power consumption, high linearity and small chip size are the other key requirements, which makes the design of the LNA a real challenge.

Fig.1. Amplifiers with input matching circuits: (a) inductor Lg connected directly to the transistor, (b) pad capacitance Cpad connected directly to the transistor.

Among a few possible solutions for the LNA core,

a cascode amplifier, shown in Fig. 1, with inductive degeneration is often preferred. The transistor in common-gate (CG) configuration of the cascode amplifier reduces the Miller effect. It is well known that the capacitance connected between the output and input of an amplifier with inverting gain is seen at its input and output multiplied by the gain. The gain of the common-source (CS) configuration is gmRL, where RL is the output impedance, and the input impedance of the CG configuration is 1/gm. Therefore, if both transistors have similar gm, the gain of the transistor in the CS configuration decreases and the Miller capacitance is reduced. At the output of the cascode amplifier, the overlap capacitance does not suffer from the Miller effect since the gate of the cascode transistor is grounded. Thus, the tuned capacitor of the LC tank only has to be large enough to make the tank insensitive to Cgd2. In addition, with a low-impedance point at the output of the common-source amplifier, the instability caused by the zero of the transfer function is greatly reduced. Finally, with an AC ground at the gate of the cascode transistor, the output is decoupled from the input, giving the cascode configuration a high reverse isolation. Although in

Fig. 1 the LC tank is shown explicitly, in practical situations another configuration can be used, since for small-signal circuits it does not matter whether the second node of the capacitor C is connected to Vdd or to ground. However, in any case a series output capacitor is needed to block the DC path. This capacitor, not shown in Fig. 1, can contribute to the output matching, so it has to be chosen very carefully. The output pad capacitance can additionally be used for output matching.

In order to connect the LNA to measurement equipment, a package or an antenna, bonding pads (Cpad) are needed. Fig. 1 shows two LNAs with different input matching networks. In the network of Fig. 1(a), all components are placed on the chip. This principle is very often used, therefore we start the LNA analysis from this point. The bonding pad is in parallel with the input of the LNA, and as long as its impedance is much higher than the input impedance of the LNA, it does not introduce any significant effect on the input impedance of the whole circuit. In our case, assuming a practical value of 150 fF for Cpad and a frequency of 2 GHz, the impedance of the pad can be neglected in comparison with the required 50 Ω. However, if the influence of Cpad cannot be neglected, only the imaginary part of Zin is affected.

The use of inductive degeneration results in no additional noise generation since the real part of the input impedance does not correspond to a physical resistor. The source inductor Ls generates a resistive term in the input impedance

Zin = (gmLs/Cgs) + j( (ω²(Lg+Ls)Cgs − 1) / (ωCgs) )

where Ls and Lg are the source and gate inductors, respectively, and gm and Cgs denote the small-signal parameters of transistor M1 (Cgd, gds and Cpad are neglected).

The inductor Lg connected in series with the gate cancels out the admittance due to the gate-source capacitor. Here, it is assumed that the tuned load (L, C) is in resonance at the angular frequency ω0 and therefore appears as a purely resistive load RL. To obtain a purely resistive term at the input, the capacitive part of the input impedance introduced by the capacitance Cgs should be compensated by the inductances. To achieve this cancellation and input matching, the source and gate inductances should be set to

Ls = RsCgs / gm

Lg = (1 − ω0²LsCgs) / (ω0²Cgs)

where Rs is the required input resistance, normally 50 Ω.
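As a quick numerical illustration of these two matching relations, the sketch below evaluates Ls and Lg for a 50 Ω source at 2 GHz; the gm and Cgs device values are assumed purely for illustration, not taken from the paper.

import math

# Worked example of the matching conditions above (Ls = Rs*Cgs/gm, Lg from the resonance
# condition at w0). The gm and Cgs values are illustrative assumptions, not measured data.
gm  = 20e-3      # transconductance of M1 [S] (assumed)
Cgs = 200e-15    # gate-source capacitance of M1 [F] (assumed)
Rs  = 50.0       # required input resistance [ohm]
f0  = 2e9        # operating frequency [Hz]
w0  = 2 * math.pi * f0

Ls = Rs * Cgs / gm                              # sets Re{Zin} = gm*Ls/Cgs = Rs
Lg = (1 - w0**2 * Ls * Cgs) / (w0**2 * Cgs)     # cancels the remaining reactance at w0
print(f"Ls = {Ls*1e9:.2f} nH, Lg = {Lg*1e9:.1f} nH")   # roughly 0.50 nH and 31 nH

With these assumed values Lg comes out in the tens of nanohenries, which is consistent with the later observation that Lg may become too large to remain on chip.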

The noise figure of the whole amplifier with noise contribution of transistor M2 neglected can be given as


F = 1 + (γ/α)(1/Q)(ω0/ωT) [ 1 + (δα²/kγ)(1 + Q²) + 2|c|√(δα²/kγ) ]

where α = gm/gd0; γ, δ, c and k are bias-dependent transistor parameters, and Q = 1/(ω0CgsRs) is the quality factor of the input circuit. It can be seen that the noise figure is improved by the factor (ωT/ω0)². Note that for currently used sub-micron MOS technologies ωT is of the order of 100 GHz. The noise figure of the LNA can also be expressed in a simplified form, with the induced gate noise neglected, which is easier for a first-order analysis:

F ≈ 1 + k·gmRs / (ωT/ω0)²

where k is a bias-dependent constant and Rs is the source resistance. Although at first sight this suggests a low transconductance gm for a low noise figure, taking into account that ωT ≈ gm/Cgs one can see that this is not true. Increasing gm lowers the noise figure, but at the cost of higher power consumption. Since Cgs contributes to the (ωT/ω0)² factor, lowering this capacitance leads to improved noise. The last possibility for noise reduction is reducing the signal source resistance Rs. However, this resistance is normally fixed.

Decreasing the Cgs capacitance is done by reducing the size of the transistor. This also has an impact on the linearity of the amplifier, and, according to the input matching requirements, very large inductors Lg must then be used, which can no longer be placed on chip. For this reason the inductor Lg is placed off-chip. Between the inductor and the amplifier the on-chip pad capacitance Cpad is located, as shown in Fig. 1(b). It consists of the pad structure itself and the on-chip capacitance of the ESD structure and signal wiring. In this case the pad capacitance and Cgs are of a similar order. Therefore, the pad has to be treated as part of the amplifier and taken into account in the design process. It should be noted that input pads in particular need special consideration. It has been shown that shielded pads have ideally no resistive component, and so they neither consume signal power nor generate noise. They consist of two metal plates drawn on the top and bottom metals to reduce the pad capacitance value down to 50 fF. Unfortunately, this is not the whole capacitance that should be taken into account: one has to realize that all connections to the pad increase this value. The input matching circuit is very important for the low-noise performance of the LNA. Low-noise cascode amplifiers using different approaches for input impedance matching have been analyzed and compared in terms of noise figure performance for bipolar technology. The effect of noise filtering caused by the matching network has been pointed out. Furthermore, a parallel-series matching network has been proposed, which allows the dominant noise contributions to be reduced. A very low noise figure can be achieved this way. This matching consists of a series inductance and a parallel capacitance connected between the base and emitter of the input transistor. The input matching presented in Fig. 1(b) is quite similar to that of the bipolar amplifier; here, the pad capacitance is used instead of the base-emitter capacitance. It can be expected that taking the pad capacitance as a part of the input matching can lower the noise figure of a FET LNA. RF-CMOS LNAs have achieved their lowest noise values when the pad capacitance was taken into consideration. The reason for this behavior has not been discussed much so far.

OSCILLATOR DESIGN

Oscillators can generally be categorised as either

amplifiers with positive feedback satisfying the well-known Barkhausen criteria, or as negative resistance circuits. At RF and microwave frequencies the negative resistance design technique is generally favoured.

The procedure is to design an active negative resistance circuit which, under large-signal steady-state conditions, exactly cancels out the load and any other positive resistance in the closed-loop circuit. This leaves the equivalent circuit represented by a single L and C in either parallel or series configuration. At one particular frequency the reactances will be equal and opposite, and this resonant frequency is given by the standard formula

f = 1 / (2π√(LC))

It can be shown that, in the presence of excess negative resistance in the small-signal state, any small perturbation caused, for example, by noise will rapidly build up into a large-signal steady-state resonance given by the equation above. Negative resistors are easily designed by taking a three-terminal active device and applying the correct amount of feedback to a common port, such that the magnitude of the input reflection coefficient becomes greater than one. This implies that the real part of the input impedance is negative. The input of the 2-port negative resistance circuit can now simply be terminated in the opposite-sign reactance to complete the oscillator circuit. Alternatively, high-Q series or parallel resonator circuits can be used to generate higher-quality and therefore lower-phase-noise oscillators. Over the years several RF oscillator configurations have become standard. The Colpitts, Hartley and Clapp circuits are examples of negative resistance oscillators, shown here using bipolar transistors as the active devices. The Pierce circuit is an op-amp with positive feedback, and is widely utilised in the crystal oscillator industry. The oscillator design here concentrates on a worked example of a Clapp oscillator, using a varactor-tuned ceramic coaxial resonator for voltage control of the output frequency. The frequency under


consideration will be around 1.4 GHz, which is purposely set between the two important GSM mobile phone frequencies. It has been used at Plextek in Satellite Digital Audio Broadcasting circuits and in telemetry links for Formula One racing cars. At these frequencies it is vital to include all stray and parasitic elements early on in the simulation. For example, any coupling capacitances or mutual inductances affect the equivalent L and C values in the equation above, and therefore the final oscillation frequency. Likewise, any extra parasitic resistance means that more negative resistance needs to be generated.

(A) Small-Signal Design Techniques

The small-signal schematic diagram of the oscillator under consideration is illustrated in Figure 2. The circuit uses an Infineon BFR181W silicon bipolar transistor as the active device, set to a bias point of 2 V Vce and 15 mA collector current. The resonator is a 2 GHz quarter-wavelength short-circuit ceramic coaxial resonator, available from EPCOS. The resonator is represented by a parallel LCR model and the Q is of the order of 350. It is important to note that for a 1.4 GHz oscillator a ceramic resonator some 15 - 40 % higher in nominal resonant frequency is required. This is because the parallel resonance will be pulled down in frequency by the necessary coupling capacitors (4 pF used) and the tuning varactor, etc. The varactor is a typical silicon SOD-323 packaged device, represented by a series LCR model, where the C is voltage dependent. The load into which the circuit oscillates is 50 Ω. At these frequencies any necessary passive components must include all stray and parasitic elements. The transmission lines, which represent the bonding pads on a given substrate, have been omitted from the schematic for the sake of clarity. The oscillator running into its necessary load forms a closed-loop circuit and cannot be simulated in this form because of the absence of a port. Therefore an ideal transformer is used to break into the circuit at a convenient point, in this case between the negative resistance circuit and the resonator. It is important to note that this element is used for simulation purposes only, and is not part of the final oscillator circuit. Being ideal, it does not affect the input impedance at its point of insertion.

Fig.2 Schematic for Small-Signal Oscillator Design

The first step in the design process is to ensure adequate (small-signal) negative resistance to allow oscillation to begin and build into a steady state. It is clear that capacitor values of 2.7 pF in the Clapp capacitive divider result in a magnitude of input reflection coefficient greater than one at 1.4 GHz. This is more than enough to ensure that oscillation will begin.

Fig. 3. Result of small-signal negative resistance simulation.

The complete closed-loop oscillator circuit is next analysed (small-signal) by observing the input impedance at the ideal transformer. The oscillation condition is solved by looking for frequencies where the imaginary part of the impedance goes through zero, whilst maintaining an excess negative resistance. It can be seen that the imaginary part goes through zero at two frequencies, namely 1.35 GHz and 2.7 GHz. However, there is no net negative resistance at 2.7 GHz, while at 1.35 GHz there is some -70 Ω. Thus, with these component values we have designed a circuit capable of oscillating at approximately 1.35 GHz.
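As a sanity check on the resonance relation quoted earlier, the short sketch below evaluates f = 1/(2π√(LC)); the equivalent L and C values are assumed for illustration only, standing in for the combined resonator, varactor, coupling and stray reactances.

import math

# Quick check of the resonance relation f = 1/(2*pi*sqrt(L*C)) used for the oscillator.
def resonant_frequency(L, C):
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))

L_eq, C_eq = 5e-9, 2.7e-12   # equivalent L [H] and C [F], assumed for illustration
print(f"f0 = {resonant_frequency(L_eq, C_eq) / 1e9:.2f} GHz")   # ~1.37 GHz with these values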

III. MIXER DESIGN

The RF mixer is an essential part of wireless communication systems. Modern wireless communication systems demand stringent dynamic range requirements, and the dynamic range of a receiver is often limited by the first down-conversion mixer. This forces many compromises between figures of merit such as conversion gain, linearity, dynamic range, noise figure and port-to-port isolation of the mixer. Integrated mixers become more desirable than discrete ones for higher system integration with cost and space savings. In order to optimize the overall system performance, there exists a need to examine the merits and shortcomings of each mixer feasible for integrated solutions. Since balanced mixer designs are more desirable in today's integrated receiver designs due to their lower spurious outputs, higher common-mode noise rejection and higher port-to-port isolation, only balanced-type mixers are discussed here.


Fig.4 Schematic of a Single-Balanced Mixer

The design of a single-balanced mixer is discussed here. The single-balanced mixer shown in Fig. 4 is the simplest approach that can be implemented in most semiconductor processes. The single-balanced mixer offers a desirable single-ended RF input for ease of application, as it does not require a balun transformer at the input. Though simple in design, it has moderate gain and a low noise figure. However, the design has a low 1-dB compression point, low port-to-port isolation, low input IP3 and high input impedance.

IV. DESIGN OF IMPEDANCE MATCHING CIRCUITS

Some graphic and numerical methods of impedance matching will be reviewed here with reference to high-frequency power amplifiers. Although matching networks normally take the form of filters and therefore are also useful to provide frequency discrimination, this aspect will only be considered as a corollary of the matching circuit.

(A) Matching networks using quarter-wave transformers

At sufficiently high frequencies, where λ/4-long lines of practical size can be realized, broadband transformation can easily be accomplished by the use of one or more λ/4-sections. Figure 5 summarizes the main relations for (a) one-section and (b) two-section transformation. A compensation network can be realized using a λ/2-long transmission line.

Fig. 5. Transformation networks using λ/4-long transmission lines.
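For the single-section case, the short sketch below applies the standard quarter-wave design relation Z_T = sqrt(Z_in · Z_load), which is implied via Fig. 5 but not written out in the text; the 50 Ω to 12.5 Ω values are chosen purely for illustration.

import math

# Single-section quarter-wave transformer: the relation Z_T = sqrt(Z_in * Z_load) is the
# standard textbook result and is assumed here, not quoted from the paper.
def quarter_wave_impedance(z_in, z_load):
    return math.sqrt(z_in * z_load)

print(f"Z_T = {quarter_wave_impedance(50.0, 12.5):.1f} ohm")   # matching 50 ohm to 12.5 ohm -> 25.0 ohm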

Figures 6 and 7 show the selectivity curves for different transformation ratios and section numbers.

(B) Exponential lines

Exponential lines have largely frequency-independent transformation properties. The characteristic impedance of such lines varies exponentially with their length l:

Z = Z0 · e^(kl)

where k is a constant, but these properties are preserved only if k is small.

Fig. 6. Selectivity curves for two λ/4-section networks at different transformation ratios.

Fig. 7. Selectivity curves for one, two and three λ/4-sections.

V. CONCLUSION

This paper presented the results of ongoing work in designing and developing a library of 0.18 µm and 0.25 µm CMOS cells for RF applications. The higher operating frequency ranges are expected to occur with the 0.18 µm CMOS cells. The 1000 MHz upper


frequency is important because it includes several commercial communications bands. The design goals were met for all the cell library elements. The designed amplifier can be extended with an on-off function by connecting the gate of transistor M2 through a switch to Vdd or ground. Although not fully integrated on chip, this architecture is a good solution for multistandard systems, which operate in different frequency bands. In reality, the small-signal simulation is vital to ensure that adequate negative resistance is available for start-up of oscillation. With the emergence of new and more advanced semiconductor processes, the proper integrated mixer circuit topology with the highest overall performance can be devised and implemented.


A High-Performance Clustering VLSI Processor Based On The Histogram Peak-Climbing Algorithm

I. Poornima Thangam, II M.E. (VLSI Design), [email protected]
M. Thangavel, M.E., (Ph.D.), Professor, Dept. of ECE, K.S. Rangasamy College of Technology, Tiruchengode, Namakkal District.

Abstract – In computer vision systems, image feature separation is a very difficult and important step. An efficient and powerful approach is to perform unsupervised clustering of the resulting data set. This paper presents the mapping of the unsupervised histogram peak-climbing clustering algorithm to a novel high-speed architecture suitable for VLSI implementation and real-time performance. It is the first special-purpose architecture that has been proposed for the important problem of clustering massive amounts of data, which is a very computationally intensive task, and the performance is improved by making the architecture truly systolic. The architecture has also been prototyped using a Xilinx FPGA development environment.

Key words – Data clustering, systolic architecture, peak climbing algorithm, VLSI design, FPGA implementation

I. INTRODUCTION

As new algorithms are developed using a paradigm of off-line, non-real-time implementation, there is often a need to adapt and advance hardware architectures to implement the algorithms in a real-time manner if they are to truly serve a useful purpose in industry and defense. This paper presents a high-performance, systolic architecture [2], [3], [4] for the important task of unsupervised data clustering. Special attention is paid to the clustering of information-rich features used for color image segmentation, and to an "orders of magnitude" performance increase over the current implementation on a generic compute platform.

Clustering for Image Segmentation

Special attention is given to the image segmentation application in the proposed architecture. The target segmentation algorithm described in [5] relies on scanning images or video frames with a sliding window and extracting features from each window. The texture, i.e., the pixels bounded by each window, is characterized using mathematically modeled features. Once all features are extracted from the sliding windows, they are clustered in the feature space. The mapping of the identified clusters back

into the image domain results in the desired segmentation.

Components of a Clustering Task

A typical pattern clustering activity involves the following steps: pattern representation (optionally including feature extraction and/or selection); definition of a pattern proximity measure appropriate to the data domain; clustering or grouping; data abstraction (if needed); and assessment of output (if needed) [4], as shown in Fig. 1.

Fig.1 Stages in Data Clustering

II. SIGNIFICANCE OF THE PROPOSED ARCHITECTURE

The main consideration for implementing the clustering algorithm as a dedicated architecture is its simplicity and highly nonparametric nature, with very few inputs required by the system. These characteristics lend the algorithm to implementation in an FPGA environment, so it can form part of a flexible and reconfigurable real-time computing platform for video frame segmentation. This system is depicted in Fig. 2, and it can also serve as a rapid-prototyping image segmentation platform. This is the first special-purpose architecture that has been proposed for the important problem of clustering the massive amounts of feature data generated during real-time segmentation [1]. During image segmentation, it is highly desirable to be able to


choose the best fitting features and/or clustering method based on problem domain [6], type of imagery, or lighting conditions.

Fig. 2. Reconfigurable real-time computing platform for video frame segmentation.

III. HISTOGRAM PEAK-CLIMBING ALGORITHM

This section describes the clustering algorithm implemented in this work. The histogram peak-climbing approach is used for clustering features extracted from the input image.

A. Histogram Generation

Given M features f of dimensionality N to be clustered, the first step is to generate a histogram of N dimensions [5], [7]. This histogram is generated by quantizing each dimension of each of the M features f(k), where:

N: number of dimensions of the features;
CS(k): length of the histogram cell in the kth dimension;
f_max(k): maximum value of the kth dimension of the M features;
f_min(k): minimum value of the kth dimension of the M features;
Q: total number of quantization levels for each dimension of the N-dimensional histogram;
d_k: index of the histogram cell in the kth dimension associated with a given feature f.

Since the dynamic range of the vectors in each dimension can be quite different, the cell size for each dimension can be different. Hence, the cells will be hyper-boxes. This provides efficient dynamic-range management of the data, which tends to enhance the quality and accuracy of the results. Next, the number of feature vectors falling in each hyper-box is counted and this count is associated with the respective hyper-box, creating the required histogram [8].
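A minimal Python sketch of this histogram-generation step is given below. The exact cell-size and cell-index expressions (per-dimension range divided by Q, then integer binning) are assumptions consistent with the definitions above; the clamping and the zero-range guard are illustrative details.

def build_histogram(features, Q):
    """features: list of M feature vectors, each of dimensionality N (sketch, not the PE design)."""
    N = len(features[0])
    f_min = [min(f[k] for f in features) for k in range(N)]
    f_max = [max(f[k] for f in features) for k in range(N)]
    # Cell size per dimension: CS(k) = (f_max(k) - f_min(k)) / Q (guard against a zero range)
    CS = [((f_max[k] - f_min[k]) / Q) or 1.0 for k in range(N)]
    hist = {}
    for f in features:
        # Cell index per dimension: d_k = floor((f(k) - f_min(k)) / CS(k)), clamped to Q - 1
        d = tuple(min(int((f[k] - f_min[k]) / CS[k]), Q - 1) for k in range(N))
        hist[d] = hist.get(d, 0) + 1            # count of features falling in this hyper-box
    return hist, CS, f_min

features = [(0.1, 5.0), (0.2, 5.5), (0.9, 1.0)]
print(build_histogram(features, Q=4)[0])        # two occupied hyper-boxes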

B. Peak-Climbing Approach

After the histogram is generated in the feature space, a peak-climbing clustering approach is utilized to group the features into distinct clusters [9]. This is done by locating the peaks of the histogram.

Fig. 3. Illustration of the peak climbing approach for a two dimensional feature space example.

In Fig. 3, this peak-climbing approach is illustrated for a two-dimensional feature-space example. A peak is defined as the cell with the largest density in its neighborhood. A peak and all the cells that are linked to it are taken as a distinct cluster representing a mode in the histogram. Once the clusters are found, they can be mapped back to the original data domain from which the features were extracted. Features grouped in the same cluster are tagged as belonging to the same category.
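The sketch below illustrates this peak-climbing grouping on the hyper-box counts produced above; the choice of a full surrounding neighborhood and the tie-breaking rule are assumptions of the sketch, not the paper's exact micro-architecture.

from itertools import product

def peak_climb(hist):
    """hist: dict mapping each occupied hyper-box (tuple of cell indices) to its count."""
    def densest_neighbor(cell):
        best = cell
        for offset in product((-1, 0, 1), repeat=len(cell)):   # neighborhood incl. the cell itself
            cand = tuple(c + o for c, o in zip(cell, offset))
            if hist.get(cand, 0) > hist.get(best, 0):
                best = cand
        return best

    cluster_of, labels = {}, {}
    for cell in hist:
        path = [cell]
        while True:                       # climb until reaching a cell that is its own densest neighbor
            nxt = densest_neighbor(path[-1])
            if nxt == path[-1]:
                break
            path.append(nxt)
        label = labels.setdefault(path[-1], len(labels))   # one label per peak
        for c in path:
            cluster_of[c] = label         # every cell on the climb joins the peak's cluster
    return cluster_of

print(peak_climb({(0, 0): 4, (0, 1): 2, (3, 3): 5, (3, 2): 1}))   # two clusters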


IV. HIGH PERFORMANCE DATA CLUSTERING ARCHITECTURE

Fig. 4 shows the different steps of this implementation of the clustering algorithm and the overall architecture. The chosen architecture follows a globally systolic partition. For the hardware implementation, the dataset to be clustered and Q are inputs to the system, and the clustered data and the number of clusters found are outputs of the system.

Fig.4. Peak climbing clustering algorithm overall architecture


A. Overall Steps

The main steps performed by this architecture are:

A. Feature selection / extraction
B. Inter-pattern similarity
C. Clustering / grouping:
   i. Histogram generation
   ii. Identifying peaks
   iii. Finding the peak indices
   iv. Link assignment

The input image is digitized and the features (intensity here) of each pixel are found. Then the inter-pattern similarity is identified by calculating the number of pixels with equal intensity. The clustering of the features is done in four steps, namely histogram generation, peak identification, finding the corresponding peak index for each peak [10], [11] and, finally, link assignment by setting a threshold value by which the features are clustered.

B. Architectural Details

This section presents the architectural details of the processor. Fig. 5 shows the PE for the operation of finding the minimum and maximum values for each dimension of the feature vectors in the data set. N Min-Max PEs are instantiated in parallel, one for each dimension. The operations to find the minimum and maximum values are run sequentially, thus making use of a single MIN/MAX cell in the PE. Fig. 6 shows the details of the PE that computes the cell size CS(k) for each dimension. N PEs are instantiated in parallel, one for each dimension. Because of the high dimensionality of Random Field models, the number of quantization levels in each dimension necessary for effective and efficient clustering is very small, that is, Q = 3 … 8. This allows the division by Q in the cell-size computation to be implemented by a multiplication by the inverse of Q stored in a small look-up table (LUT).
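A small sketch of that reciprocal-LUT trick follows; the 16-bit fixed-point width and the rounding choice are assumptions made for illustration, not the processor's actual word length.

# Divide-by-Q replaced by multiplication with a reciprocal held in a small look-up table.
RECIP_Q_LUT = {q: round((1 << 16) / q) for q in range(3, 9)}   # 16-bit reciprocals of Q = 3..8

def cell_size(f_max_k, f_min_k, Q):
    # (f_max - f_min) / Q realized as ((f_max - f_min) * LUT[Q]) >> 16
    return ((f_max_k - f_min_k) * RECIP_Q_LUT[Q]) >> 16

print(cell_size(1000, 0, 4))   # -> 250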

Fig.5. Min-Max processing element

Fig. 6. Cell size CS(k) processing element.

Fig. 7 shows the details of the PE that computes the histogram indexes for each data vector. N PEs are instantiated in parallel, one for each dimension.

Fig.7. Index processing element

Fig.8 shows the details of the PE to allocate and identify a data vector with a given histogram bin. One instantiation per each possible bin is made. The purpose of the compressor is to count the number of ones from the comparators, which corresponds to the density of a given bin in the histogram.

The rest of the micro-architecture to establish

the links between the histogram bins and assign the clusters, so that the results can be output, follows a structure very similar to that of Fig. 8. The only notable exception is that this PE uses a novel computational cell to calculate the norm between two 22-dimensional vectors. This cell is shown in Fig. 9.


Fig. 8. Processing element used to allocate vectors to histogram bins.

Fig. 9. Neighbor detector

V. FPGA IMPLEMENTATION

The architecture has been prototyped using a Xilinx FPGA development environment. The issue of cost effectiveness of an FPGA implementation is somewhat secondary for reconfigurable computing platforms; the main advantage of FPGAs is their flexibility. The target device here is the Virtex-II XC2V1000.

VI. RESULTS

This paper describes a high performance

VLSI architecture for the clustering of high dimensionality data. This architecture can be used in many military, industrial, and commercial applications that require real-time intelligent machine vision processing. However, the approach is not limited to this type of signal processing only, but it can also be applied to other types of data for other problem domains, for which the clustering process needs to be accelerated. In this paper, the performance of the processor has been improved by making the architecture systolic.

VII. FUTURE WORK

In the future, there is the possibility of processing data in floating-point format, and also of implementing the architecture across several FPGA chips to address larger resolutions.


REFERENCES

[1] O. J. Hernandez, "A High-Performance VLSI Architecture for the Histogram Peak-Climbing Data Clustering Algorithm," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 2, pp. 111-121, Feb. 2006.
[2] M.-F. Lai, C.-H. Hsieh, and Y.-P. Wu, "A VLSI architecture for clustering analyzer using systolic arrays," in Proc. 12th IASTED Int. Conf. Applied Informatics, May 1994, pp. 260-260.
[3] M.-F. Lai, Y.-P. Wu, and C.-H. Hsieh, "Design of clustering analyzer based on systolic array architecture," in Proc. IEEE Asia-Pacific Conf. Circuits and Systems, Dec. 1994, pp. 67-72.
[4] M.-F. Lai, M. Nakano, Y.-P. Wu, and C.-H. Hsieh, "VLSI design of clustering analyzer using systolic arrays," Inst. Elect. Eng. Proc.: Comput. Digit. Tech., vol. 142, pp. 185-192, May 1995.
[5] A. Khotanzad and O. J. Hernandez, "Color image retrieval using multispectral random field texture model and color content features," Pattern Recognit. J., vol. 36, pp. 1679-1694, Aug. 2003.
[6] M.-F. Lai and C.-H. Hsieh, "A novel VLSI architecture for clustering analysis," in Proc. IEEE Asia Pacific Conf. Circuits and Systems, Nov. 1996, pp. 484-487.
[7] O. J. Hernandez, "High performance VLSI architecture for data clustering targeted at computer vision," in Proc. IEEE SoutheastCon, Apr. 2005, pp. 99-104.
[8] A. Khotanzad and A. Bouarfa, "Image segmentation by a parallel, nonparametric histogram based clustering algorithm," Pattern Recognit. J., vol. 23, pp. 961-963, Sep. 1990.
[9] S. R. J. R. Konduri and J. F. Frenzel, "Non-linearly separable cluster classification: An application for a pulse-coded CMOS neuron," in Proc. Artificial Neural Networks in Engineering Conf., vol. 13, Nov. 2003, pp. 63-67.
[10] http://www.elet.polimi.it/upload/matteu/clustering/tutorial_html
[11] http://en.wikipedia.org/wiki/cluster_analysis


Reconfigurable CAM-Improving the Effectiveness of Data Access in ATM Networks

Sam Alex 1, B.Dinesh2, S. Dinesh kumar 2

1: Lecturer, 2: Students, Department of Electronics and Communication Engineering,

JAYA Engineering College, Thiruninravur, Near Avadi, Chennai-602024.

Email: [email protected], [email protected], [email protected]

Abstract - Content addressable memory is an expensive component in fixed-architecture systems; however, it may prove to be a valuable tool in online architectures (that is, run-time reconfigurable systems with an online decision algorithm to determine the next reconfiguration). Using an ATM, customers access their bank account in order to make cash withdrawals (or credit card cash advances) and check their account balances. The existing system has dedicated lines from a common server. In this paper we use the concept of IP address matching with reconfigurable content addressable memories (RCAM), by which we replace the dedicated lines connected to the different ATMs with a single line, where every ATM is associated with an individual RCAM circuit. We implement the RCAM circuit using finite state machines. Thus we improve the efficiency with which data is accessed, and we also make more efficient use of cabling, reducing maintenance and making the overall design cheaper.

Keywords: Content addressable memory, ATM, RCAM, finite state machines, look-up table, packet forwarding.

I. INTRODUCTION

A content-addressable memory (CAM) compares input search data against a table of stored data, and returns the address of the matching data [1]-[5]. CAMs have a single-clock-cycle throughput, making them faster than other hardware- and software-based search systems. CAMs can be used in a wide variety of applications requiring high search speeds. These applications include parametric curve extraction [6], Hough transformation [7], Huffman coding/decoding [8], [9], Lempel-Ziv compression [10]-[13], and image coding [14]. The primary commercial application of CAMs today is to classify and forward Internet protocol (IP) packets in network routers [15]-[20]. In networks like the Internet, a message such as an e-mail or a Web page is transferred by first breaking up the message into small data packets of a few hundred bytes and then sending each data packet individually through the network. These packets are routed from the source, through the intermediate nodes of the network (called routers), and reassembled at the destination to

reproduce the original message. The function of a router is to compare the destination address of a packet to all possible routes, in order to choose the appropriate one. A CAM is a good choice for implementing this lookup operation due to its fast search capability. In this paper, we present a novel architecture for a content addressable memory that provides arbitrary tag and data widths. The intention is that this block would be incorporated into a field-programmable gate array or a programmable logic core; however, it can be used whenever post-fabrication flexibility of the CAM is desired [21].

II. CONTENT ADDRESSABLE MEMORIES

Content addressable memories (CAMs) are hardware search engines that are much faster than algorithmic approaches for search-intensive applications. They are a class of parallel pattern-matching circuits. In one mode these circuits operate like standard memory circuits and may be used to store binary data. Unlike standard memory circuits, a powerful match mode is available. This allows all of the data in the CAM to be searched in parallel. In the match mode, each memory cell in the array is accessed in parallel and compared to some value. If the value is found, a match signal is generated. In some implementations, all that is significant is whether a match for the data is found. In other cases, it is desirable to know exactly where in the memory address space this data was located. Rather than producing a simple match signal, the CAM then supplies the address of the matching data. A CAM compares input search data against a table of stored data, and returns the address of the matching data. CAMs have a single-clock-cycle throughput, making them faster than other hardware- and software-based search systems. CAMs can be used in a wide variety of applications requiring high search speeds. These applications include parametric curve extraction, Hough transformation, Huffman coding/decoding, Lempel-Ziv compression, and image coding.

Fig. 1. Block diagram of a CAM cell.


A. Working of a CAM

Writing to a CAM is exactly like writing to a conventional RAM. However, the "read" operation is actually a search of the CAM for a match to an input "tag." In addition to storage cells, the CAM requires one or more comparators. Another common scheme involves writing to consecutive locations of the CAM as new data is added. The outputs are a MATCH signal (along with an associated MATCH VALID signal) and either an encoded N-bit value or a one-hot-encoded bus with one match bit corresponding to each CAM cell.

The multi-cycle CAM architecture tries to find a match to the input data word by simply sequencing through all memory locations – reading the contents of each location, comparing the contents to the input value, and stopping when a match is found. At that point, MATCH and MATCHVALID are asserted. If no match is found, MATCH is not asserted, but MATCH VALID is asserted after all addresses are compared. MATCH VALID indicates the end of the read cycle. In other words, MATCH VALID asserted and MATCH not asserted indicates that all the addresses have been compared during a read operation and no matches were found.
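A behavioral sketch of this multi-cycle search is shown below; it is only a software model of the sequencing just described, with a plain Python list standing in for the distributed or block RAM and dictionary keys mirroring the MATCH and MATCH VALID signal names.

def multicycle_cam_read(memory, tag):
    for address, word in enumerate(memory):   # one location compared per "clock cycle"
        if word == tag:
            return {"MATCH": 1, "MATCH_VALID": 1, "ADDRESS": address}
    # All addresses compared, no match found: MATCH VALID asserted with MATCH low.
    return {"MATCH": 0, "MATCH_VALID": 1, "ADDRESS": None}

print(multicycle_cam_read([0x3A, 0x7F, 0x51], 0x7F))   # match at address 1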

When a match is found, the address of the matching data is provided as an output and the MATCH signal is asserted. It is possible that multiple locations might contain matching data, but no checking is done for this. Storage for the multi-cycle CAM can be either in distributed RAM (registers) or block RAM.

Fig. 2. CAM storage organization. Each M-bit word holds M-1 tag or data bits plus a one-bit flag marking the first tag word of an entry; the example entry has 3*(M-1) tag bits and 4*(M-1) data bits:

Word (M-1 bits)                    Flag
Last data word of previous entry    0
First tag word                      1
Second tag word                     0
Third tag word                      0
First data word                     0
Second data word                    0
Third data word                     0
Fourth data word                    0
First tag word of next entry        1

In a typical CAM, each memory word is divided into two parts: a tag field and a data field. Each tag field is associated with one comparator. Each comparator compares the associated tag with the input tag bits and, if a match occurs, the corresponding data bits are driven out of the memory. Although this is fast, it is not suitable for applications with a very wide tag or data width. Wide tags lead to large comparators, which are area-inefficient, power-hungry, and often slow.

B. Packet Forwarding Using CAMs

We describe the application of CAMs to packet forwarding in network routers. First, we briefly summarize packet forwarding and then show how a CAM implements the required operations. Network routers forward data packets from an incoming port to an outgoing port, using an address-lookup function. The address-lookup function examines the destination address of the packet and selects the output port associated with that address. The router maintains a list, called the routing table, that contains destination addresses and their corresponding output ports. An example of a simplified routing table is displayed in Table I.

TABLE I
Example Routing Table

Entry No.   Address (binary)   Output port
1           101XX              A
2           0110X              B
3           011XX              C
4           10011              D

Fig. 4. CAM-based implementation of the routing table.

All four entries in the table are 5-bit words, with the don't-care bit "X" matching both a 0 and a 1 in that position. Due to the "X" bits, the first three entries in the table represent a range of input addresses; for example, entry 1 maps all addresses in the range 10100 to 10111 to port A. The router searches this table for the destination address of each incoming packet and selects the appropriate output port.

For example, if the router receives a packet with the destination address 10100, the packet is forwarded to port A. In the case of the incoming address 01101, the address lookup matches both entry 2 and entry 3 in the table. Entry 2 is selected since it has the fewest "X" bits, or alternatively the longest prefix, indicating that it is the most direct route to the destination. This lookup method is called longest-prefix matching. Fig. 4 illustrates how a CAM accomplishes address lookup by implementing the routing table shown in Table I. On the left of Fig. 4, the packet destination address of 01101 is the input to the CAM.


As in the table, two locations match, with the (priority) encoder choosing the upper entry and generating the match location 01, which corresponds to the most direct route. This match location is the input address to a RAM that contains a list of output ports, as depicted in Fig. 4. A RAM read operation outputs the port designation, port B, to which the incoming packet is forwarded. We can view the match-location output of the CAM as a pointer that retrieves the associated word from the RAM. In the particular case of packet forwarding, the associated word is the designation of the output port. This CAM/RAM system is a complete implementation of an address-lookup engine for packet forwarding.
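A small software sketch of this longest-prefix lookup over the Table I entries is given below; modeling ternary entries as "X" characters and emulating the priority encoder by sorting on the number of don't-care bits are assumptions of the sketch, not the hardware's actual encoding.

ROUTING_TABLE = [            # (pattern, output port), "X" = don't care (Table I)
    ("10011", "D"),
    ("0110X", "B"),
    ("101XX", "A"),
    ("011XX", "C"),
]

def lookup(dest):
    def matches(pattern, addr):
        return all(p in ("X", a) for p, a in zip(pattern, addr))
    # Fewest don't-care bits first emulates longest-prefix priority encoding.
    for pattern, port in sorted(ROUTING_TABLE, key=lambda e: e[0].count("X")):
        if matches(pattern, dest):
            return port
    return None

print(lookup("01101"))   # entry 2 (0110X) wins over entry 3 (011XX) -> "B"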

III. THE RECONFIGURABLE CONTENT ADDRESSABLE MEMORY (RCAM)

The Reconfigurable Content Addressable Memory, or RCAM, makes use of run-time reconfiguration to efficiently implement a CAM circuit. Rather than using the FPGA flip-flops to store the data to be matched, the RCAM uses the FPGA look-up tables, or LUTs. Using LUTs rather than flip-flops results in a smaller, faster CAM. The approach uses the LUT to provide a small piece of CAM functionality. In Fig. 5, a LUT is loaded with data which provides "match 5" functionality. That is, whenever the binary-encoded value "5" is sent to the four LUT inputs, a match signal is generated.

Fig. 5. A simple 4-input look-up table.

Note that using a LUT to implement CAM functionality, or any functionality for that matter, is not unique. An N-input LUT can implement any arbitrary function of N inputs; including a CAM. This circuit demonstrates the ability to embed a mask in the configuration of a LUT, permitting arbitrary disjoint sets of values to be matched, within the LUT. This function is important in many matching applications, particularly networking. This approach can be used to provide matching circuits such as match all or match none or any combination of possible LUT values. One currently popular use for CAMs is in networking. Here data must be processed under demanding real-time constraints. As packets arrive, their routing information must be processed. In particular, destination addresses, typically in the form of 32-bit Internet Protocol (IP) addresses must be

classified. This typically involves some type of search. Current software-based approaches rely on standard search schemes such as hashing. Replacing this software search with CAM-based matching results in savings not only in the cost of the processor itself, but also in other areas such as power consumption and overall system cost. In addition, an external CAM provides networking hardware with the ability to achieve packet processing in essentially constant time. Provided all elements to be matched fit in the CAM circuit, the time taken to match is independent of the number of items being matched.

Fig.6. IP Match circuit using the RCAM.

The figure above shows an example of an IP match circuit constructed using the RCAM approach. Note that this example assumes a basic 4-input LUT structure for simplicity. Other optimizations, including the use of special-purpose hardware such as carry chains, are possible and may result in substantial circuit-area savings and clock-speed increases.

This circuit requires one LUT input per matched bit. In the case of a 32-bit IP address, the circuit requires eight LUTs to provide the matching, and three additional 4-input LUTs to provide the ANDing for the MATCH signal. This basic 32-bit matching block may be replicated in an array to produce the CAM circuit.
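A behavioral sketch of this nibble-wise LUT matching is shown below; the truth-table encoding, the example address and the simple AND reduction (standing in for the LUT/carry-chain AND tree) are assumptions made for illustration.

def make_lut(accepted_values):
    """Return a 16-entry truth table that matches any 4-bit value in accepted_values."""
    return [1 if v in accepted_values else 0 for v in range(16)]

def ip_match(luts, ip):
    """luts: eight truth tables, one per 4-bit nibble of the 32-bit address (MSB first)."""
    nibbles = [(ip >> shift) & 0xF for shift in range(28, -4, -4)]
    return all(lut[n] for lut, n in zip(luts, nibbles))   # AND of the eight LUT outputs

target = 0xC0A80001                                       # e.g. 192.168.0.1 (illustrative)
luts = [make_lut({(target >> s) & 0xF}) for s in range(28, -4, -4)]
print(ip_match(luts, 0xC0A80001), ip_match(luts, 0xC0A80002))   # True False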

IV. FINITE STATE MACHINES

If a combinational logic circuit is an implementation of a Boolean function, then a sequential circuit can be considered an implementation of a finite state machine. The goal of an FSM is not accepting or rejecting things, but generating a set of outputs given a set of inputs. FSMs describe how the inputs are processed, based on the input and the state, to generate the outputs. In a Moore finite state machine, the output of the circuit is dependent only on the state of the machine and not on its inputs; such an FSM uses only entry actions, that is, the output depends only on the state. A Mealy machine, in contrast, uses input actions,


that is, the output depends on the input and the state. The use of such a machine often results in a reduction in the number of states.

In a Mealy Finite state machine, the output is dependent both on the machine state as well as on the inputs to the finite state machine. Notice that in this case, outputs can change asynchronously with respect to clock.

One of the best ways of describing a Mealy finite state machine is by using two always statements, one for describing the sequential logic and one for describing the combinational logic (this includes both the next-state logic and the output logic). It is necessary to do this since any change on the inputs directly affects the outputs; the combinational block describes the outputs, while the state of the machine is stored using a reg variable.
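The toy Python sketch below contrasts the two machine types just described: the Moore output is computed from the state alone, the Mealy output from the state and the current input. The edge-detector behavior and the state encoding are illustrative assumptions; a Verilog-style description with two always blocks, as noted above, would be the hardware counterpart.

def moore_step(state, inp):
    next_state = inp                        # remember the last input bit
    output = state                          # Moore: output is a function of the state only
    return next_state, output

def mealy_step(state, inp):
    next_state = inp
    output = int(inp == 1 and state == 0)   # Mealy: output depends on state AND input (rising edge)
    return next_state, output

state_mo = state_me = 0
for bit in [0, 1, 1, 0, 1]:
    state_mo, out_mo = moore_step(state_mo, bit)
    state_me, out_me = mealy_step(state_me, bit)
    print(bit, out_mo, out_me)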

V. THE AUTOMATED TELLER MACHINES

An ATM is also known, in English, as an Automated Banking Machine, Money Machine or Bank Machine.

A. Usage

On most modern ATMs, which use an Encrypting PIN Pad (EPP), the customer identifies him or herself by inserting a plastic card with a magnetic stripe or a plastic smartcard with a chip that contains his or her card number and some security information, such as an expiration date or CVC (CVV). The customer then verifies their identity by entering a pass code, often referred to as a Personal Identification Number (PIN).

B. Types of ATMs

ATMs can be divided into mono-function devices, in which only one type of mechanism for financial transactions is present (such as cash dispensing or statement printing), and multi-function devices, which incorporate multiple mechanisms to perform multiple services (such as accepting deposits, dispensing cash, printing statements, etc.) all within a single footprint. There is now a capability (in the U.S. and Europe, at least) for no-envelope deposits with a unit called a Batch- or Bunch-Note Acceptor (BNA) that will accept up to 40 bills at a time. There is another unit called a Cheque Processing Machine, or Module (CPM), that will accept a cheque, take a picture of both sides, read the magnetic ink code line at the bottom of the cheque, read the amount written on the cheque, and capture the cheque into a bin, giving instant access to the money, if the account allows.

There are two types of ATM installations: on-premise and off-premise. On-premise ATMs are typically more advanced, multi-function machines that complement an actual bank branch's capabilities and are thus more expensive. Off-premise machines are deployed by financial institutions and also by ISOs (Independent Sales Organizations) where there is usually just a straight need for cash, so they typically are the cheaper mono-function devices.

C. Hardware

An ATM is typically made up of the following devices: a CPU (to control the user interface and transaction devices); a magnetic and/or chip card reader (to identify the customer); a PIN pad (similar in layout to a touch-tone or calculator keypad), often manufactured as part of a secure enclosure; a secure cryptoprocessor, generally within a secure enclosure; a display (used by the customer for performing the transaction); function key buttons (usually close to the display) or a touch screen (used to select the various aspects of the transaction); a record printer (to provide the customer with a record of their transaction); a vault (to store the parts of the machinery requiring restricted access); housing (for aesthetics and to attach signage to); a cheque processing module; and a batch note acceptor. Recently, due to heavier computing demands and the falling price of computer-like architectures, ATMs have moved away from custom hardware architectures using microcontrollers and/or application-specific integrated circuits towards a hardware architecture that is very similar to a personal computer.

Many ATMs are now able to use operating systems such as Microsoft Windows and Linux. Although it is undoubtedly cheaper to use commercial off-the-shelf hardware, it does make ATMs vulnerable to the same sort of problems exhibited by conventional computers.

D. Future

ATMs were originally developed as just cash dispensers; they have evolved to include many other bank-related functions. ATMs can also act as an advertising channel for companies to advertise their own products or third-party products and services.

VI. IMPLEMENTATION

Fig. 9. Current ATM system.


Having seen the details of the current ATM systems, we can understand that these systems have dedicated lines from their servers. In the current system, the server checks all the users' information and then gives the details of the user whose card was inserted.

VII. PROPOSED SYSTEM WITH RCAM

Fig. 10. Proposed system with RCAM

The data packets coming from the server are available to all the RCAMs simultaneously. The packets are formatted to have a 32 bit IP address followed by 10 bits of data. All the RCAMs receive the data packets (DP) simultaneously. The IP addresses of the data packets are traced by all the four RCAM circuits. The RCAM performs its matching and determines if the IP address of the packet matches with the IP address of the ATM. In case the matching occurs with the address of the first ATM, then, the following 10 bits of data are taken by the first ATM alone. If there is a mismatch in the IP address, the following 10 bits of data are not taken.

We initially set up a CAM array consisting of four CAMs, cam0, cam1, cam2 and cam3, one for each of the four ATMs taken as a sample. Each element of the CAM array has bit positions 0 to 31 to hold the 32-bit IP address that is fed at run time, i.e., cam0(0), cam0(1), …, cam0(31), and similarly for all four CAMs. At run time, we decide the IP addresses of the ATMs and force them onto the channel. We also send the 10 bits of data following the IP address. The 'dsel' signal is set up such that, if the IP address for cam0 is forced on the channel, 'dsel' becomes 0001 and the transmitted data appears as output on the 'tout0' pin. Similarly, when the address for cam1 is forced, 'dsel' becomes 0010 and the transmitted data appears as output on the 'tout1' pin.
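The following behavioral sketch models the dispatch scheme just described: every RCAM sees the same 42-bit packet (32-bit IP address plus 10 data bits) and only the ATM whose programmed address matches latches the data. The sample IP addresses are assumptions; the one-hot dsel encoding follows the text.

ATM_ADDRESSES = [0x0A000001, 0x0A000002, 0x0A000003, 0x0A000004]   # assumed sample IPs

def dispatch(packet):
    ip, data = packet >> 10, packet & 0x3FF       # split 32-bit address and 10 data bits
    dsel, outputs = 0, [None] * 4
    for i, addr in enumerate(ATM_ADDRESSES):      # all four RCAMs compare in parallel
        if ip == addr:
            dsel |= 1 << i                        # dsel = 0001 for cam0, 0010 for cam1, ...
            outputs[i] = data                     # data appears on the matching tout pin only
    return dsel, outputs

pkt = (ATM_ADDRESSES[1] << 10) | 0x155
print(dispatch(pkt))   # dsel = 0b0010, data routed to tout1 only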

VIII. RESULTS

Fig. 11. Block diagram.

Fig. 12. The first CAM's IP address is read.


Fig. 13. The data for the first CAM is read.

Fig. 14. The addresses of two CAMs are taken.

IX. CONCLUSION

Today, advances in circuit technology permit large CAM circuits to be built. However, uses for CAM circuits are not necessarily limited to niche applications like cache controllers or network routers. Any application which relies on the searching of data can benefit from a CAM-based approach.

In addition, the use of parallel matching hardware in the form of CAMs can provide another, more practical benefit. For many applications, the use of CAM-based parallel search can offload much of the work done by the system processor. This should permit smaller, cheaper and lower-power processors to be used in embedded applications which can make use of CAM-based parallel search. The RCAM is a flexible, cost-effective alternative to existing CAMs. By using FPGA technology and run-time reconfiguration, fast, dense CAM circuits can be easily constructed, even at run time. In addition, the size of the RCAM may be tailored to a particular hardware design or even to temporary changes in the system. This flexibility is not available in other CAM solutions. Furthermore, the RCAM need not be a stand-alone implementation: since the RCAM is entirely a software solution using state-of-the-art FPGA hardware, it is quite easy to embed RCAM functionality in larger FPGA designs. Finally, we believe that existing applications, primarily in the field of network routing, are just the beginning of RCAM usage. Once other applications realize that simple, fast, flexible parallel matching is available, it is likely that other applications and algorithms will be accelerated.

REFERENCES

[1] T. Kohonen, Content-Addressable Memories, 2nd ed. New York: Springer-Verlag, 1987.
[2] L. Chisvin and R. J. Duckworth, “Content-addressable and associative memory: alternatives to the ubiquitous RAM,” IEEE Computer, vol. 22, no. 7, pp. 51–64, Jul. 1989.
[3] K. E. Grosspietsch, “Associative processors and memories: a survey,” IEEE Micro, vol. 12, no. 3, pp. 12–19, Jun. 1992.
[4] I. N. Robinson, “Pattern-addressable memory,” IEEE Micro, vol. 12, no. 3, pp. 20–30, Jun. 1992.
[5] S. Stas, “Associative processing with CAMs,” in Northcon/93 Conf. Record, 1993, pp. 161–167.
[6] M. Meribout, T. Ogura, and M. Nakanishi, “On using the CAM concept for parametric curve extraction,” IEEE Trans. Image Process., vol. 9, no. 12, pp. 2126–2130, Dec. 2000.
[7] M. Nakanishi and T. Ogura, “Real-time CAM-based Hough transform and its performance evaluation,” Machine Vision Appl., vol. 12, no. 2, pp. 59–68, Aug. 2000.
[8] E. Komoto, T. Homma, and T. Nakamura, “A high-speed and compact-size JPEG Huffman decoder using CAM,” in Symp. VLSI Circuits Dig. Tech. Papers, 1993, pp. 37–38.
[9] L.-Y. Liu, J.-F. Wang, R.-J. Wang, and J.-Y. Lee, “CAM-based VLSI architectures for dynamic Huffman coding,” IEEE Trans. Consumer Electron., vol. 40, no. 3, pp. 282–289, Aug. 1994.
[10] B. W. Wei, R. Tarver, J.-S. Kim, and K. Ng, “A single chip Lempel-Ziv data compressor,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 3, 1993, pp. 1953–1955.
[11] R.-Y. Yang and C.-Y. Lee, “High-throughput data compressor designs using content addressable memory,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), vol. 4, 1994, pp. 147–150.
[12] C.-Y. Lee and R.-Y. Yang, “High-throughput data compressor designs using content addressable memory,” IEE Proc.-Circuits, Devices and Syst., vol. 142, no. 1, pp. 69–73, Feb. 1995.
[13] D. J. Craft, “A fast hardware data compression algorithm and some algorithmic extensions,” IBM J. Res. Devel., vol. 42, no. 6, pp. 733–745, Nov. 1998.
[14] S. Panchanathan and M. Goldberg, “A content-addressable memory architecture for image coding using vector quantization,” IEEE Trans. Signal Process., vol. 39, no. 9, pp. 2066–2078, Sep. 1991.


[15] T.-B. Pei and C. Zukowski, “VLSI implementation of routing tables: tries and CAMs,” in Proc. IEEE INFOCOM, vol. 2, 1991, pp. 515–524.
[16] T.-B. Pei and C. Zukowski, “Putting routing tables in silicon,” IEEE Network Mag., vol. 6, no. 1, pp. 42–50, Jan. 1992.
[17] A. J. McAuley and P. Francis, “Fast routing table lookup using CAMs,” in Proc. IEEE INFOCOM, vol. 3, 1993, pp. 1282–1391.
[18] N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, “Design of multi-field IPv6 packet classifiers using ternary CAMs,” in Proc. IEEE GLOBECOM, vol. 3, 2001, pp. 1877–1881.
[19] G. Qin, S. Ata, I. Oka, and C. Fujiwara, “Effective bit selection methods for improving performance of packet classifications on IP routers,” in Proc. IEEE GLOBECOM, vol. 2, 2002, pp. 2350–2354.
[20] H. J. Chao, “Next generation routers,” Proc. IEEE, vol. 90, no. 9, pp. 1518–1558, Sep. 2002.
[21] C. J. Jones and S. J. E. Wilton, “Content addressable memory with cascaded match, read and write logic in a programmable logic device,” U.S. Patent 6,622,204, issued Sept. 16, 2003, assigned to Cypress Semiconductor Corporation.


Design of Multistage High Speed Pipelined RISC Architecture

Manikandan Raju, Prof. S. Sudha
Electronics and Communication Department (PG), Sona College of Technology, Salem
[email protected], [email protected]

Abstract: The paper describes the architecture and design of the pipelined execution unit of a 32-bit RISC processor. The organization of the blocks in the different stages of the pipeline is done in such a way that the pipeline can be clocked at high frequency. Control and forwarding of data flow among the stages are taken care of by dedicated hardware logic. The different blocks of the execution unit and the dependencies among them are explained in detail with the help of relevant block diagrams. The design has been modeled in VHDL and the functional verification policies adopted for it are described thoroughly. Synthesis of the design is carried out in a 0.13-micron standard cell technology.

Keywords: ALU, Pipeline, RISC, VLSI, Multistage

I. INTRODUCTION

The worldwide development of high-end, sophisticated digital systems has created a huge demand for high-speed, general-purpose processors. The performance of processors has increased exponentially since their launch in 1970. Today's high-performance processors have a significant impact on the commercial marketplace. This high growth rate is possible due to dramatic technical advances in computer architecture, circuit design, CAD tools and fabrication methods. Different processor architectures have been developed and optimized to achieve better performance. The RISC philosophy [1] has attracted microprocessor designers to a great extent; most computation engines used these days in different segments such as servers, networking and signal processing are based on the RISC philosophy [1]. To cater to the needs of multi-tasking, multi-user applications in high-end systems, a 32-bit generic processor architecture, CORE-I, has been designed based on the RISC philosophy [2].

II. PROCESSOR OVERVIEW

CORE-I is a 32-bit RISC processor with a 6-stage pipelined execution unit based on a load-store architecture. The ALU supports both single-precision and double-precision floating-point operations. The CORE-I register file has 45 General-Purpose Registers (GPRs), 19 Special-Purpose Registers (SPRs) and 64 Double-Precision Floating-Point Unit (DPFPU) registers. The Instruction Set Architecture (ISA) has a total of 136 instructions. The processor has two modes of operation, user mode and supervisor mode (protected mode). A Dependency Resolver detects and resolves data hazards within the pipeline. The execution unit is interfaced with an instruction channel and a data channel. Both channels operate in parallel and communicate with external devices through a common Bus Interface Unit (BIU). The instruction channel has a 128-bit Instruction Line Buffer, a 64-KB Instruction Cache [1], a Memory Management Unit (MMU) [1] and a 128-bit Prefetch Buffer [3]. The data channel has a 128-bit Data Line Buffer, a 64-KB Data Cache, an MMU and a Swap Buffer [3]. The Pre-fetch Buffer and Swap Buffer are introduced to reduce memory latency during instruction fetch and data cache misses respectively. The external data flow through the instruction channel and data channel is controlled by the respective controller state machines. The processor also has seven interrupt-request (IRQ) inputs and one non-maskable interrupt (NMI) input. The Exception Processing Unit controls the interrupts and on-chip exceptions.

III. EXECUTION UNIT

The CORE-I execution unit contains an ALU unit, a branch/jump unit, a single-precision floating-point unit (SPFPU) and a double-precision floating-point unit (DPFPU) [4]. The execution unit is implemented in six pipeline stages - IF (Instruction Fetch), ID (Instruction Decode), DS (Data Select), EX (Execution Unit), MEM (Memory) and WB (Write Back). The main blocks in the different stages of the pipeline are shown in Figure 1.

A. Program Counter Unit (PCU):
The Program Counter Unit provides the value of the program counter (PC) in every cycle for fetching the next instruction from the Instruction Cache. In every cycle, it checks for the presence of any branch instruction in the MEM stage, a jump instruction in the EX stage, any interrupt or on-chip exception, and the presence of an RTE (return from exception) instruction [2] inside the pipeline. In the absence of any one of the above conditions, the PCU either increments the PC value by 4 when the pipeline is not halted or keeps the old value. This operation is performed in the IF stage.

B. Register File:

CORE-I has two separate register units - the Primary Register Unit (PRU) and the Double Precision Register Unit (DPRU). The PRU contains 64 registers used by integer and single-precision floating-point operations, and the DPRU contains two sets of 32 registers used for double-precision floating-point operations. All the registers are 32 bits wide. There are 45 GPRs and 19 SPRs in the PRU. The six-bit address of the registers to be read is specified in fixed locations in the instruction. Performing register reading in a high-speed pipelined execution unit is timing critical. In CORE-I, register reading is performed in two stages, ID and DS. The three least significant bits of the register address are used in the ID stage to select 8 of the 64 registers, and the three most significant bits are used in the DS stage to select the final register out of these 8.

Fig 1. Six Stages of CORE-I Pipeline

The DPRU contains 64 registers arranged in two banks, each bank having 32 registers. The register banks are named the Odd Bank and the Even Bank. The Odd Bank contains all odd-numbered registers, viz. reg1, reg3 up to reg63, and the Even Bank contains the even-numbered registers, viz. reg0, reg2 up to reg62. This arrangement is made to provide two 64-bit operands to the DPFPU from the registers simultaneously. A DP instruction specifies only the addresses of the even registers (e.g., r0, r2), which are read from the Even Bank, but their corresponding odd registers are also read and the whole 64-bit operands are formed (e.g., (r1:r0), (r3:r2)). All the register reading is done in two clock cycles as mentioned earlier. A special instruction is used to transfer data between the PRU and the DPRU. The dependency between them is taken care of by a separate Dependency Resolver.

C. ALU Unit:
CORE-I has instructions to perform arithmetic operations such as addition, subtraction, shifting, rotating, multiplication, single-step division, bit set/reset, sign extension and character reading/writing. The operations are performed in the EX stage. There are two operands to the execution unit; each operand can take its value either from the register contents or from a value forwarded from the EX, MEM or WB stage, so a multiplexer (data input) in front of the ALU block selects the actual operand. Since all the computational blocks execute in parallel and produce 32-bit results, a multiplexer (ALU output) is also placed after the blocks to select the correct result; the select signals for the data-input multiplexer used in the DS stage are generated in the ID stage. So the EX stage in a general pipelining scheme contains the input data mux, the operational block and the output ALU mux, and this is one of the critical timing paths to be handled in high-speed pipeline design. To clock the pipeline at high speed, in CORE-I the data selection for the computational blocks in the EX stage is performed one stage ahead, in the DS stage; the multiplexed operand is latched and then fed to the operational blocks. Also, in CORE-I the ALU result multiplexing is done in the MEM stage instead of the EX stage. The main issue to be handled for this organizational change of the pipeline arises with consecutive data-dependent instructions in the pipeline. In the case of a true data dependency [1] between two consecutive instructions, the receiving instruction has to wait one cycle in the DS stage, as the forwarding instruction has to reach the MEM stage to produce the correct ALU output. The Dependency Resolver controls the data forwarding. The data flow in the pipeline among the DS, EX and MEM stages is shown in Figure 2. The address and data calculation for the load and store instructions is also performed in the ALU.

D. Dependency Resolver:

The Dependency Resolver module resolves the data hazards [1] of the six-stage pipeline architecture of CORE-I. A true data dependency in CORE-I is resolved mainly by data forwarding, but in the case of a data dependency between two consecutive instructions some stages of the pipeline have to be stalled for one cycle (as explained earlier in the ALU section). This module handles both stalling [1] and data forwarding.

D.I Stalling
The Dependency Resolver checks the instructions in the ID and DS stages and generates an enable signal for stalling. In the next cycle this enable signal is used by logic placed in the DS stage to produce the Freeze signal. The Freeze signal stalls all the flops between the IF/ID, ID/DS and DS/EX stages; the figure below shows the stall-enable and Freeze signal generation blocks in the ID, DS and EX stages. The EX, MEM and WB stages are not stalled, so the EX result moves to MEM and is forwarded to DS.
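A bare-bones sketch of this two-step stall generation, with the ID/DS dependency comparison abstracted into a single input, might look as follows (all names are hypothetical):

module stall_ctrl (
    input  wire clk,
    input  wire dep_id_ds,   // Dependency Resolver: instructions in ID and DS are dependent
    output reg  freeze       // stalls the IF/ID, ID/DS and DS/EX pipeline registers
);
    wire stall_en = dep_id_ds;     // stall enable, generated in the current cycle
    always @(posedge clk)
        freeze <= stall_en;        // Freeze produced in the DS stage in the next cycle
endmodule

// The EX/MEM and MEM/WB registers are clocked unconditionally; only the
// earlier pipeline registers observe Freeze, e.g.:
//   always @(posedge clk) if (!freeze) id_ds_reg <= if_id_reg;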

D.II Forwarding

In the CORE-I architecture data are forwarded from the MEM, WB, WB+ and WB++ stages to the DS stage. WB+ and WB++ are stages used for forwarding only and contain the 'flopped' data of the previous stage for the purpose of forwarding. Generation of all the control signals for data forwarding, as well as the actual transfer of data, is time critical. The uniqueness of the CORE-I data forwarding is that all the control signals used for the forwarding multiplexers are generated one clock cycle earlier; they are then latched and used in the DS stage, as shown in the figure below. The consequences of the early select-signal generation are:

1. The forwarding instruction has to be checked one stage before the actual stage of forwarding. For example, to forward data from MEM stage, the instruction in the EX stage has to be checked with the receiving instruction.

2. The receiving instruction also has to be compared with the forwarding instruction one stage before. The receiving stage is always DS. In most situations the instruction that is in DS was in ID one clock cycle earlier, so the ID-stage instruction is compared with the forwarding instruction. But in the case of successive dependency, when the IF, ID and DS stages are stalled for one cycle, the receiving instruction was already in the DS stage before the forwarding. In that case the DS-stage instruction is compared with the forwarding instruction. To meet the timing constraint, the Dependency Resolver generates the control signals for both cases and the correct one is finally selected in the DS stage.
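The DS-stage forwarding multiplexer with its early, latched select signals can be sketched as below. The 3-bit select encoding and all names are assumptions for illustration, and the DS/EX operand latch mentioned in the ALU section is omitted.

module fwd_mux (
    input  wire        clk,
    input  wire [2:0]  fwd_sel_early,  // select computed one clock earlier by the Dependency Resolver
    input  wire [31:0] reg_data,       // operand read from the register file
    input  wire [31:0] mem_fwd,        // result forwarded from MEM
    input  wire [31:0] wb_fwd,         // result forwarded from WB
    input  wire [31:0] wbp_fwd,        // result forwarded from WB+
    input  wire [31:0] wbpp_fwd,       // result forwarded from WB++
    output reg  [31:0] ds_operand      // operand presented to the EX stage
);
    reg [2:0] fwd_sel;

    // The early select is latched at the entry of the DS stage ...
    always @(posedge clk)
        fwd_sel <= fwd_sel_early;

    // ... and used combinationally inside DS to pick the operand.
    always @(*) begin
        case (fwd_sel)
            3'd1:    ds_operand = mem_fwd;
            3'd2:    ds_operand = wb_fwd;
            3'd3:    ds_operand = wbp_fwd;
            3'd4:    ds_operand = wbpp_fwd;
            default: ds_operand = reg_data;   // no forwarding
        endcase
    end
endmodule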

Fig2 . Stall Enable and Freeze signal generation Logic

E. Multi Cycle Unit:

Multiplication and floating-point operations are multi-cycle in CORE-I. A 16x16 multiplication takes 3 cycles, and for other instructions such as 32x32 multiplication and single-precision and double-precision floating-point operations the number of cycles is programmable. It can be programmed through instructions or by setting values on dedicated input ports. When the instruction reaches the EX stage, the pipeline is frozen for the required number of cycles.
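One possible way to realize the programmable EX-stage freeze is sketched below; the counter width, the single-cycle start pulse and the names are assumptions rather than the CORE-I implementation.

module multicycle_freeze (
    input  wire       clk,
    input  wire       start,        // one-cycle pulse: a multi-cycle instruction has entered EX
    input  wire [5:0] num_cycles,   // programmed latency (e.g. 3 for a 16x16 multiply)
    output wire       freeze        // hold the pipeline while the unit is busy
);
    reg [5:0] count = 6'd0;

    always @(posedge clk) begin
        if (start && count == 6'd0)
            count <= num_cycles - 6'd1;   // first cycle of execution
        else if (count != 6'd0)
            count <= count - 6'd1;        // count down the remaining cycles
    end

    // Asserted for num_cycles cycles in total, starting with the start pulse.
    assign freeze = (count != 6'd0) || start;
endmodule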

Fig3.Forwarding Scheme

F. Branch and Jump Unit:

CORE-I supports 14 conditional branch instructions, 2 jump instructions and 1 return instruction. Jump and return instructions are scheduled in the EX stage, i.e. the PC value is updated when the instruction is in the EX stage. For the conditional branch instructions, however, the condition is evaluated in the EX stage and the PC value is updated in the MEM stage. All these instructions have 3 delay slots [1].

G. Exception Processing Unit:

CORE-I has seven external IRQs, 1 NMI and 1 PIO interrupt [2]. In addition to these external interrupts, the on-chip interrupt controller serves SIO and Timer interrupts, and on-chip exceptions due to arithmetic operations, bus errors and illegal opcodes. CORE-I also supports software interrupts with TRAP instructions. The Interrupt Service Routine (ISR) address for an interrupt is calculated in the EX stage and fed to the PCU. The return PC value and the processor status word are saved in the interrupt stack pointer before control is transferred to the routine. During exception processing, if a higher-priority interrupt arrives, the interrupt controller serves the higher-priority one.

IV. VERIFICATION & SYNTHESIS

The complete processor is modeled in Verilog HDL. The syntax of the RTL design is checked using the LEDA tool. For functional verification of the design, the processor is also modeled in a high-level language, SystemVerilog [5]. The design is verified both at the block level and at the top level. Test cases for the block level are generated in SystemVerilog in both directed and random ways. For top-level verification, assembly programs are written and the corresponding hex code from the assembler is fed to both the RTL design and the model. The checker module captures and compares the signals from both the RTL design and the model, and displays messages when signal values mismatch.


For completeness of the verification, different coverage metrics have been checked. Formal verification of the design is also planned. The design has been synthesized targeting a 0.13-micron standard cell technology using the Cadence PKS tool [6]. The complete design along with all timing constraints and optimization options is described using a TCL script [7]. When the maximum timing corner in the library is considered, the synthesizer reports a worst-path timing of 1.4 ns in the whole design. After synthesis, the Verilog netlist is verified functionally with the synthesizer-generated SDF file.

V. CONCLUSION

This paper has described the architecture of the pipelined execution unit of a 32-bit RISC processor. The design is working at 714 MHz after synthesis in 0.13-micron technology. Presently the design is being carried through the back-end flow, and chips will be fabricated after it. Our future work includes changing the core architecture to make it capable of handling multiple threads and supporting network applications effectively.

REFERENCES

[1] John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, Third Edition, Morgan Kaufmann Publishers, 2003.
[2] ABACUS User Manual, Advanced Numerical Research and Analysis Group (ANURAG), DRDO.
[3] Norman P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” Digital Equipment Corporation Western Research Lab, IEEE, 1990.
[4] IEEE Standard for Binary Floating-Point Arithmetic, approved March 21, 1985, IEEE Standards Board.
[5] Accellera's Extensions to Verilog, SystemVerilog 3.1a Language Reference Manual.
[6] PKS User Guide, product version 5.0.12, October 2003.
[7] TCL and the Tk Toolkit.


Monitoring Of an Electronic System Using Embedded Technology

N. Sudha 1, Suresh R. Norman 2

1: N. Sudha, II yr M.E. (Applied Electronics), SSN College of Engineering, Chennai
2: Suresh R. Norman, Asst. Professor, SSN College of Engineering, Chennai
SSN College of Engineering (Affiliated to Anna University)
Email id: [email protected], Phone no: 9994309837

Abstract- The embedded remote electronic measuring system board, with its interface modules, uses an embedded board to replace a computer. The embedded board has the advantages of being portable, operating in real time, being low cost and being programmable with an on-board operating system.

The design provides step-by-step functions to help the user operate it, such as keying in the waveform parameters with the embedded board keyboard, providing test waveforms and then connecting the circuit-under-test to the embedded electronic measurement system. The design can also display the output response waveform measured by the embedded measurement system (from the circuit-under-test) on the embedded board LCM (Liquid Crystal Monitor).

I. INTRODUCTION

The initially designed remote electronic measurement system used interface chips to design and implement interface cards providing the functions of the traditional electronic instruments of a laboratory, such as a power supply, a signal generator and an oscilloscope. These systems integrate communication software to control the communication between the computer and the interface card by means of an ISA (Industry Standard Architecture) bus to transmit or receive the waveform data. By utilizing widely used software and hardware, the users at the client site convey the waveform data through the computer network; they can not only receive the input waveforms from the clients, but can also upload the measured waveforms to a server. This remote electronic measurement system has some disadvantages: it requires operation with a computer and it is not portable.

The intended embedded electronic measurement system overcomes these disadvantages by replacing the computer with an embedded board, since the embedded board is portable, operates in real time, is low cost and is programmable.

In this design the users need only key in the waveform and voltage parameters using the embedded keyboard. The embedded remote measurement system will then output the voltage waveform into the circuit under test. The users can observe the waveforms by means of the oscilloscope interface of the embedded board. If the users are not satisfied with this waveform, they can set up another waveform. The dual-channel LCM (Liquid Crystal Monitor) provides a comparison between the input and response waveforms to determine the characteristics of the circuit under test. The network function can also transfer the measured waveforms to a distant server. Our design supports different applications in which an electronics factory may have its design house and its product lines at different locations. The designer can measure the electronic products through the Internet and the necessary interfaces. If there are any mistakes in the circuit boards on the production line, with our design the engineers can solve these problems through network communications, which reduces time and cost.

II. HARDWARE OF THE EMBEDDED REMOTE ELECTRONIC MEASUREMENT SYSTEM

The embedded remote electronic measurement system includes a power supply, a signal generator and an oscilloscope. Fig. 1 shows the embedded remote measurement system architecture.

Fig 1. The embedded remote electronic measurement system architecture

Fig. 2 shows the hardware interface modules of the embedded remote electronic measurement system. We can divide these interface modules into three parts: the ADC, the DAC and the control modules. The ADC module serves the oscilloscope; it is used mainly to convert the analog signal to digital format for the measurement waveform.


The DAC module converts the digital signal to an analog signal for output, as required by the power supply and the signal generator. The control signals manage the ADC and DAC modules so that each connects and transfers its measurement waveform data to the embedded board in its own time interval, avoiding contention for resources between them.

Fig 2. The interface modules of the embedded remote electronic measurement system

A) The DAC modules provide the major functions of the power supply and the signal generator. The power supply provides a stable DC voltage, while the signal generator provides various kinds of waveforms, such as sine, square and triangular waveforms.

a. The power supply provides an adjustable DC voltage with small drift and noise in the output voltage. The embedded board keyboard establishes the voltage values and sends the data to the power supply module. The system can supply a maximum current of up to 2 A.

b. The signal generator provides specific waveforms based on sampling theory. According to the sampling rule used here, if the conversion rate is 80 MHz and a clean waveform is composed of at least 10 samples per period, the generator can only provide a maximum waveform bandwidth of 8 MHz.

Fig 3.The signal generator of the embedded remote electronic measurement system

In our design, by using a keypad the users can enter all the settings of waveform type, amplitude and frequency into the embedded board, and the embedded board will output the waveform into the testing circuit. The users can preview the waveforms on the LCM. If the users are not satisfied with a waveform, they can set up another one. Fig. 4 shows the signal generation flow chart.

B) The ADC module provides the key function of the oscilloscope, which converts the analog signal to digital format for the embedded board.

Fig 4. Flow chart of the embedded signal generator

According to sampling theory, the sample rate should be more than twice the signal bandwidth, but in our design the sample rate is ten times the signal bandwidth; only then do we obtain waveforms of good quality.

If a waveform is composed of only two or three samples, a triangular waveform looks the same as a sine waveform and the user cannot recognize what kind of waveform it is, so the sample rate must be more than ten times larger than the signal bandwidth in order to recognize waveforms.
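Putting the two sampling constraints used above in one place (these are just the figures already quoted in the text):

f_s \ge 10\, f_{\mathrm{signal}} \;(\gg 2\, f_{\mathrm{signal}},\ \text{the Nyquist minimum}), \qquad
f_{\mathrm{max}} = \frac{f_{\mathrm{conv}}}{10} = \frac{80\ \mathrm{MHz}}{10} = 8\ \mathrm{MHz}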

Fig 5. The procedure of analog to digital conversion.

The embedded oscilloscope provides the view of the measurement waveform for the users and transfers the waveform data to a distant server for verification. Because the resolution of the embedded board LCM is lower than the resolution of a computer monitor, the user can only observe a low-quality waveform on the LCM at the client site; to observe a high-quality waveform, the user can transfer the measurement waveform to a computer system.

C) The control module provides the control between the memory and the data storage, and the I/O connections. Because the I/O pins are limited, not every extra module can connect to an independent I/O simultaneously; some of the I/O pins need to be shared or multiplexed. As the embedded board cannot recognize which module is connected to it and cannot allocate system resources to an extra module, we need a control module to manage every extra module. The control module includes three control chips, each of which has three I/O ports and a bidirectional data bus, which is very convenient for input and output.




III. SOFTWARE DESIGN OF THE EMBEDDED REMOTE ELECTRONIC MEASUREMENT SYSTEM

The overall performance of an embedded system is poor compared to that of a PC, but the embedded system executes a specific application program which requires fewer resources and is more reliable than a PC. In addition, the embedded system software can be designed using the C language and forms the GUI module for the user's operation. The advantages of this design are that it is easy to use and easy to debug program errors. Fig. 6 shows the flowchart of the embedded remote electronic measurement system. At first, the users choose the type of instrument, and then key in the relevant parameters of the instrument; for example, one can key in the voltage values for the power supply, or key in the waveform types for the generator module. When the system set-up is finished, the system will output the DC voltages and the waveforms. In addition, if the users operate the oscilloscope, they only need to choose the sample rate, and they can then observe the measurement waveform on the LCM. If the users want to send the waveform to the server, they only need to key in the server IP address; when the transmission is connected, the embedded system will send the data to the server. Another advantage of the embedded remote electronic measurement system is that the server can receive many waveforms from different client sites. If the observer has questions to ask one of the clients, the observer can send a defined waveform to that embedded remote electronic measurement system and collect the output waveform again. This function can assist users in debugging distant circuits, to both locate and understand the problem of the testing circuit in detail.

Fig 6. The flow chart of the embedded remote electronic measurement system.

IV. CONCLUSION

The embedded design takes very little space and is easily portable. The electronic instrument operation is unified, as our design uses fewer system resources, increases the operating efficiency and has more functions together with an extra I/O interface. This design can replace the traditional signal generator and oscilloscope and integrate the measurement testing system into one portable system. It is also possible for electronic engineers to remotely measure circuits under test through the network transmission of the measurement waveform.

REFERENCES

[1] M. J. Callaghan, J. Harkin, T. M. McGinnity and L. P. Maguire, “An Internet-based methodology for remotely accessed embedded systems,” Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 6, pp. 6-12, Oct. 2002.
[2] Jihong Lee and Baeseung Seo, “Real-time remote monitoring system based on personal communication service (PCS),” Proceedings of the 40th SICE Annual Conference, pp. 370-375, July 2001.
[3] Ying-Wen Bai, Cheng-Yu Hsu, Yu-Nien Yang, Chih-Tao Lu, Hsiang-Ting Huang and Chien-Yung Cheng, “Design and Implementation of a Remote Electronic Measurement System,” 2005 IEEE Instrumentation and Measurement Technology Conference, pp. 1617-1622, May 2005.
[4] T. Yoshino, J. Munemori, T. Yuizono, Y. Nagasawa, S. Ito and K. Yunokuchi, “Development and application of a distance learning support system using personal computers via the Internet,” Proceedings of the 1999 International Conference on Parallel Processing, pp. 395-402, Sept. 1999.
[5] Li Xue Ming, “Streaming technique and its application in distance learning system,” Proceedings of the 5th International Conference on Signal Processing, vol. 2, pp. 1329-1332, Aug. 2000.
[6] J. L. Schmalzel, P. A. Patel and H. N. Kothari, “Instrumentation curriculum: from ATE to VXI,” Proceedings of the 9th IEEE Instrumentation and Measurement Technology Conference, pp. 265-267, May 1992.
[7] Ying-Wen Bai, Hong-Gi Wei, Chung-Yueh Lien and Hsin-Lung Tu, “A Windows-Based Dual-Channel Arbitrary Signal Generator,” Proceedings of the IEEE Instrumentation and Measurement Technology Conference, pp. 1425-1430, May 2002.
[8] E. C. Ifeachor and B. W. Jervis, Digital Signal Processing: A Practical Approach, Second Edition, 1996.
[9] Ying-Wen Bai and Hong-Gi Wei, “Design and implementation of a wireless remote measurement system,” Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference (IMTC/2002), vol. 2, pp. 937-942, May 2002.
[10] Ying-Wen Bai and Cheng-Yu Hsu, “Design and Implementation of an Embedded Remote Electronic Measurement System,” 2006 IEEE Instrumentation and Measurement Technology Conference, pp. 1311-1316, April 2006.


The Design of a Rapid Prototype Platform for ARM Based Embedded System

A. Antony Judice 1, Mr. Suresh R. Norman 2
1 A. Antony Judice, IInd M.E. (Applied Electronics), SSN College of Engineering, Kalavakkam, Chennai-603110. Email: [email protected]
2 Mr. Suresh R. Norman, Asst. Prof., SSN College of Engineering, Kalavakkam, Chennai-603110.

Abstract— Embedded system designs continue to increase in size, complexity and cost. At the same time, aggressive competition makes today's electronics markets extremely sensitive to time-to-market pressures. A hardware prototype is a faithful representation of the final design, guaranteeing its real-time behavior, and it is also the basic tool for finding deep bugs in the hardware. The most cost-effective technique to achieve this level of performance is to create an FPGA-based prototype. As both the FPGA and the ARM embedded system support BST (Boundary Scan Test), we can detect faults on the connections between the two devices by chaining their JTAG ports and performing BST. Since FPGA-based prototypes enable both chip- and system-level testing and are relatively inexpensive, they can be provided to multiple developers and also deployed to multiple development sites. We thus present a fast prototyping platform for ARM-based embedded systems, providing a low-cost solution to meet the requirements of flexibility and testability in embedded system prototype development.

Index Terms— FPGA, Rapid prototyping, Embedded system, ARM, Reconfigurable Interconnection.

I.INTRODUCTION

Rapid prototyping is a form of collecting information on requirements and on the adequacy of possible designs. Prototyping is very useful at different stages of design, such as product conceptualization at the task level and determining aspects of screen design. Embedded system designers are under more and more pressure to reduce design time, often in the presence of continuously changing specifications. To meet these challenges, the implementation architecture is more and more based on programmable devices: micro-controllers, digital signal processors and Field Programmable Gate Arrays. The development tools used by system designers are often rather primitive: simulation models for FPGA devices, synthesis tools to map the logic into FPGAs, some high-level models and emulation systems for micro-controllers, and software tools such as editors, debuggers and compilers.

One of the major design validation problems facing an embedded system designer is the evaluation of different hardware-software partitionings. Reaching the integration point earlier in the design cycle not only finds any major problems earlier, while there is still time to fix them, but also speeds software development. Most times, software integration and debugging could start much earlier, and software development proceed faster, if only a hardware (or ASIC) prototype could consistently be available earlier in the development cycle.

A possible choice for design validation is to simulate the system being designed. However, this approach has several drawbacks. If a high-level model for the system is used, simulation is fast but may not be accurate enough. With a low-level model, too much time may be required to achieve the desired level of confidence in the quality of the evaluation. Modeling a composite system that includes complex software-programmable components is not easy due to synchronization issues.

In most embedded system applications, safety and the lack of a well-defined mathematical formulation of the goals of the design make simulation inherently ill-suited for validation. For these reasons, design teams build prototypes which can be tested in the field to physically integrate all components of a system for functional verification of the product before production. Since design specification changes are common, it is important to maintain high flexibility during development of the prototype. In this paper we address the problem of hardware-software partitioning evaluation via board-level rapid prototyping. We believe that providing efficient and flexible tools allowing the designer to quickly build a hardware-software prototype of the embedded system will help the designer in this difficult evaluation task more effectively than a relatively inflexible non-programmable prototype. Furthermore, we believe that coupling this board-level rapid-prototyping approach with synthesis tools for fast generation of programming data for both the devices and the interconnections among them can make a difference in shortening the design time.



The problems that we must solve include fast and easy implementation of a prototype of the embedded system, and validation of hardware and software communication (synchronization between hardware and software heavily impacts the performance of the final product). Our approach is based on the following basic ideas:

1. Use of a programmable board, a sort of universal printed circuit board providing re-programmable connections between components. With a programmable board as the prototyping vehicle, the full potential of the FPGA can be exploited: FPGA programming is no longer affected by constraints such as a fixed pin assignment due to a custom printed board or a wire-wrap prototype.

2. Debugging the prototype by loading the code on the target emulator and making it run, programming the FPGA, providing signals to the board via a pattern generator and analyzing the output signals via a logic analyzer. This approach can be significantly improved by using debugging tools for both software and hardware in order to execute step by step the software part and the clock-triggered hardware part.

We argued that prototyping is essential to validate an embedded system. However, to take full advantage of the prototyping environment, it is quite useful to simulate the design as much as feasible at all levels of the hierarchy. Simulation is performed at different stages along the design flow. At the specification level we use an existing co-simulation environment for heterogeneous systems, which provides an interface to a well-developed set of design aids for digital signal processing.

Prototyping can be used to gain a better understanding of the kind of product required in the early stages of system development, where several different sketch designs can be presented to users and to members of the development team for critique. The prototype is thrown away in the end, although it is an important resource during the development of a working model of the product. The prototype gives the designers a functional working model of their design, so they can work with the design and identify some of its possible pros and cons before it is actually produced. The prototype also allows the user to be involved in testing design ideas. Prototyping can resolve uncertainty about how well a design fits the user's needs. It helps designers to make decisions by obtaining information from users about the necessary functionality of the system, user help needs, a suitable sequence of operations, and the look of the interface. It is important that the proposed system have the necessary functionality for the tasks that users may want to perform, anywhere from gathering information to task analysis. Information on the sequence of operations can tell the designers what users need to interact successfully with the system. Exchanges can be fixed and supportive, but potentially constraining, or free and flexible.

In interactive systems design, prototypes are critical in the early stages of design, unlike other fields of engineering design where design decisions can initially be carried out analytically without relying on a prototype. In systems design, prototyping is used to create interactive system designs where the completeness and success of the user interface can utterly depend on the testing.

Embedded systems are found everywhere. An embedded system is a specialized computer system that is part of a larger system or machine. Typically, an embedded system is housed on a single microprocessor board with the programs stored in ROM. Virtually all appliances that have a digital interface - watches, microwaves, VCRs, cars - utilize embedded systems. Some embedded systems include an operating system, but many are so specialized that the entire logic can be implemented as a single program.

In order to deliver correct-the-first-time products with complex system requirements under time-to-market pressure, design verification is vital in the embedded system design process. A possible choice for verification is to simulate the system being designed. Since debugging of real systems has to take into account the behavior of the target system as well as its environment, runtime information is extremely important. Therefore, static analysis with simulation methods is too slow and not sufficient, and simulation cannot reveal deep issues in a real physical system. A hardware prototype is a faithful representation of the final design, guaranteeing its real-time behavior, and it is also the basic tool to find deep bugs in the hardware. For these reasons, it has become a crucial step in the whole design flow. Traditionally, a prototype is designed similarly to the target system, with all the connections fixed on the PCB (printed circuit board).

As embedded systems are getting more complex, the need for thorough testing becomes increasingly important. Advances in surface-mount packaging and multiple-layer PCB fabrication have resulted in smaller boards and more compact layouts, making traditional test methods, e.g., external test probes and "bed-of-nails" test fixtures, harder to implement. As a result, acquiring signals on boards, which is beneficial to hardware testing and software development, becomes infeasible, and tracking bugs in the prototype becomes increasingly difficult. Thus the prototype design has to take account of testability. If errors on the prototype are detected, such as misconnections of signals, it could be impossible to correct them on the multiple-layer PCB board with all the components mounted. All this would lead to another round of prototype fabrication, extending development time and increasing cost. Besides testability, it is important to maintain high flexibility during development of the prototype, as design specification changes are common.

Page 67: Proceedings of Vccc'08[1]

VCCC‘08

44

Nowadays complex systems are often not built from scratch but are assembled by reusing previously designed modules or off-the-shelf components such as processors, memories or peripheral circuitry, in order to cope with more aggressive time-to-market constraints. Following the top-down design methodology, much effort in the design process is spent on decomposing the customers' requirements into proper functional modules and interfacing them to compose the target system. Some previous research works have suggested that an FPLD (field programmable logic device) could be added to the final design to provide flexibility, as FPLDs can offer programmable interconnections among their pins and many more advantages. However, extra devices may increase production cost and power dissipation, weakening the market competitiveness of the target system. To address these problems, there are also suggestions that FPLDs could be used in the hardware prototype as an intermediate approach; moreover, modules on the prototype cannot be reused directly. In industry, there have been companies that provide commercial solutions based on FPLDs for rapid prototyping, but their products are aimed at SOC (system-on-a-chip) functional verification instead of embedded system design and development. The approach presented here also encourages concurrent development of different parts of the system hardware as well as module reuse.

Fig.1 FPGA architecture

II. THE DESIGN OF A RAPID PROTOTYPING PLATFORM

A. Overview
ARM-based embedded processors are widely used in embedded systems due to their low cost, low power consumption and high performance. An ARM-based embedded processor is a highly integrated SOC including an ARM core with a variety of different system peripherals. Many ARM-based embedded processors adopt an architecture similar to the one shown in Fig. 1. The integrated memory controller provides an external memory bus interface supporting various memory chips and various operation modes (synchronous, asynchronous, burst modes). It is also possible to connect bus-extended peripheral chips to the memory bus. The on-chip peripherals may include an interrupt controller, OS timer, UART, I2C, PWM, AC97, etc. Some of these peripheral signals are multiplexed with general-purpose digital I/O pins to provide flexibility to the user, while other on-chip peripherals, e.g. USB host/client, may have dedicated peripheral signal pins. By connecting or extending these pins, the user may use these on-chip peripherals. When the on-chip peripherals cannot fulfill the requirements of the target system, extra peripheral chips have to be added.

To enable rapid prototyping, the platform should be capable of quickly assembling parts of the system into a whole through flexible interconnection. Our basic idea is to insert a reconfigurable interconnection module composed of an FPLD into the system to provide adjustable connections between signals, and to provide testability as well. To determine where to place this module, we first analyze the architecture of the system. The embedded system shown in Fig. 2 can be divided into two parts. One is the minimal system, composed of the embedded processor and memory devices. The other is made up of peripheral devices extended directly from on-chip peripheral interfaces of the embedded processor, and specific peripheral chips and circuits extended by the bus. The minimal system is the core of the embedded system, determining its processing capacity. Embedded processors are now routinely available at clock speeds of up to 400 MHz, and will climb still further; the speed of the bus connecting the processor and the memory chips exceeds 100 MHz. As the pin-to-pin propagation delay of an FPLD is of the order of a few nanoseconds, inserting such a device there would greatly impair the system performance. The peripherals enable the embedded system to communicate and interact with its real-world environment. In general, peripheral circuits are highly modularized and independent of each other, and there is hardly any need for flexible connections between them.

Here we apply a reconfigurable interconnection module to substitute for the connections between the microcomputer and the peripherals, which enables flexible adjustment of connections to facilitate interfacing extended peripheral circuits and modules. As the speed of the data communication between the peripherals and the processor is much slower than that within the minimal system, the FPLD solution is feasible. Following this idea, we design the Rapid Prototyping Platform as shown in Fig. 2. We define the interface ICB between the platform and the embedded processor core board that holds the minimal system of the target embedded system. The interface IPB between the platform and the peripheral boards that hold extended peripheral circuits and modules is also defined.


These interfaces enable us to develop different parts of the target embedded system concurrently and to compose them into a prototype rapidly, and they encourage module reuse as well. The two interfaces are connected by a reconfigurable interconnect module. There are also some commonly used peripheral modules on the platform, e.g. an RS232 transceiver module, a bus-extended Ethernet module, an AC97 codec, a PCMCIA/Compact Flash card slot, etc., which can be interfaced through the reconfigurable interconnect module to expedite embedded system development.

FIGURE 2: Schematic of the Rapid Prototyping Platform

B. Reconfigurable Interconnect Module
With the facility of state-of-the-art FPLDs, we design an interconnection module to interconnect, monitor and test the bus and I/O signals between the minimal system and the peripherals. As bus accesses obey a specific protocol and have control signals to identify the data direction, the interconnection of the bus can be easily realized by designing a corresponding bus transceiver into the FPLD, whereas the interconnection of the I/Os is more complex. As I/Os are multiplexed with on-chip peripheral signals, there may be I/Os with bi-directional signals, e.g. the signals for the on-chip I2C interface, or signals for the on-chip MMC (Multi Media Card) interface. The data direction on these pins may alter without an external indication, making it difficult to connect them via an FPLD. One possible solution is to design a complex state machine, following the corresponding access protocol, to control the data transfer direction. In our design we assign specific locations on the ICB and IPB interfaces to these bi-directional signals and use jumpers to connect these signals directly when needed. The problem is circumvented at the expense of losing some flexibility.
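As an illustration of such an in-FPLD bus transceiver, the sketch below assumes a simple asynchronous memory bus with an active-low chip select and read strobe; it is not the platform's actual module, and all names are ours. The unidirectional address lines are routed straight through, while the data lines are turned around on the read strobe.

module fpld_bus_xcvr (
    input  wire        n_oe,       // processor read strobe (active low)
    input  wire        n_cs,       // chip select for the peripheral region (active low)
    input  wire [23:0] addr_icb,   // address from the core board (ICB side)
    output wire [23:0] addr_ipb,   // address forwarded to the peripheral board (IPB side)
    inout  wire [15:0] data_icb,   // data bus on the core-board side
    inout  wire [15:0] data_ipb    // data bus on the peripheral-board side
);
    // Address (and, similarly, control) lines are simply routed through the FPLD.
    assign addr_ipb = addr_icb;

    // Data lines are driven according to the access direction:
    // write (read strobe inactive): core board -> peripheral board,
    // read  (read strobe active)  : peripheral board -> core board.
    assign data_ipb = (!n_cs &&  n_oe) ? data_icb : 16'bz;
    assign data_icb = (!n_cs && !n_oe) ? data_ipb : 16'bz;
endmodule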

Fig 3:FPGA design flow

The use of an FPLD to build the interconnection module not only offers low cost and a simple architecture for fast prototyping, but also provides many more advantages. First, interconnections can be changed dynamically through internal logic modification and pin re-assignment in the FPLD. Second, as the FPLD is connected to most pins of the embedded processor, it is feasible to detect interconnection problems due to design or physical fabrication faults in the minimal system with BST (Boundary-Scan Test, IEEE Standard 1149.1). Third, it is possible to route FPLD-internal signals and data to the FPLD's I/O pins for quick and easy access without affecting the whole system design and performance. It is even possible to implement an embedded logic analyzer inside the FPLD to smooth the progress of hardware verification and software development.

Before the advent of programmable logic, custom logic circuits were built at the board level using standard components, or at the gate level in expensive application-specific (custom) integrated circuits. The FPGA is an integrated circuit that contains many (64 to over 10,000) identical logic cells that can be viewed as standard components. Each logic cell can independently take on any one of a limited set of personalities. The individual cells are interconnected by a matrix of wires and programmable switches. A user's design is implemented by specifying the simple logic function for each cell and selectively closing the switches in the interconnect matrix. The array of logic cells and interconnect forms a fabric of basic building blocks for logic circuits; complex designs are created by combining these basic blocks to create the desired circuit.

Field programmable means that the FPGA's function is defined by a user's program rather than by the manufacturer of the device. A typical integrated circuit performs a particular function defined at the time of manufacture. In contrast, the FPGA function is defined by a program written by someone other than the device manufacturer. Depending on the particular device, the program is either 'burned' in permanently or semi-permanently as part of a board assembly process, or is loaded from an external memory each time the device is powered up.


This user programmability gives the user access to complex integrated designs without the high engineering costs associated with application-specific integrated circuits.

C. Design Flow Changes Allowed by Reprogrammability
This combination of moderate density, reprogrammability and powerful prototyping tools provides a novel capability for systems designers: hardware that can be designed with a software-like iterative-implementation methodology. Figure 4 shows a typical ASIC design methodology in which the design is verified by simulation at each stage of refinement. Accurate simulators are slow; fast simulators trade away simulation accuracy. ASIC designers use a battery of simulators across the speed-accuracy spectrum in an attempt to verify the design. Although this design flow works with FPGAs as well, an FPGA designer can replace simulation with in-circuit verification, "simulating" the circuitry in real time with a prototype. The path from design to prototype is short, allowing a designer to verify operation over a wide range of conditions at high speed and high accuracy. This fast design-place-route-load loop is similar to the software edit-compile-run loop and provides the same benefits. Designs can be verified by trial rather than by reduction to first principles or by mental execution. A designer can verify that the design works in the real system, not merely in a potentially erroneous simulation model of the system. This makes it possible to build proof-of-concept prototype designs easily. Design-by-prototype does not verify proper operation with worst-case timing, merely that the design works on the presumed-typical prototype part. To verify worst-case timing, designers may check speed margins at actual voltage and temperature corners with a scope, speeding up marginal signals; they may use a software timing analyzer or simulator after debugging to verify worst-case paths; or they may simply use faster speed-grade parts in production to ensure sufficient speed margin over the complete temperature and voltage range.

Fig 4: Contrasting design methodologies: (a) traditional gate arrays; (b) FPGA

Prototype versus Production
As with software development, the dividing line between prototyping and production can be blurred with a reprogrammable FPGA. A working prototype may qualify as a production part if it meets cost and performance goals. Rather than re-design, an engineer may choose to substitute a faster speed-grade FPGA using the same programming bit stream, or a smaller, cheaper compatible FPGA with more manual work to squeeze the design into a smaller IC. A third solution is to substitute a mask-programmed version of the LCA for the field-programmed version. All three of these options are much simpler than a system redesign. Rapid prototyping is most effective when it becomes rapid product development.


Field Upgrades
Reprogrammability allows the systems designer another option: that of modifying the design in the FPGA by changing the programming bit stream after the design is in the customer's hands. The bit stream can be stored in PROM or elsewhere in the system. For example, an FPGA used as a peripheral on a computer may be loaded from the computer's disk. In some existing systems, manufacturers send modified hardware to customers as a new bit stream on a floppy disk or as a file sent over a modem.

Reprogrammability for Board-Level Test
The most common system use of reprogrammability is for board-level test circuitry. Since FPGAs are commonly used for logic integration, they naturally have connections to major subsystems and chips on the board. This puts the FPGA in an ideal location to provide system-level test access to major subsystems. The board designer makes one configuration of the FPGA for normal operation and a separate configuration for test mode. The "operating" logic and the "test" logic need not operate simultaneously, so they can share the same FPGA. The test logic is simply a different configuration of the FPGA, so it requires no additional on-board logic or wiring. The test configuration can be shipped with the board, so the test mode can also be invoked as a diagnostic after delivery without requiring external logic.

III.EXPERIMENTAL RESULTS

As the Rapid Prototyping Platform is still under development, we present an example built with the same considerations as the Rapid Prototyping Platform. It is an embedded system prototype based on the Intel XScale PXA255, which is an ARM-based embedded processor. The prototype is illustrated in Fig. 5, where a Bluetooth module is connected to the prototype's USB port and a CF LAN card is inserted. The FPGA (an Altera Cyclone EP1C6F256) here offers the same function as the reconfigurable interconnection module shown in Fig. 2. Most of the peripheral devices are connected to the system through the FPGA, and more peripherals can be easily interfaced when needed. As both the FPGA and the PXA255 support BST, we can detect faults, e.g. short-circuit and open-circuit faults, on the connections between the two devices by chaining their JTAG ports and performing BST. Here, we use an open-source software package to perform the BST. The FPGA internal signals can be routed to the debugging LED matrix for easy access, which is helpful in simple testing and debugging. We also insert an embedded logic analyzer, the SignalTap II embedded logic analyzer provided in Altera's Quartus II software, into the FPGA for handling more complicated situations. Quartus II provides a productive design flow for both high-density and low-cost FPGA designs. With the help of the logic analyzer, we are able to capture and monitor data passing through over a period of time, which expedites the debugging process for the prototype system.

FIGURE 5: Hardware prototyping by making use of FPGA

Boundary scan test (IEEE 1149.1)
This standard incorporated earlier boundary scan tests that had been developed for testing printed circuit boards. Boundary scan allows the engineer to verify that the connections on a board are functioning.
• The JTAG standard uses an internal shift register which is built into JTAG-compliant devices.
• This boundary scan register allows monitoring and control of all I/O pins, signals and registers.
• The interface to JTAG is through a standard PC parallel port to a JTAG port on the board to be tested.


[Figure: JTAG boundary scan architecture — a PC parallel port drives the JTAG TAP controller (TCK clock, TMS mode, TDI data in, TDO data out, TRST reset), which accesses the instruction register, device ID register, boundary scan register and the device's input/output ports.]

• The operation of the BSR is controlled by the TAP (Test Access Port) state machine.
• This decodes the instructions sent through the BSR and decides what actions to take.
• The actions mainly involve loading the BSR cells with data, executing the instructions and then examining the results.
• The TAP is controlled by the state of TMS.
• As long as the addresses, data and signal states are correct, the developer can do tasks such as query the flash chips, erase them, and load data in.

IV. CONCLUSIONS AND FUTURE WORK

In this paper we have shown that the use of a flexible prototype based on re-programmable devices, coupled with a set of synthesis tools that quickly produce programming data, can shorten the design time. The prototype we obtain is neither the result of a software simulation nor the result of hardware emulation, because it is made up of hardware circuitry and a software implementation. So, the components used in the prototype are close to those used in the final product. Moreover, it is possible to evaluate the trade-off between the hardware partition and the software partition. Last but not least, the technology used allows a real-time operation mode. This approach can help to fix errors early in the design cycle, since the hardware-software integration test is done earlier than in the common approaches with a custom PCB. The next step will be to test the prototype in a real environment. In summary, this paper has discussed the design of a fast prototyping platform for ARM-based embedded systems to accommodate the requirements of flexibility and testability in the prototyping phase of embedded system development.

REFERENCES

[1] S. Trimberger, "A Reprogrammable Gate Array and Applications," Proc. IEEE, Vol. 81, No. 7, July 1993, pp. 1030-1041.
[2] S. Hauck, "The roles of FPGAs in reprogrammable systems," Proc. IEEE, Vol. 86, No. 4, April 1998, pp. 615-638.
[3] S. Cardelli, M. Chiodo, P. Giusto, A. Jurecska, L. Lavagno, and A. Sangiovanni-Vincentelli, "Rapid-Prototyping of Embedded Systems via Reprogrammable Devices," in Proc. Seventh IEEE International Workshop on Rapid System Prototyping, 1996.
[4] Product Briefs for System Explorer Reconfigurable Platform with debugging capabilities, http://www.aptix.com/literature/productbriefs/sysExpl.pdf
[5] Steve Furber, ARM System-on-Chip Architecture, Addison-Wesley, 2000.
[6] Intel PXA255 Processor Developer's Manual, http://www.intel.com/design/pca/applicationsprocessors/manuals/278693.htm
[7] JTAG Tools, http://openwince.sourceforge.net/jtag/


Implementation of High Throughput and Low Power FIR Filter in FPGA

V.Dyana Christilda B.E*, R.Solomon Roach M.Tech**

* Student, Department of Electronics and Communication Engineering **Lecturer, Department of Electronics and Communication Engineering

Francis Xavier Engineering College,Tirunelveli. Email:[email protected]

Abstract-This paper presents the implementation of high throughput and low power FIR filtering IP cores. Multiple datapaths are utilized for high throughput, and low power is achieved through coefficient segmentation, block processing, and combined segmentation and block processing algorithms. A coefficient reduction algorithm is also proposed for modifying the values and the number of non-zero coefficients used to represent the FIR digital pulse shaping filter response. With this algorithm, the FIR filter frequency and phase response can be represented with a minimum number of non-zero coefficients, thereby reducing the arithmetic complexity needed to compute the filter output. Consequently, the system characteristics, i.e. power consumption, area usage, and processing time, are also reduced. The paper presents the complete architectural implementation of these algorithms for high performance applications. Finally, this FIR filter is designed and implemented in an FPGA.

1.INTRODUCTION

One of the fastest growing areas in the computing industry is the provision of high throughput DSP systems in a portable form. With the advent of SoC technology and the intensive use of FIR filters in video and communication systems, high performance in speed, area and power consumption is demanded. Basically, digital filters are used to modify the characteristics of signals in the time and frequency domains and have been recognized as primary digital signal processing operations. For high performance, low power applications, there is a continuous demand for DSP cores which provide high throughput while minimizing power consumption. Recently, more and more traditional applications and functionalities have been targeted to palm-sized devices, such as Pocket PCs and camera-enabled mobile phones with colorful screens. Consequently, not only is there a demand for high data processing capability for multimedia and communication purposes, but the requirement of power efficiency has also increased significantly.

Furthermore, power dissipation is becoming a crucial factor in the realization of parallel mode FIR filters. There is an increasing number of published techniques to reduce the power consumption of FIR filters. The authors in [1] utilize the differential coefficients method (DCM), which involves using various orders of differences between coefficients, along with stored intermediate results, rather than using the coefficients themselves directly for computing the partial products in the FIR equation.

To minimize the overhead while retaining the benefit of DCM, the differential coefficient and input method (DCIM) [2] and decorrelating (DECOR) transformations [3] have been proposed. Another approach, used in [4], is to optimize the word-lengths of input/output data samples and coefficient values. This involves a general search-based methodology, which is based on statistical precision analysis and the incorporation of cost/performance/power measures into an objective function through word-length parameterization. In [5], Mehendale et al. present an algorithm for optimizing the coefficients of an FIR filter so as to reduce power consumption in its implementation on a programmable digital signal processor.

This paper presents the implementation of high throughput and low power FIR filtering Intellectual Property (IP) cores, showing their implementation for increased throughput as well as low power applications through employing multiple datapaths. The paper studies the impact of parameterization, in terms of datapath parallelization, on the power/speed/area performance of these algorithms.

II.GENERAL BACKGROUND

Finite Impulse Response filters have been used in signal processing for ghost cancellation and channel equalization. FIR filtering, of which the output is described in Equation (1), is realized by a large number of adders, multipliers and delay elements.

  Y[n] = \sum_{k=0}^{N-1} h[k] \, X[n-k]        (1)

where Y[n] is the filter output, X[n-k] is the input data, and h[k] is the filter coefficient. The direct form of a finite word length FIR filter generally begins with rounding or truncating the optimum infinite precision


coefficients determined by the McClellan-Parks algorithm.
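As a point of reference, here is a minimal C sketch of the direct-form computation in Equation (1); the tap count, coefficient values and input samples are illustrative assumptions, not values from the paper.

#include <stdio.h>

#define N_TAPS 4  /* illustrative filter length */

/* Direct-form FIR: y[n] = sum_{k=0}^{N-1} h[k] * x[n-k]  (Equation 1) */
double fir_output(const double h[], const double x[], int n, int ntaps)
{
    double y = 0.0;
    for (int k = 0; k < ntaps; k++) {
        if (n - k >= 0)               /* assume x[m] = 0 for m < 0 */
            y += h[k] * x[n - k];
    }
    return y;
}

int main(void)
{
    const double h[N_TAPS] = { 0.25, 0.25, 0.25, 0.25 };  /* example coefficients */
    const double x[] = { 1.0, 2.0, 3.0, 4.0, 5.0 };       /* example input samples */

    for (int n = 0; n < 5; n++)
        printf("y[%d] = %f\n", n, fir_output(h, x, n, N_TAPS));
    return 0;
}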

III.LOW POWER GENERIC FIR CORE

The block diagram of a generic DF FIR filter implementation is illustrated in Figure 1. It consists of two memory blocks for storing coefficients (HROM) and input data samples (XRAM), two registers for holding the coefficient (HREG) and input data (XREG), an output register (OREG), and the controller along with the datapath unit. The XRAM is realized in the form of a latch-based circular buffer for reducing its power consumption. The controller is responsible for applying the appropriate coefficients and data samples to the datapath.

Fig-2 low power generic FIR core

In order to increase the throughput, the number of datapaths should be increased, and data samples and coefficients should be allocated to these datapaths in each clock cycle. For example, for a 4-tap FIR filter with 2 datapaths, the coefficient data can be separated into 2 parts, (h3,h2) and (h1,h0), each allocated to a different datapath with the corresponding set of input data samples, as shown in Figure 2. Therefore, an output will be obtained in ⌈N/M⌉ clock cycles, where N is the number of taps and M is the number of datapaths. For example, for a 4-tap filter, an output can be obtained in 2 clock cycles with 2 datapaths.

Fig. 3 A 2 datapath architecture

IV. DESIGN AND METHODOLOGY

A. Coefficient Segmentation Algorithm

Two's complement representation is commonly used in DSP applications due to the ease of performing arithmetic operations. Nevertheless, sign extension is its major drawback and causes more switching activity when data toggles between positive and negative values. For this reason, in the coefficient segmentation algorithm, the coefficient hk is segmented into two parts: one part, mk, for the multiplier, and one part, sk, for the shifter. Segmentation is performed such that mk is the smallest positive value, in order to minimize the switching activity at the multiplier input. On the other hand, sk is a power-of-two number and can be either positive or negative depending on the original coefficient.

The MSB of sk acts as the sign bit and the remaining bits give the amount of shift. For instance, if a coefficient is 11110001, the decomposed number and shift value will be 00000001 and 10100, respectively. An example of a 2-datapath implementation architecture of this algorithm is shown in Figure 4.
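A small C sketch of the segmentation idea follows: it searches for a signed power-of-two part s = ±2^j that leaves the smallest non-negative multiplier part m = h - s, so that the multiplier sees only a small positive operand. The function name, search range and example value are illustrative assumptions; the bit-level encoding of s (sign bit plus shift field) and the special cases hk = 0 or 1 are handled as described in the text.

#include <stdio.h>
#include <limits.h>

/* Segment a coefficient h into h = m + s, where s = +/- 2^j (or 0) and m is
 * the smallest non-negative remainder, following the segmentation idea
 * described above. */
static void segment_coefficient(int h, int *m, int *s)
{
    int best_m = (h >= 0) ? h : INT_MAX;   /* s = 0 is also allowed */
    int best_s = 0;

    for (int j = 0; j < 15; j++) {
        int p = 1 << j;
        int cand[2] = { p, -p };           /* try +2^j and -2^j */
        for (int i = 0; i < 2; i++) {
            int rem = h - cand[i];
            if (rem >= 0 && rem < best_m) {
                best_m = rem;
                best_s = cand[i];
            }
        }
    }
    *m = best_m;
    *s = best_s;
}

int main(void)
{
    int m, s;
    /* the paper's example 11110001 (-15): expected m = 1, s = -16 (negative shift of 4) */
    segment_coefficient(-15, &m, &s);
    printf("h = -15 : m = %d, s = %d\n", m, s);
    return 0;
}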

Fig.4 AU of segmentation algorithm

The AU for the coefficient segmentation algorithm is shown in Fig. 4. It consists of a multiplier (mult), an adder (add), a logarithmic shifter (shift) implemented using arrays of 2-to-1 multiplexers, a conditional two's complementor (xconv), a multiplexer (mux) to load and clear the shifter, and a clearing block (clacc) identical to the one in the conventional FIR filtering block. The MSB of the shift value sk determines whether a negative shift has to be performed and therefore controls the conversion unit xconv.

[Figure: generic FIR core block diagram — FSM controller, DMU and PMU, HROM and XRAM memories, HREG and XREG registers, arithmetic unit, and OREG output register.]


Fig.5 Flow chart of algorithm

The output of xconv is the two's complement of the data only if the MSB of sk is one; otherwise the output is equal to the input data. When hk is zero (mk=0, sk=0) or one (mk=1, sk=0), the shift value will be zero. In these cases, the output of the shifter must be zero as well. To guarantee this behavior, a multiplexer is needed between the conversion unit and the shifter that applies a zero vector when sk equals zero. Since three values (the multiplier, shifter and accumulator outputs) are to be added, a single multi-input adder carries out this addition.

B. Data Block-Processing

The main objective of block processing is to implement signal processing schemes with high inherent parallelism. A number of researchers have studied block processing methods for the development of computationally efficient high order recursive filters, which are less sensitive to round-off error and coefficient accuracy. During filtering, data samples in fixed size blocks, L, are processed consecutively. This procedure reduces power consumption by decreasing the switching activity, by a factor depending on L, in the following: (1) the coefficient input of the multiplier, (2) the data and coefficient memory buses, (3) the data and coefficient address buses. Due to the successive change of both coefficient and data samples at each clock cycle, there is a high switching activity within the multiplier unit of the datapath. This high switching activity can be reduced significantly if the coefficient input of the multiplier is kept unchanged and multiplied with a block of data samples.

Once a block of data samples is processed, a new coefficient is obtained and multiplied with a new block of data samples. However, this process requires a set of accumulator registers corresponding to the size of the data block. Previous results have shown that a block size of 2 provides the best results in terms of power saving. An example datapath allocation for N=6 and L=2 and its corresponding architecture is shown in Figure 5. The sequence of steps for the multiplication scheme can be summarized as follows:
1. Get the first filter coefficient, h(N-1).
2. Get data samples x[n-(N-1)], x[n-(N-2)], ..., x[n-(N-L)] and save them into data registers R0, R1, ..., RL-1 respectively.
3. Multiply h(N-1) by R0, R1, ..., RL-1 and add the products into the accumulators ACC0, ACC1, ..., ACCL-1 respectively.
4. Get the second coefficient, h(N-2).
5. Get the next data sample, x[n-(N-L-1)], and place it in R0, overwriting the oldest data sample in the block.
6. Process h(N-2) as in step (3); however, use the registers in a circular manner, e.g. multiply h(N-2) by R1, ..., RL-1, R0. Their respective products will be added to accumulators ACC0, ACC1, ..., ACCL-1. Process the remaining coefficients as for h(N-2).
7. Get the output block, y(n), y(n-1), ..., y(n-L+1), from ACC0, ACC1, ..., ACCL-1 respectively.
8. Increment n by L and repeat steps (1) to (7) to obtain the next output block.

Fig.5.The block processing algorithm with 2 data paths.
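The arithmetic of this scheme can be sketched in C as below. The sketch is behavioral only: the circular register file and accumulator rotation of the hardware are abstracted into array indexing, and the filter length, block size and data are illustrative assumptions.

#include <stdio.h>

#define N_TAPS 6   /* filter length N (example above: N = 6) */
#define L      2   /* block size      (example above: L = 2) */

/* Compute one block of L outputs y[n], y[n-1], ..., y[n-L+1] with the
 * coefficient-reuse ordering of the block-processing scheme: the outer loop
 * walks the coefficients once, so the multiplier's coefficient input only
 * changes once per L multiplications. */
static void fir_block(const double h[], const double x[], int n, double y[])
{
    double acc[L] = { 0.0 };                 /* one accumulator per output in the block */

    for (int k = 0; k < N_TAPS; k++) {       /* steps 1 and 4: fetch a coefficient once */
        for (int j = 0; j < L; j++) {        /* steps 3 and 6: multiply it by L data samples */
            int idx = n - j - k;
            if (idx >= 0)                    /* assume x[m] = 0 for m < 0 */
                acc[j] += h[k] * x[idx];
        }
    }
    for (int j = 0; j < L; j++)              /* step 7: read out the output block */
        y[j] = acc[j];
}

int main(void)
{
    const double h[N_TAPS] = { 1, 2, 3, 3, 2, 1 };      /* illustrative coefficients */
    const double x[]       = { 1, 0, 0, 1, 1, 0, 1, 1 };
    double y[L];

    fir_block(h, x, 7, y);                   /* produces y[7] and y[6] */
    printf("y[7] = %g, y[6] = %g\n", y[0], y[1]);
    return 0;
}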

C. Combination Coefficient Segmentation and Block Processing Algorithm

The architectures of the coefficient segmentation and block processing algorithms can be merged together. This reduces the switching activity at both the coefficient and data inputs of the multiplier units within the datapaths, with only a slight overhead in area. The algorithm commences by processing the coefficient set through the segmentation algorithm, which segments the coefficients into two primitive parts.

The first part, sk, is processed through a shifter and the remaining part, mk, is applied to the


multiplier input. The algorithm performs the segmentation by selecting a value of sk which leaves mk as the smallest positive number. This results in a significant reduction in the amount of switched capacitance. The resulting sk and mk values are then stored in the memory for filtering operations. The filtering operation commences by fetching the sk and mk values and applying them to the shifter and multiplier inputs respectively. Next, a block of L data samples (x0, x1, ..., xL-1) is fetched from the data memory and stored in the register file.

This is followed by applying the first data sample, x0, in the register file to both the shifter and multiplier units. The resulting values from both the shifter and multiplier units are then summed together and the final result is added to the first accumulator. The process is repeated for all other data samples. The contents of the register file are updated with the addition of a single new data entry, which replaces the first entry from the previous cycle.

This procedure reduces the switching activity at the coefficient inputs of the multiplier, since the same coefficient is used for all data samples in the block. In addition, fewer accesses to both the data and coefficient memories are required, since coefficient and data samples are obtained through internal registers.

Fig.6 Combined segmentation and block processing algorithm with 2 data paths.

The sequence of steps involved is given below:
1. Clear all accumulators (ACC0 to ACCL-1).
2. Get the multiplier part, m(N-1), of the coefficient h(N-1) from the coefficient memory and apply it to the coefficient input of the multiplier.
3. Get the shifter part, s(N-1), of the coefficient h(N-1) and apply it to the inputs of the shifter.
4. Get the data samples x[n-(N-1)], x[n-(N-2)], ..., x[n-(N-L)] and store these into data registers R0, R1, ..., RL-1 respectively. This will form the first block of data samples.
5. Apply R0 to both the multiplier and shifter units. Add their results and the content of accumulator ACC0 together and store the final result into accumulator ACC0. Repeat this for the remaining data registers R1 to RL-1, this time using accumulators ACC1 to ACCL-1 respectively.
6. Get the multiplier part, m(N-2), and the shifter part, s(N-2), of the next coefficient, h(N-2), and apply these to the multiplier and shifter inputs respectively.
7. Update the data block formed in step (4) by getting the next data sample, x[n-(N-L-1)], and storing it in data register R0, overwriting the oldest data sample in the block.
8. Process the new data block as in step (5); however, start processing with R1, followed by R2, ..., RL-1, R0 in a circular manner. During this procedure use the accumulators in the same order as the data registers.
9. Process the remaining multiplier and shifter parts as in steps (6) to (8).
10. Get the first block of filter outputs y(n), y(n-1), ..., y(n-L+1) from ACC0, ACC1, ..., ACCL-1.
11. Increment n by L and repeat steps (1) to (10) to obtain the next block of filter outputs.

D. Coefficient Reduction Algorithm for Coefficient Representation

The main goal of the coefficient reduction algorithm is to reduce the number of non-zero coefficients used to represent the filter response. The coefficient reduction algorithm is summarized below:
1. Derive the filter coefficients based on the desired specifications using MATLAB or any other filter design software program.
2. Multiply these coefficients by a constant value so that some of them become greater than zero.
3. Round the values obtained from step 2 to integers.
4. The number of non-zero values obtained from step 3 must represent at least 93% of the signal power.
5. If the same signal power representation can be obtained with different constant values, then the smaller value is chosen.
6. The values of the first and last coefficient produced from step 5 are equal to zero.
7. Change the values of the first and last coefficient to be non-zero, with their original sign from step 1.
8. Find the frequency response of the filter using the new set of coefficients and see whether it fulfils the desired specifications or not.
9. The absolute value of the first coefficient must be less than 10. Values greater than the proper one will cause either ripples in the pass band and/or in the transition band, and/or maximize the gain factor of the filter response.
10. If ripples are present in the pass band or the transition band region of the frequency response found in step 8, then the first and last coefficient values must be reduced.
11. Divide the new set of coefficients by the constant used in step 2 so that the filter response is normalized back to zero magnitude.

Fig. 7 Distribution of Transmitted Signal’s average power

The coefficient reduction algorithm starts by obtaining the filter coefficients based on the desired specifications. Using the round function in MATLAB, these coefficients are rounded to the nearest integer after being multiplied by a constant integer value. It is better to choose the constant value to be a power of 2, i.e. 2^m, so that the division in step 11 is done simply by a right shift.

The frequency domain representation obtained with the new set of coefficients must cover at least 93% of the signal power (see Fig. 7); otherwise, the filter performance with its new set of coefficients will differ too much from the original one. If more than one constant can produce the same signal power, the smaller constant must be chosen, since a smaller value leads to a lower gain factor and smaller passband and/or transition band ripples.
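A rough C sketch of steps 1-4 and 11 is given below, assuming illustrative coefficients and a power-of-two scaling constant. The paper measures the 93% figure against the transmitted signal power (Fig. 7); the sketch simply uses the sum of squared coefficients as a stand-in, and the example values are not a designed filter.

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double h[] = { 0.02, -0.05, 0.12, 0.44, 0.44, 0.12, -0.05, 0.02 };
    const int    n   = sizeof h / sizeof h[0];
    const int    m   = 4;                       /* scaling constant = 2^m (step 2) */
    const double scale = (double)(1 << m);

    double total = 0.0, kept = 0.0;
    int    q[16];

    for (int k = 0; k < n; k++) {
        q[k] = (int)lround(h[k] * scale);       /* steps 2-3: scale and round to integers */
        total += h[k] * h[k];
        if (q[k] != 0)
            kept += h[k] * h[k];                /* power retained by the non-zero taps */
    }

    printf("non-zero taps retain %.1f%% of coefficient power\n", 100.0 * kept / total);
    printf("quantized taps (right-shift by %d to normalize, step 11):", m);
    for (int k = 0; k < n; k++)
        printf(" %d", q[k]);
    printf("\n");
    return 0;
}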

V.CONCLUSION

This paper gives the complete architectural implementations of low power algorithms for high performance applications. By combining the two algorithms, the power is reduced and the throughput is increased by increasing the number of datapath units. The combined segmentation and block processing (COMB) algorithm achieves the best power savings.

This paper also presents in detail the coefficient reduction algorithm proposed for modifying the number and values of FIR filter coefficients. The algorithm's target is to reduce the number of non-zero coefficients used to represent any FIR filter. Encouraging results are achieved when the phase and frequency responses of the filters are obtained with the new set of coefficients. The schemes target reducing power consumption through a reduction in the amount of switched capacitance within the multiplier section of the DSPs.

REFERENCES
[1] N. Sankarayya, K. Roy, and D. Bhattacharya, "Algorithms for Low Power and High Speed FIR Filter Realization Using Differential Coefficients," IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 44, pp. 488-497, June 1997.
[2] T.-S. Chang, Y.-H. Chu, and C.-W. Jen, "Low Power FIR Filter Realization with Differential Coefficients and Inputs," IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 47, no. 2, pp. 137-148, Feb. 2000.
[3] M. Mehendale, S. D. Sherlekar, and G. Venkatesh, "Low Power Realization of FIR Filters on Programmable DSPs," IEEE Trans. on VLSI Systems.
[4] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj, "Decorrelating (DECOR) Transformations for Low Power Digital Filters," IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing.
[5] H. Choi and W. P. Burleson, "Search-Based Wordlength Optimization for VLSI/DSP Synthesis," VLSI Signal Processing.
[6] A. T. Erdogan, M. Hasan, and T. Arslan, "Algorithmic Low Power FIR Cores," IEE Proc. Circuits, Devices and Systems.
[7] Kyung-Saeng Kim and Kwyro Lee, "Low-power and area-efficient FIR filter implementation suitable for multiple taps," IEEE Trans. on VLSI Systems, vol. 11, no. 1, Feb. 2003.
[8] Tang Zhangwen, Zhang Zhanpeng, Zhang Jie, and Min Hao, "A High-Speed, Programmable, CSD Coefficient FIR Filter," ICASSP.


n x Scalable stacked MOSFET for Low voltage CMOS technologies

1T.Loganayagi, Lecturer Dept of ECE (PG), Sona College of technology, Salem 2M.Jeyaprakash student Dept of ECE (PG) Sona College of technology, Salem

Abstract This paper presents a design and implementation of a stacked MOSFET circuit for output drivers in low voltage CMOS technologies. A monolithic implementation of series-connected MOSFETs for high voltage switching is presented. Using a single low voltage control signal to trigger the bottom MOSFET in the series stack, a voltage division across parasitic capacitances in the circuit is used to turn on the entire stack of devices. Voltage division provides both static and dynamic voltage balancing, preventing any device in the circuit from exceeding its nominal operating voltage. This circuit, termed the stacked MOSFET, is n x scalable, allowing for on-die control of voltages that are n x the fabrication process's rated operating voltage. The governing equations for this circuit are derived and reliable operation is demonstrated through simulation and experimental implementation in a 180 nm SOI CMOS process.
Key Words—CMOS integrated circuits, high voltage techniques, buffer circuits, input/output (I/O).

I.INTRODUCTION

High-voltage switching in current MOSFET technology is becoming increasingly difficult due to the decreasing gate-oxide thickness. Devices with reduced gate-oxide are optimized for speed, power consumption and size. Stacked MOSFETs in combination with level shifters are one circuit technique to switch high voltages and overcome the decreased gate-oxide breakdown. The stacked MOSFET enables rail-to-rail high voltage switching. On-die high-voltage switching (where high voltage is defined as any voltage greater than the rated operating voltage of the CMOS fabrication process being used) is a system-on-chip (SOC) design challenge that is becoming ever more problematic. Such difficulty is a direct result of the reduced breakdown voltages that have arisen from the deep sub-micrometer and nanometer scaling of MOSFET geometries. While these low-voltage processes are optimized for minimum power consumption, high speed, and maximum integration density, they may not meet the requirements of system applications where high-voltage capabilities are needed. Such applications of on-die high-voltage switching include MEMS device control, monolithic power converter switching, high-voltage aperture

control, ultrasonic transducer control, electro-static device control, piezoelectric positioning, and many others. Existing methods for handling such high-voltage switching can be divided into two general categories: device techniques and circuit techniques [1]. Device techniques include lateral double-diffused MOSFETs (LDMOSFETs) and mixed voltage fabrication processes. These methods increase the individual transistor’s breakdown voltage by modifying the device layout. In the past LDMOSFETs have been used to achieve extremely high operating voltages in standard CMOS technologies [2]. This is accomplished by adding a lightly-doped drift region between the drain and gate channel. The layout of such devices is unique to each process they are implemented in, and as such, are very labor and cost intensive. Further, because modern fabrication processes utilizing thinner gate oxides and reduced overall process geometries, new LDMOSFETs are becoming less effective. Even circular LDMOSFETs, the most effective device shape for mitigating high e-field stress, are less effective than they once were. Mixed voltage fabrication processes essentially take a step back in time, allowing for the fabrication of large geometry, thick oxide devices on the same substrate as sub-micrometer geometry devices [3]. Although effective, these processes are more expensive due to their added mask and process steps, and still exhibit an upper limit on operating voltage. Further, because more die-space per transistor is required, the performance per area is relatively poor. Circuit techniques used for on-die high-voltage control include level shifters and monolithic high-voltage input/output (I/O) drivers. Taking many different forms, level-shifters work by upwardly translating low-voltage signals, such that the voltage across any two terminals of each device in the circuit never exceeds the rated operating voltage [1],[4]. In doing this, an output voltage that is greater than the individual transistor breakdown voltages can be controlled. However, the magnitude of the output voltage swing is still limited by the individual transistor breakdown. This requires an output signal that does not operate rail-to-rail. As such, these level-shifters are only suitable for applications where the addition of off-die high-voltage transistors is possible. Monolithic high-voltage I/O drivers are a relatively new technique for the on-die switching of voltages greater than the rated operating voltage of the process [6]. These circuits enable high-voltage switching using only the native low-voltage FETs of the


fabrication process. Reference [5] reports a circuit that switches 2½ x the rated voltage of the process. While this topology is theoretically n x scalable, it requires an excessive number of devices in the signal path, not only taking up a large amount of die area, but also increasing the on-resistance. Ref. [6] reports achieving 3x the rated voltage of the process using only three devices in the signal path. This minimizes the on-resistance, but the design is not n x scalable. In this paper, we present a circuit technique for on-die high-voltage switching that uses a minimum number of devices in the signal path while remaining n x scalable. Termed the stacked MOSFET, this circuit uses the native low-voltage FETs of the fabrication process to switch voltages n x greater than the rated breakdown voltage of the process used. That is, this circuit is scalable to arbitrarily high output voltages, limited only by the substrate breakdown voltage. The goal of this paper is to show that the stacked MOSFET is scalable in integrated circuits [7]. The topology is not changed from [7], but the governing equations are rederived here such that the design variables are those that are commonly available in any IC process ([7] derived the governing equations based on design variables commonly available for discrete high voltage power MOSFETs). First, an overview of the stacked MOSFET topology is presented, along with a derivation of its governing equations. This discussion focuses on the specific realization of a two-MOSFET stack, with a generalization given for extending to an n-MOSFET stack. Second, circuit simulation results are presented, giving validity to our mathematical model. Third, and finally, experimental results are presented, revealing excellent correlation between the analytic models, simulation results, and measured results.

II. DERIVATION OF CHARACTERISTIC EQUATIONS

Fig. 1 shows the topology of a two-device Stacked MOSFET. By placing MOSFETs in series and equally dividing the desired high-voltage across them, for the entire switching period, reliable high-voltage control can be achieved. In switching applications this circuit will act as a single MOSFET switch, being controlled by a low-voltage logic level. Hess and Baker implemented this circuit using discrete power MOSFETs. As such, their characterization of the circuit was well suited to the discrete design process, utilizing spec sheet parameters such as MOSFET input and output capacitance. To realize this circuit concept in IC technology the governing equations need to be recharacterized for IC design parameters. The following is a derivation of the governing equations for the two-device Stacked MOSFET based on conservation of charge principles. From this

derivation, equations representing an n-device Stacked MOSFET will be generated.

Fig. 1. Schematic of the two-device Stacked MOSFET.

A. Derivation for a Two-Device Stacked MOSFET

Fig. 2. Two-device Stacked MOSFET, including parasitic capacitances, with definition of notation used in derivation.

The triggering of the Stacked MOSFET is accomplished through capacitive voltage division. As shown in Fig. 2, there exists an inherent parasitic capacitance Cp between the gate and source of M2. This capacitance, along with a capacitor C2 inserted in the gate leg of M2, will set the value of Vgs2 that turns on M2. Consider the circuit in Fig. 2 with an initial condition of both devices turned off. If the resistors are sized such that

  R_{bias} \ll R_1 + R_2        (1)


then the output voltage rises to V_{dd}. (Note that this assumes that the off-state leakage current through M1 and M2 is much less than the current through R1 and R2.) Since M1 is off, the node V_{drain} is free to take on the value dictated by the voltage divider of R1 and R2. If R1 and R2 are sized equally then

  V_{drain} = \frac{V_{dd}}{2}        (2)

This voltage is greater than V_{g2} (the reason for this will become more apparent later in the derivation), and causes the diode to be forward biased. The resulting voltage at the gate of M2 will be

  V_{g2} = V_{drain} - V_{diode} = \frac{V_{dd}}{2} - V_{diode}        (3)

where V_{diode} is the forward voltage across the diode. Equations (2) and (3) dictate a V_{gs2} of

  V_{gs2} = -V_{diode}        (4)

keeping M2 off. As such, the off condition, with the output voltage at V_{dd} and V_{drain} at V_{dd}/2, exhibits static voltage balancing and results in this condition being safely held. When V_{in} rises high, M1 is turned on, pulling V_{drain} to ground. This reverse biases the diode, leaving the gate-source voltage of M2 to be set by the capacitive voltage divider of C2 and Cp. Cp represents the lumped total parasitic capacitance across the gate-source of M2 and can be solved for as

  C_p = C_{diode} + C_{gs} + C_{gb} + E_{v1} C_{gd(1)} + E_{v2} C_{ds(1)}        (5)

where C_{diode} is the reverse-bias junction capacitance of the diode and C_{gs}, C_{gb}, C_{gd}, and C_{ds} are the corresponding MOSFET junction capacitances. E_{v1} and E_{v2} are used to approximate the Miller capacitance resulting from C_{gd} and C_{ds}, respectively, and are defined as

  E_{v1} = \frac{\Delta v_{ds(1)}}{\Delta V_{gs}} = \frac{-V_{dd}/2}{V_{gs} + V_{diode}}        (6a)

  E_{v2} = \frac{\Delta v_{gd(1)}}{\Delta V_{gs}} = \frac{-V_{dd}/2 + V_{gs} + V_{diode}}{V_{gs} + V_{diode}}        (6b)

At turn-on, C2 and Cp are in parallel, resulting in the final gate-source voltage being dictated by the total

charge on the parallel combination of the two capacitors. By the conservation of charge, the total charge on the parallel combination of C2 and Cp will be the sum of their initial charges

  Q_{total} = Q_{2(initial)} + Q_{p(initial)}        (7)

where

  Q_{2(initial)} = C_2 \left(\frac{V_{dd}}{2} - V_{diode}\right), \qquad Q_{p(initial)} = C_p (-V_{diode})        (8)

The resulting gate-source voltage will be

  V_{gs2(final)} = \frac{Q_{total}}{C_2 + C_p}        (9)

Substituting in (8),

  V_{gs2} = \frac{C_2 \left(\frac{V_{dd}}{2} - V_{diode}\right) + C_p (-V_{diode})}{C_2 + C_p}        (10)

This simplifies to

  V_{gs2} = \frac{C_2}{C_2 + C_p}\left(\frac{V_{dd}}{2} - V_{diode}\right) - \frac{C_p}{C_2 + C_p} V_{diode}        (11)

Solving (11) for C2, an expression for setting the desired gate-source voltage to turn on M2 can be found as

  C_2 = C_p \left( \frac{V_{gs2} + V_{diode}}{\frac{V_{dd}}{2} - (V_{gs2} + V_{diode})} \right)        (12)

M2 will then be held on as long as the charge on C2 maintains a voltage greater than the threshold of M2. This implies an inherent low-frequency limitation, due to on-state leakage current dissipating the charge on C2. If frequencies of operation less than those allowed by the given value of C2 are desired, C2 and Cp can be simultaneously scaled up according to the ratio

  \frac{C_2}{C_p} = \frac{V_{gs} + V_{diode}}{\frac{V_{dd}}{2} - (V_{gs} + V_{diode})}        (13)

Because MOSFETs are majority charge carrier devices and each device in this circuit is capacitively coupled to the next, it is expected that all devices in the stack will turn on and turn off together, maintaining a dynamic voltage balancing. This will be experimentally verified in the following.
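A small numeric sketch of equation (12) is shown below; the supply voltage, target gate-source voltage, diode drop and lumped parasitic capacitance used here are illustrative assumptions rather than the extracted values of the fabricated design.

#include <stdio.h>

/* Numeric sketch of equation (12): sizing the inserted gate capacitor C2 of a
 * two-device stack from the lumped parasitic capacitance Cp (equation (5)). */
int main(void)
{
    const double Vdd    = 10.0;    /* stack supply (V), assumed               */
    const double Vgs2   = 4.0;     /* desired turn-on gate-source voltage (V) */
    const double Vdiode = 0.7;     /* assumed diode forward drop (V)          */
    const double Cp     = 1.0e-12; /* assumed lumped parasitic capacitance (F) */

    /* C2 = Cp * (Vgs2 + Vdiode) / (Vdd/2 - (Vgs2 + Vdiode))   -- equation (12) */
    double C2 = Cp * (Vgs2 + Vdiode) / (Vdd / 2.0 - (Vgs2 + Vdiode));

    printf("C2 = %.2f pF for Cp = %.2f pF\n", C2 * 1e12, Cp * 1e12);
    return 0;
}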


B. Derivation for an n-Device Stacked MOSFET

The previous analysis and characterization can be directly extended to a stack of n MOSFETs. Fig. 3 shows a generalized schematic of an n-device Stacked MOSFET. Equation (11) can be redefined for the generalized circuit as

  V_{gs(i)} = \frac{C_{(i)}}{C_{(i)} + C_{p(i)}}\left((i-1)\frac{V_{dd}}{n} - V_{diode}\right) - \frac{C_{p(i)}}{C_{(i)} + C_{p(i)}} V_{diode}        (14)

where n is the number of devices in the stack and i is the specific device being considered. The parasitic capacitances C_{p(i)} are defined in the same manner as for the two-device stack and are all equal for equally sized devices in the stack. The design equation for setting the turn-on gate-source voltages is then generally defined as follows:

  C_{(i)} = C_{p(i)} \frac{V_{gs} + V_{diode}}{(i-1)\frac{V_{dd}}{n} - (V_{gs} + V_{diode})}        (15)

The (i-1)(V_{dd}/n) term in the denominator of (15) will increase for devices higher in the stack, and result in a value for C_{(i)} that is less than C_{(i-1)}. This reduction in C_{(i)} implies that the ratio of die-space occupied to output voltage decreases for higher voltages. In other words, less overhead space is required for devices at the top of the stack than at the bottom. As with the two-device Stacked MOSFET, if frequencies of operation less than those allowed by the given value of C_{(i)} are desired, C_{(i)} and C_{p(i)} can be simultaneously scaled up according to the ratios as follows:

  \frac{C_{(i)}}{C_{p(i)}} = \frac{V_{gs} + V_{diode}}{(i-1)\frac{V_{dd}}{n} - (V_{gs} + V_{diode})}        (16)
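The following C sketch evaluates the ratio in equations (15)-(16) for each device of an assumed four-device stack, illustrating how the required C(i)/Cp(i) shrinks for devices higher in the stack; all numeric values are assumptions chosen only for illustration.

#include <stdio.h>

int main(void)
{
    const double Vdd    = 20.0;   /* total stack supply (V), assumed  */
    const int    n      = 4;      /* number of stacked devices        */
    const double Vgs    = 4.0;    /* desired turn-on Vgs (V), assumed */
    const double Vdiode = 0.7;    /* assumed diode forward drop (V)   */

    for (int i = 2; i <= n; i++) {   /* device 1 is driven directly by the input */
        /* C(i)/Cp(i) per equations (15)/(16) */
        double ratio = (Vgs + Vdiode) /
                       ((i - 1) * Vdd / n - (Vgs + Vdiode));
        printf("device %d: C(%d)/Cp(%d) = %.2f\n", i, i, i, ratio);
    }
    return 0;
}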

Fig. 3. Generalized n-device Stacked MOSFET.

III. DESIGN AND SIMULATION

Utilizing the previous equations, a two-device Stacked MOSFET has been designed and simulated for implementation in Honeywell's 0.35-µm PD SOI CMOS process. This process is rated for 5-V operation. The models used are the BSIMSOI models provided by Honeywell. Also, to illustrate the validity of the design equations for the general n-device Stacked MOSFET, simulation results for an eight-device Stacked MOSFET are included.

TWO-DEVICE STACKED MOSFET
Consider the two-device Stacked MOSFET shown in Fig. 1. If each FET used in the stack is sized to have a W/L of 500 and a gate-source voltage of 4 V, then the parasitic capacitances, under the desired operating conditions, can be extracted from the device models as shown in Table I. This table also includes the extracted diode junction capacitance at the appropriate biasing conditions. Accordingly, C2 can be sized using (5) and (12) to be 14.6 pF. The simulated drain voltages resulting from the previous design values are shown in Fig. 4. The top trace is the drain voltage for M2 and the lower trace is the drain voltage for M1. Note that the voltages are evenly distributed, causing neither device to exceed its drain-source breakdown voltage. The gate-source voltage


which controls M2 is shown in Fig. 5. Note that the 4-V gate-source voltage designed for turning on M2 is achieved. Also, the predicted -0.7-V gate-source voltage used to hold the stack off is exhibited.

Fig.4. Drain voltages for the two-device Stacked MOSFET operating with a 10-V supply.

TABLE I
MODELED JUNCTION CAPACITANCES

Capacitance      Extracted value
Gate-Source      838.63
Gate-Bulk        16.62
Gate-Drain       52.03
Drain-Source     10.87
Diode            9.96

Fig.5. Gate-Source voltage for M2 in a Two-device Stacked MOSFET operating with a 10-v supply.

Fig.6. Dynamic drain voltage balancing on the rising edge.

Fig.7. Dynamic drain voltage balancing on the falling edge.

IV. EXPERIMENTAL RESULTS

The previously simulated two-device Stacked MOSFET has been implemented in the same Honeywell 0.35-µm PD SOI CMOS process. The layout and test structure are shown in Fig. 9. In implementing this circuit it is important to take into account any parasitics that are introduced in layout as well as in the test and measurement setup. All capacitances will affect the operation of the Stacked MOSFET. For this reason, good layout techniques, coupled with post-layout parasitic simulation of the circuit, are critical. Further, realistic models of capacitances and inductances introduced by probe tips, bond wires, or other connections should be considered. Fig. 8 shows a drain voltage characteristic similar to the simulation results shown in Fig. 4. This characteristic results from the two-device Stacked MOSFET being biased with a 10-V supply, operating at 50 kHz. As predicted, these measurements show that in the off state static voltage balancing is achieved. This balancing ensures that each device is supporting an even 5-V share of the 10-V output. When the stack turns on, both devices turn on almost simultaneously, pulling the output to ground. As discussed previously, because the MOSFET is a majority charge carrier device, and each device is capacitively coupled to the next, all of the devices in the stack rise and fall together. This dynamic voltage sharing is what allows each component in the circuit to operate very near the edge of its rating.


Fig.8. Measured drain voltages for a two device stacked MOSFET showing even voltage sharing for both devices.

Fig.9. Layout of a two-device Stacked MOSFET with test pads.

V. CONCLUSION

In this paper we have shown, with new characteristic equations, that the series-connected MOSFET circuit is adaptable to IC technology. Using this monolithic implementation of series-connected MOSFETs, on-die high voltage switching is achieved. The governing design equations have been derived and verified through circuit simulation and experimental measurement. This technique for on-die high-voltage switching can be classified as a circuit technique that reliably achieves rail-to-rail output swings. Such high-voltage switching is accomplished using only the fabrication process's native logic gates. The triggering of this circuit is extremely fast, exhibiting input-to-output delays of only 5.5 ns, with rise and fall slew rates of approximately 10 kV/µs. The low-frequency limit is set only by the scaling of the inserted gate capacitor. The high-frequency limit will ultimately be set by the rise/fall times. Our measured results show excellent static and dynamic voltage sharing. In the event of transient over-voltages, the over-voltage is evenly distributed across the stack, minimizing the impact.

REFERENCES
[1] H. Ballan and M. Declercq, High Voltage Devices and Circuits in Standard CMOS Technologies. Norwell, MA: Kluwer, 1999.
[2] T. Yamaguchi and S. Morimoto, "Process and device design of a 1000-V MOS IC," IEEE Trans. Electron Devices, vol. 29, no. 8, pp. 1171-1178, Aug. 1982.
[3] J. Williams, "Mixing 3-V and 5-V ICs," IEEE Spectrum, vol. 30, no. 3, pp. 40-42, Mar. 1993.
[4] D. Pan, H. W. Li, and B. M. Wilamowski, "A low voltage to high voltage level shifter circuit for MEMS application," in Proc. UGIM Symp., 2003, pp. 128-131.
[5] A.-J. Annema, G. Geelen, and P. de Jong, "5.5 V I/O in a 2.5 V 0.25 µm CMOS technology," IEEE J. Solid-State Circuits, vol. 36, no. 3, pp. 528-538, Mar. 2001.
[6] B. Serneels, T. Piessens, M. Steyaert, and W. Dehaene, "A high-voltage output driver in a 2.5-V 0.25-µm CMOS technology," IEEE J. Solid-State Circuits, vol. 40, no. 3, pp. 576-583, Mar. 2005.
[7] H. Hess and R. J. Baker, "Transformerless capacitive coupling of gate signals for series operation of power MOS devices," IEEE Trans. Power Electron., vol. 15, no. 5, pp. 923-930, Sep. 2000.


Test pattern selection algorithms using output deviation

S.Malliga Devi, Student Member, IEEE, Lyla.B.Das, and S.Krishna kumar

Abstract:-It is well known that n-detection test sets are effective in detecting unmodeled defects and improving defect coverage. However, in these sets, each of the n-detection test patterns has the same importance for the overall test set performance. In other words, the test pattern that detects a fault for the first time plays the same role as the test pattern that detects that fault for the n-th time. But the test data volume of an n-detection test set is often too high. In this paper, we use output deviation algorithms combined with an n-detection test set to efficiently reduce test data volume and test application time, using a probabilistic fault model and the theory of output deviations for test selection. To demonstrate the quality of the selected patterns, we present experimental results for non-feedback zero-resistance bridging faults and stuck-open faults in the ISCAS benchmark circuits. Our results show that for the same test length, patterns selected on the basis of output deviations are more effective than patterns selected using several other methods.

I.INTRODUCTION

Semiconductor manufacturers strive to attain

a high yield (ideally 100%) when fabricating integrated circuits. Unfortunately, numerous factors can lead to a variety of manufacturing defects which may reduce the overall yield. The purpose of testing is to identify and eliminate any effective chips after the chips are manufactured. However, it is currently impractical to test exhaustively for all possible defects. This is a result of the computational infeasibility of accurately modeling defects, limitations imposed by existing manufacturing test equipment and time/economic constraints imposed by the test engineers. For these reasons, the stuck-at-fault (SAF) model has been accepted as the standard model to generate test patterns[2]. Most of the existing commercial ATPG tools use the SAF coverage as a metric of the quality of a test set and terminate test generation when a high SAF fault coverage is attained.

Each possible physical defect in a tested circuit should be covered by the test method that leads to the lowest overall testing costs, taking into account

e.g. complexity of the test pattern generation (TPG) and the test application time. The problem of finding an optimal test set for a tested circuit with acceptable fault coverage is an important task in diagnostics of complex digital circuits and systems. It has been published that high stuck-at-fault (SAF) coverage cannot guarantee high quality of testing, especially for CMOS integrated circuits. The SAF model ignores the actual behaviour of digital circuits implemented as CMOS integrated circuits and does not adequately represent the majority of real integrated circuits defects and failures.

The purpose of fault diagnosis is to determine the cause of failure in a manufactured, faulty chip. An n-detection test set has a property that each modeled fault is detected either by n different tests, or by the maximum obtainable different m tests that can detect the fault (m < n). Here, by different tests for a fault, we mean tests which can detect this fault and activate or/and propagate the faulty effect along different paths[3]. The existing literature reports experimental results [4] suggesting that the n-detection test sets are useful in achieving high defect coverage for all types of circuits (combinational, scan sequential, and non-scan sequential). However, the effectiveness of n-detection tests for diagnosis remains an unaddressed issue. The inherent limitation for n-detection tests is their increased pattern size. Typically, the size of an n-detection test set increases approximately linearly with n [3]. Because the tester storage space is limited, large test volume may create problems for storing the failing-pattern responses.

In this paper, we investigate the effectiveness of n-detection tests to diagnose failure responses caused by stuck-at and bridging faults. It was observed in [] that the common one-detection test set with greater than 95% stuck-at fault coverage produced only 33% coverage of node-to-node bridging faults. A test that detects a stuck-at fault on a node will also detect the corresponding low resistive bridges (AND,OR) with the supply lines. This is also the reason that the tests generated for stuck-at faults can detect some bridging defects in the circuit. However, such test sets do not guarantee the detection of node-to-node bridges. If a stuck-at fault on a node is detected once, the probability of detecting a static bridging fault with another un-correlated node that has signal probability of 50% is also 50%. When the stuck-at fault is detected twice (thrice), the estimated probability of detecting the bridging fault with another node acting as an aggressor will increase to 75%


(88%). A test set created by a conventional ATPG tool aiming at single detection may have up to 6% of stuck-at faults detected only once, and up to 10% of stuck-at faults detected only once or twice. This may result in inadequate coverage of node-to-node bridging defects. The experimental results show that, in general, n-detection tests can effectively improve the diagnostic algorithm's ability to locate the real fault locations, even when the single-stuck-at-fault based diagnosis algorithm is used.

A. Fault model

We consider a probabilistic fault model [5] that allows any number of gates in the IC to fail probabilistically. Tests for this fault model, determined using the theory of output deviations, can be used to supplement tests for classical fault models, thereby increasing test quality and reducing the probability of test escape. By targeting multiple fault sites in a probabilistic manner, such a model is useful for addressing phenomena or mechanisms that are not fully understood. Output deviations can also be used for test selection, whereby the most effective test patterns can be selected from large test sets during time-constrained and high-volume production testing [1]. One key idea in this area has been to use redundancy to ensure correct circuit outputs if every logic gate is assumed to fail with a given probability. Elegant theoretical results have been derived on the amount of redundancy required for a given upper bound on this failure probability. However, these results are of limited practical value because the redundancy is often excessive, the results target only special classes of circuits, and a fault model that assigns the same failure probability to every gate (and for every input combination) is too restrictive. Stochastic techniques have also been proposed to compute reliably using logic gates that fail probabilistically.

II. FINDING OUT ERROR PROBABILITY

In this section, we explain how to calculate the error probability of a logic gate. The error probability of a gate is the probability of the output being an unexpected value for the corresponding input combination. To calculate the error probability we need the reliability vector of the gate and the probability of the various input combinations. Using the method used in [7], we calculate the output probability of the gate. With the input combination 00, the output probabilities are pc0 = 0.1 and pc1 = 0.9, where pc0 is the probability of the output being 0 and pc1 is the probability of the output being 1 for the corresponding input combination.
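The calculation can be sketched in C as follows. The sketch assumes a 2-input NAND gate with a confidence level r = 0.9 (i.e., it produces the fault-free output with probability 0.9), which reproduces the pc0 = 0.1 / pc1 = 0.9 figures quoted above for the 00 input; the gate type and the variable names are assumptions for illustration.

#include <stdio.h>

int main(void)
{
    const double r = 0.9;            /* gate confidence level (assumed) */

    /* probability of each input combination (a,b); here the inputs are
     * forced to 00, as in the example above */
    double p_in[2][2] = { { 1.0, 0.0 },
                          { 0.0, 0.0 } };

    double p_out[2] = { 0.0, 0.0 };  /* p_out[v] = probability that the output is v */

    for (int a = 0; a <= 1; a++) {
        for (int b = 0; b <= 1; b++) {
            int good = !(a && b);                  /* fault-free NAND output      */
            p_out[good]  += r         * p_in[a][b];/* gate behaves correctly      */
            p_out[!good] += (1.0 - r) * p_in[a][b];/* gate flips its output       */
        }
    }
    printf("pc0 = %.3f, pc1 = %.3f\n", p_out[0], p_out[1]);
    return 0;
}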

III. CALCULATION OF THE OUTPUT DEVIATION

Output deviation is the metric which tells how much the output deviates from the expected value.

We use the ISCAS-85 benchmark circuits for evaluating the output deviation method. For example, to calculate the output deviation [5] of the circuit c17.bench from the ISCAS-85 benchmark circuits, for the input pattern 00000 the expected outputs are 0 0. However, the probability of output line 22 being 1 is 0.333 and of being 0 is 0.667; similarly, the probability of output line 23 being 1 is also 0.333 and of being 0 is 0.667. From the definition of output deviation [7], for the above circuit the output deviations of output lines 22 and 23 are 0.333 and 0.333 respectively.
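In code, the deviation of an output for a pattern is simply the probability mass on the complement of its fault-free value; a trivial C illustration using the c17 numbers above:

#include <stdio.h>

int main(void)
{
    double p1_line22 = 0.333;   /* probability that output line 22 is 1 */
    int    expected  = 0;       /* fault-free value of line 22 for pattern 00000 */

    /* deviation = probability of the value complementary to the expected one */
    double deviation = expected ? (1.0 - p1_line22) : p1_line22;
    printf("deviation of line 22 = %.3f\n", deviation);
    return 0;
}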

IV.N-DETECTION TEST SET

In an n-detection test set, each target fault is targeted by n different test patterns. Using an n-detection test set, defect coverage is increased: as the number of unique detections for each fault increases, the defect coverage usually improves. An advantage of this approach is that even when n is very large, n-detection test sets can be generated using existing single stuck-at ATPG tools with reasonable computation time. We use the ATALANTA single stuck-at ATPG tool to generate the n-detection test sets.

A. Disadvantage of n-detection test

However, the data volume of an n-detection test set is often too large, resulting in long testing time and high tester memory requirements. This is because the n-detection method simply tries to detect each single stuck-at fault n times, and does not use any other metric to evaluate the contribution of a test pattern towards increasing the defect coverage. It has been reported in the literature that the sizes of n-detection test sets tend to grow linearly with n.

B. Importance of test selection

Therefore, test selection is necessary to ensure that the most effective test patterns are chosen from large test sets during time-constrained and high-volume production testing. If highly effective test


patterns are applied first, a defective chip fails earlier, further reducing test application time. Moreover, test compression is not effective if the patterns are delivered without a significant fraction of don't-care bits. In such cases, test set selection can be a practical method to reduce test time and test data volume.

In this paper, we use the output deviation metric for test selection. To evaluate the quality of the selected test patterns, we determine the coverage that they provide for single non-feedback zero-resistance bridging faults (s-NFBFs) and stuck-open faults. Experimental results show that patterns selected using the probabilistic fault model and output deviations provide higher fault coverage than patterns selected using other methods.

V. ALGORITHMS

The test pattern selection algorithm is based on the theory of output deviations. A small number of test patterns, T11, is selected from a large test set called T1. To generate T1, we run ATALANTA, a single stuck-at fault ATPG tool, which generates n-detection test patterns for each single stuck-at fault. Each time a test pattern is selected, we perform fault simulation and drop those faults that are already detected n times. The set T1 is randomly reordered before being provided as input to Procedure 1. The flow chart for the procedure is shown in Fig. 3.

We then sort T1 so that test patterns with high deviations are selected earlier than test patterns with low deviations. For each primary output (PO), all test patterns in T1 are sorted in descending order based on their output deviations, giving the test set T2.

The sorted test set is then applied to Procedure 1; this yields an optimized n-detection test set that normally contains a smaller number of test patterns and achieves high defect coverage.

In Procedure 2, test patterns with low output deviations are selected earlier than test patterns with high output deviations; this procedure takes one more parameter, called the threshold [7].
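A toy C sketch of this deviation-ordered selection loop is given below. The detection matrix and deviation values are stand-ins for real fault-simulation and deviation-computation results, and n = 2 is an arbitrary choice; only the ordering and fault-dropping logic mirrors the procedures described above.

#include <stdio.h>
#include <stdlib.h>

#define N_PATTERNS 8
#define N_FAULTS   5
#define N_DETECT   2                 /* n of the n-detection requirement (assumed) */

/* Toy detection matrix: detects[p][f] = 1 if pattern p detects fault f.
 * In the real flow this information comes from fault simulation. */
static const int detects[N_PATTERNS][N_FAULTS] = {
    {1,0,0,1,0}, {0,1,0,0,1}, {1,1,0,0,0}, {0,0,1,0,1},
    {1,0,1,0,0}, {0,1,0,1,0}, {0,0,1,1,0}, {1,0,0,0,1}
};

/* Illustrative deviation of each pattern (higher = more likely to expose a defect). */
static const double deviation[N_PATTERNS] = {0.41,0.12,0.38,0.29,0.35,0.22,0.31,0.18};

static int order[N_PATTERNS];
static int cmp(const void *a, const void *b)     /* descending deviation order */
{
    double d = deviation[*(const int *)b] - deviation[*(const int *)a];
    return (d > 0) - (d < 0);
}

int main(void)
{
    int count[N_FAULTS] = {0};       /* how many times each fault has been detected */

    for (int p = 0; p < N_PATTERNS; p++) order[p] = p;
    qsort(order, N_PATTERNS, sizeof order[0], cmp);   /* sort T1 -> T2 by deviation */

    for (int i = 0; i < N_PATTERNS; i++) {
        int p = order[i], useful = 0;
        for (int f = 0; f < N_FAULTS; f++)
            if (detects[p][f] && count[f] < N_DETECT) { count[f]++; useful = 1; }
        if (useful)                                   /* keep p in the selected set T11 */
            printf("select pattern %d (deviation %.2f)\n", p, deviation[p]);
        /* faults already detected N_DETECT times are implicitly dropped */
    }
    return 0;
}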

VI.EXPERIMENTAL RESULTS

The work is in progress. All experiments are being performed on a Pentium 4 PC running Linux with a 2.6 GHz processor and 1 GB of memory. The program to compute output deviations is being implemented in C. ATALANTA and its associated simulation engine are used to generate the n-detection test sets. We have written the fault simulation program in the C language so that we can add constraints to the simulation in future. We are also implementing a bridging fault simulator to calculate the coverage of single non-feedback, zero-resistance bridging faults (s-NFBFs). To eliminate any bias in the comparison of different methods for test set selection, we use two arbitrarily chosen sets of confidence level vectors for our experiments. One more method to evaluate the test patterns selected using the output deviation method is the gate exhaustive (GE) testing metric [8], which is computed using an in-house simulation tool based on the fault simulation program FSIM [9]. A CMOS combinational circuit in the presence of a SOP fault behaves like a sequential circuit [10]. In CMOS circuits, the traditional line stuck-at fault model does not represent the behavior of stuck-open (SOP) faults properly. A sequence of two test patterns is required to detect a SOP fault. SOPRANO [11] is an efficient automatic test pattern generator for stuck-open faults in CMOS combinational circuits. We also apply the output deviation algorithms to stuck-open faults to evaluate the quality of the selected test patterns in a high-volume production testing environment. We are currently concentrating on obtaining the tools to evaluate the stuck-open faults.

VII. CONCLUSION

Evaluation of pattern grading using the fault coverage for stuck-open faults and non-feedback bridging faults is being carried out to demonstrate the effectiveness of output deviation as a metric to model the quality of test patterns. This proves especially useful in a high-volume and time-constrained production testing environment. The work is in progress and the final results will be exhibited at the time of presentation.

REFERENCES
[1] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design, Computer Science Press, 1990, pp. 94-95.
[2] Y. Tian, M. Mercer, W. Shi, and M. Grimaila, "An optimal test pattern selection method to improve the defect coverage," in Proc. ITC, 2005.
[3] Zhiyuan Wang, Malgorzata Marek-Sadowska, Kun-Han Tsai, and Janusz Rajski, "Multiple Fault Diagnosis Using n-Detection Tests," in Proc. 21st International Conference on Computer Design (ICCD'03).
[4] E. J. McCluskey and C.-W. Tseng, "Stuck-fault tests vs. actual defects," in Proc. Int'l Test Conference, 2000, pp. 336-342.
[5] Z. Wang, K. Chakrabarty, and M. Goessel, "Test set enrichment using a probabilistic fault model and the theory of output deviations," in Proc. DATE Conf., 2006, pp. 1275-1280.
[6] K. P. Parker and E. J. McCluskey, "Probabilistic treatment of general combinational networks," IEEE Trans. Computers, vol. C-24, pp. 668-670, Jun. 1975.


[7] Zhanglei Wang and Krishnendu Chakrabarty, "An Efficient Test Pattern Selection Method for Improving Defect Coverage with Reduced Test Data Volume and Test Application Time," in Proc. 15th Asian Test Symposium (ATS'06).
[8] K. Y. Cho, S. Mitra, and E. J. McCluskey, "Gate exhaustive testing," in Proc. ITC, 2005, pp. 771-777.
[9] "An efficient, forward fault simulation algorithm based on the parallel pattern single fault propagation," in Proc. ITC, 1991, pp. 946-955.
[10] Hyung K. Lee and Dong S. Ha, "A CMOS Stuck-Open Fault Simulator," in Proc. IEEE Southeastcon, 1989.
[11] Hyung Ki Lee and Dong Sam Ha, "SOPRANO: An Efficient Automatic Test Pattern Generator for Stuck-Open Faults in CMOS Combinational Circuits," in Proc. 27th ACM/IEEE Design Automation Conference.


Fault Classification Using Back Propagation Neural Network for Digital to Analog Converter

B. Mohan*, R. Sundararajan*, J. Ramesh** and Dr. K. Gunavathi***

*UG Student  **Senior Lecturer  ***Professor

Department of ECE, PSG College of Technology, Coimbatore

Abstract: In today's world, digital to analog converters (DACs) are used in a wide range of applications such as wireless networking (WLAN, voice/data communication and Bluetooth), wired communication (WAN and LAN), and consumer electronics (DVD, MP3, digital cameras, video games, and so on). The DAC unit must therefore be fault free, and there is a need for a system to detect the occurrence of faults. This paper deals with designing an efficient system to detect and classify faults in the DAC unit. An R-2R DAC has been used for the analysis, and back propagation neural network algorithms are used to classify the faults. A classification efficiency of 77% is achieved by implementing three back propagation neural network algorithms.

I. INTRODUCTION

There are many challenges in making mixed-signal designs adaptable for SoC implementation. The major considerations in designing these mixed-signal circuits for a complete SoC are high speed, low power, and low voltage. Both cost and high-speed operation are limitations of the complete SoC; accordingly, to remove the speed gap between a processor and the other circuits in a complete SoC implementation, architectures must not only be fast but also cheap. The next challenge is low power consumption: in the portable device market, reducing power consumption is one of the main issues. Low-voltage operation is one of the difficult challenges in mixed-signal ICs. Above all, the circuits designed must be fault free, and if any fault occurs it must be detected. Fault classification is therefore one of the major needs in mixed-signal ICs. This paper aims at implementing efficient fault classification in a DAC unit using a neural network.

II. R-2R DAC

The R-2R D/A converter works on the principle of voltage division, and this configuration consists of a network of resistors alternating in value between R and 2R.

Fig 1 illustrates an 8-bit R-2R ladder. Starting at the right end of the network, notice that the resistance looking to the right of any node to ground is 2R. The digital input determines whether each resistor is switched to ground (non inverting input) or to the inverting input of the op-amp. Each node voltage is related to VREF, by a binary-weighted relationship caused by the voltage division of the ladder network. The total current flowing from VREF is constant, since the potential at the bottom of each switched resistor is always zero volts (either ground or virtual ground). Therefore, the node voltages will remain constant for any value of the digital input.

Fig 1 Schematic of R-2R digital to analog converter (R-2R ladder switched by inputs D0-D7, reference Vref, feedback resistor R and output out)

The output voltage, Vout, depends on the currents flowing through the feedback resistor, RF (= R), such that

Vout = -iTOT · RF (1)

where iTOT is the sum of the currents selected by the digital input, given by

iTOT = (VREF / (2^N · R)) · Σ (k = 0 to N-1) Dk · 2^k (2)

where Dk is the k-th bit of the input word, with a value that is either 1 or 0. The voltage scaling DAC structure is very regular and thus well suited for MOS technology. An advantage of this architecture is that it guarantees monotonicity, for the voltage at each tap cannot be less than that of the tap below. The area required for the voltage scaling DAC is large if the number of bits is eight or more. Also, the conversion speed of the converter will be sensitive to parasitic capacitance at each of its internal nodes.
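As an illustration of Equations (1) and (2), the short sketch below evaluates the ideal 8-bit R-2R transfer function; the ladder resistance value is an assumed placeholder (it cancels out because RF = R), and the code is not taken from the simulation setup used in this work.

V_REF = 2.5   # reference voltage (V), matching the 2.5 V full scale reported below
R = 10e3      # ladder resistance (ohms) -- illustrative value only
RF = R        # feedback resistor, RF = R as in Equation (1)
N = 8         # number of bits


def r2r_dac_output(code):
    """Ideal R-2R DAC transfer function from Equations (1) and (2)."""
    bits = [(code >> k) & 1 for k in range(N)]                       # D0 .. D7
    i_tot = (V_REF / (2 ** N * R)) * sum(d * 2 ** k for k, d in enumerate(bits))
    return -i_tot * RF          # Eq. (1); the sign comes from the inverting op-amp


if __name__ == "__main__":
    for code in (0, 1, 128, 255):
        print(code, round(abs(r2r_dac_output(code)), 6), "V")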



FIG 2 OUTPUT RESPONSE OF 8-BIT R-2R DAC

The output of the R-2R DAC for an 8-bit pattern counter input is shown in Fig 2. The output is very linear, glitch free and rises to the supply voltage of 2.5 V within 256 µs. The INL and DNL curves for the fault-free case are plotted using MATLAB and are shown in Fig 3. The maximum INL and DNL are found to be 0.038 LSB and -0.012 LSB respectively.
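INL and DNL values such as those quoted above can be obtained from a measured transfer curve by post-processing of the kind sketched below; the endpoint-fit definitions and the synthetic data are assumptions for illustration, since the exact MATLAB script used here is not shown.

def inl_dnl(vout):
    """Compute endpoint-fit INL and DNL (in LSB) from a measured DAC transfer curve.

    vout: list of measured output voltages, one per input code (length 2**N).
    """
    n = len(vout)
    lsb = (vout[-1] - vout[0]) / (n - 1)                   # average step = 1 LSB
    dnl = [(vout[k + 1] - vout[k]) / lsb - 1.0 for k in range(n - 1)]
    inl = [(vout[k] - (vout[0] + k * lsb)) / lsb for k in range(n)]
    return inl, dnl


if __name__ == "__main__":
    # Synthetic, slightly non-linear 8-bit ramp purely for demonstration.
    ideal = [2.5 * k / 255 for k in range(256)]
    measured = [v + 1e-4 * ((k % 7) - 3) for k, v in enumerate(ideal)]
    inl, dnl = inl_dnl(measured)
    print("max |INL| =", max(abs(x) for x in inl), "LSB")
    print("max |DNL| =", max(abs(x) for x in dnl), "LSB")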

FIG 3.1 INL CURVES OF R-2R DAC

FIG 3.2 DNL CURVES OF R-2R DAC

The offset error, gain error and power consumed by the R-2R DAC are shown in Table 1.

TABLE 1 PERFORMANCE METRICS OF R-2R DAC

Offset error 0.002484 LSB

Gain error 0.00979 LSB

Average power 34.7106 µW

Max power 827.2653 µW

Min power 0.000127 µW

III. FAULT MODELS The Structural fault models considered for the testing of the DACs are:

1) Gate-to-Source Short (GSS) 2) Gate-to-Drain Short (GDS) 3) Drain-to-Source Short (DSS) 4) Resistor Short (RS) 5) Capacitance Short (CS) 6) Gate Open (GO) 7) Drain Open (DO) 8) Source Open (SO) 9) Resistor Open (RO). The structural faults are illustrated in Fig 4. A low resistance (1 Ω) and a high resistance (10 MΩ) are frequently used to simulate structural faults. Restated, a transistor short is modeled using a low resistance (1 Ω) between the shorted terminals, and an open transistor is modeled as a large resistance (10 MΩ) in series with the open terminal. For example, the gate open is modeled by connecting the gate and the source, and also the gate and the drain, of the transistor through a large resistance (10 MΩ).
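A hedged sketch of how such faults could be injected into a simulation netlist is shown below; the element and node names are hypothetical, and the actual T-Spice deck used in this work may differ.

R_SHORT = "1"      # 1 ohm models a short between two terminals
R_OPEN = "10Meg"   # 10 Mohm in series models an open terminal


def short_fault(name, node_a, node_b):
    """Return a SPICE resistor line that shorts node_a to node_b (e.g. gate to drain)."""
    return f"R{name} {node_a} {node_b} {R_SHORT}"


def open_fault(name, node, node_broken):
    """Return a SPICE resistor line inserted in series to model an open terminal.

    The original connection to `node` must be re-routed to `node_broken` in the
    netlist so that the 10 Mohm resistor sits in the broken path.
    """
    return f"R{name} {node} {node_broken} {R_OPEN}"


if __name__ == "__main__":
    print(short_fault("GDS1", "gate_m1", "drain_m1"))     # gate-to-drain short on M1
    print(open_fault("SO1", "source_m1", "source_m1_x"))  # source open on M1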

Fig 4 Structural fault models (shorts modeled with a 1 Ω resistance; opens modeled with a 10 MΩ resistance)


IV. MONTE CARLO ANALYSIS

All types of faults are introduced in each transistor and resistor, and a Monte Carlo simulation is performed for each case. The Monte Carlo analysis in T-Spice is used to perform the simulation by varying the value of the threshold voltage parameter. The iteration value of the Monte Carlo analysis specifies the number of times the file should be run while varying the threshold value. Syntax in T-Spice to invoke the Monte Carlo analysis: .param VTHO_N=unif(0.3694291,.05,2) VTHO_P=unif(0.3944719,.05,2) (3). The results thus obtained are stored in a spreadsheet for further fault classification using the neural network.
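The sketch below captures the Monte Carlo idea behind Equation (3): the two threshold voltages are drawn around their nominal values and the netlist is re-simulated for each draw. Interpreting the second argument of unif() as a ±5% relative spread is an assumption, and run_tspice() is a hypothetical placeholder rather than a real simulator command.

import random

NOMINAL_VTH_N = 0.3694291   # nominal NMOS threshold from Eq. (3)
NOMINAL_VTH_P = 0.3944719   # nominal PMOS threshold from Eq. (3)
SPREAD = 0.05               # assumed +/-5% uniform variation


def run_tspice(vth_n, vth_p):
    """Hypothetical wrapper: would write the .param line of Eq. (3) and launch the simulator."""
    return {"VTHO_N": vth_n, "VTHO_P": vth_p, "outputs": None}   # placeholder result row


def monte_carlo(iterations=50):
    results = []
    for _ in range(iterations):
        vth_n = random.uniform(NOMINAL_VTH_N * (1 - SPREAD), NOMINAL_VTH_N * (1 + SPREAD))
        vth_p = random.uniform(NOMINAL_VTH_P * (1 - SPREAD), NOMINAL_VTH_P * (1 + SPREAD))
        results.append(run_tspice(vth_n, vth_p))    # one spreadsheet row per run
    return results


if __name__ == "__main__":
    print(len(monte_carlo(10)), "Monte Carlo runs recorded")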

V. FAULT CLASSIFICATION USING BACK PROPAGATION NEURAL NETWORK

Any function from input to output can be implemented as a three-layer neural network. In order to train a neural network to perform some task, weights and bias value must be adjusted at each iteration of each unit in such a way that the error between the desired output and the actual output is reduced. This process requires that the neural network compute the error derivative of the weights (EW). In other words, it must calculate how the error changes as each weight is increased or decreased slightly. The back propagation algorithm is the most widely used method for determining the EW. The goal now is to set the interconnection weights based on the training

patterns and the desired outputs. In a three-layer network, it is a straightforward matter to understand how the output, and thus the error, depends on the hidden-to-output layer weights. The results obtained from the Monte Carlo simulation are used to detect and classify the faults using the neural network model. Here a back propagation neural network model is used. The following back propagation algorithms are used to classify the faults: trainbfg, traincgp and trainoss.
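The following minimal numpy sketch illustrates the error-derivative (EW) computation described above for a three-layer network with one hidden layer; the sigmoid activation, mean-squared error and layer sizes are assumptions chosen only for illustration.

import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def backprop_gradients(x, target, w1, b1, w2, b2):
    """Return dE/dW for both layers of a three-layer (input-hidden-output) network,
    where E is the squared error between the desired and actual outputs."""
    # Forward pass.
    h = sigmoid(w1 @ x + b1)            # hidden activations
    y = sigmoid(w2 @ h + b2)            # network output

    # Backward pass: how the error changes as each weight changes (EW).
    delta_out = (y - target) * y * (1 - y)
    delta_hid = (w2.T @ delta_out) * h * (1 - h)
    grad_w2 = np.outer(delta_out, h)
    grad_w1 = np.outer(delta_hid, x)
    return grad_w1, grad_w2, delta_hid, delta_out   # delta_* are the bias gradients


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, t = rng.random(6), np.array([1.0, 0.0])          # 6 features, 2 fault classes
    w1, b1 = rng.standard_normal((12, 6)), np.zeros(12)
    w2, b2 = rng.standard_normal((2, 12)), np.zeros(2)
    print([g.shape for g in backprop_gradients(x, t, w1, b1, w2, b2)[:2]])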

(A)TRAINBFG Trainbfg can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight and bias variables X. Each variable is adjusted according to the following: X = X + a*dX; (4) where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed according to the following formula: dX = -H\gX; (5) where gX is the gradient and H is an approximate Hessian matrix.

(B) TRAINCGP Traincgp can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight and bias variables X. Each variable is adjusted according to the following: X = X + a*dX; (6) where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed from the new gradient and the previous search direction according to the formula: dX = -gX + dX_old*Z; (7) where gX is the gradient. The parameter Z can be computed in several different ways.
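The sketch below illustrates the shared update rule X = X + a*dX of Equations (4)-(7) on a toy quadratic objective rather than a neural network; the Polak-Ribiere choice for Z is assumed from the usual traincgp formulation, and the crude backtracking line search merely stands in for MATLAB's.

import numpy as np


def objective(x):
    # Toy quadratic "performance" function and its gradient gX.
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    return 0.5 * x @ A @ x, A @ x


def line_search(x, dx, f0):
    # Crude backtracking search for a step a that improves the performance.
    a = 1.0
    while objective(x + a * dx)[0] >= f0 and a > 1e-8:
        a *= 0.5
    return a


def conjugate_gradient(x, iters=20):
    f, g = objective(x)
    dx = -g                                          # first direction: negative gradient
    for _ in range(iters):
        a = line_search(x, dx, f)
        x = x + a * dx                               # Eq. (4)/(6): X = X + a*dX
        f, g_new = objective(x)
        z = g_new @ (g_new - g) / (g @ g + 1e-12)    # Polak-Ribiere choice for Z (assumed)
        dx = -g_new + z * dx                         # Eq. (7): dX = -gX + dX_old*Z
        g = g_new
    return x, f


if __name__ == "__main__":
    print(conjugate_gradient(np.array([2.0, -1.5])))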

(C)TRAINOSS Trainoss can train any network as long as its weight, net input, and transfer functions have derivative functions. Back propagation is used to calculate derivatives of performance with respect to the weight


and bias variables X. Each variable is adjusted according to the following: X = X + a*dX; (8) where dX is the search direction. The parameter a is selected to minimize the performance along the search direction. The first search direction is the negative of the gradient of performance. In succeeding iterations the search direction is computed from the new gradient and the previous steps and gradients according to the following formula: dX = -gX + Ac*X_step + Bc*dgX; (9) where gX is the gradient, X_step is the change in the weights on the previous iteration, and dgX is the change in the gradient from the last iteration. While using the neural network algorithms the following parameters are varied within the range specified and the results are obtained.

TABLE 2 RANGE OF PARAMETER VARIATION

Learning Rate: 0.01 to 0.05
Hidden Layer Neurons: 10 to 15 neurons
Epochs for Training: 100 to 1500 epochs

VI. OUTPUT RESULTS The following are the output results for the different Back Propagation algorithms by varying the parameter values like learning rate, epochs and the hidden layers for the maximum fault detection capability.

Fig 5: performance graph for trainbfg algorithm with parameter values learning rate = 0.03, hidden layer = 8

Fig 6: performance graph for traincgp algorithm with parameter values learning rate = 0.03, hidden layer = 8

Fig 7: performance graph for trainoss algorithm with parameter values learning rate = 0.03, hidden layer = 8

Fig 8: performance graph for trainbfg algorithm with parameter values learning rate = 0.01, epochs = 1000

Fig 9: performance graph for traincgp algorithm with parameter values learning rate = 0.01, epochs = 1000

Fig 10: performance graph for trainoss algorithm with parameter values learning rate = 0.01, epochs = 1000

Fig 11: performance graph for trainbfg algorithm with parameter values no. of hidden layers = 8, epochs = 1000

Fig 12: performance graph for traincgp algorithm with parameter values no. of hidden layers = 8, epochs = 1000

Fig 13: performance graph for trainoss algorithm with parameter values no. of hidden layers = 8, epochs = 1000

From Figs 5, 6 and 7 it can be inferred that, for a constant learning rate of 0.03 and a hidden layer size of 8, the fault coverage is best for the trainoss algorithm with an epoch value of 1000. From Figs 8, 9 and 10 it can be inferred that, for a constant learning rate of 0.01 and an epoch value of 1000, the fault coverage is best for the trainoss algorithm with a hidden layer size of 8. From Figs 11, 12 and 13 it can be inferred that, for a constant hidden layer size of 8 and an epoch value of 1000, the fault coverage is best for the trainoss algorithm with a learning rate of 0.01. The fault coverage of all three algorithms has been compared in the graphs above; it can be inferred that trainoss gives the best fault coverage of 77% when compared to the other algorithms.


Fig 14: Performance comparison of the three algorithms with learning rate = 0.01, epochs = 1000 and hidden layer = 8, giving the best fault classification

VII. CONCLUSION

In this paper, fault classification using a neural network with three back propagation algorithms (trainbfg, traincgp and trainoss) is used to classify faults by varying parameters such as the learning rate, the number of epochs and the number of hidden neurons, and the output results are obtained. The trainoss algorithm is efficient enough to classify the faults up to 77%. The output results show the values of the epoch count, learning rate and number of hidden neurons for which the algorithms show the best performance. This work can be extended to classifying faults in other DAC architectures using efficient neural network algorithms.



Testing Path Delays In LUT-Based FPGAs

Ms. R. Usha*, Mrs. M. Selvi, M.E., (Ph.D)**

*Student, II yr M.E. VLSI Design, Department of Electronics & Communication Engineering

**Asst. Prof., Department of Electronics & Communication Engineering, Francis Xavier Engineering College, Tirunelveli

Email: [email protected]

Abstract- Path delay testing of FPGAs is especially important since path delay faults can render an otherwise fault-free FPGA unusable for a given design layout. In this approach, we select a set of paths in FPGA-based circuits that are tested in the same test configuration. Each path is tested for all combinations of signal inversions along its length. Each configuration consists of a sequence generator, a response analyzer and circuitry for controlling inversions along the tested paths, all of which are formed from FPGA resources not currently under test. The goal is to determine by testing whether the delay along any of the paths in the test set exceeds the clock period. Two algorithms are presented for target path partitioning to determine the number of required test configurations. The test circuitry associated with these methods is also described. Index terms- Design automation, Field Programmable Gate Arrays, Programmable logic devices, testing.

I.INTRODUCTION

This paper is concerned with testing paths in lookup-table (LUT) based FPGAs after they have been routed. While this may be regarded as user testing , we are considering an environment in which a large number of manufactured FPGA devices implementing a specific design are to be tested to ensure correct operation at the specified clock speed. It is thus akin to manufacturing tests in that the time needed for testing is important. Ideally, we would like to verify that the actual delay of every path between flip-flops is less than the design clock period. Since the number of paths in most practical circuits is very large, testing must be limited to a smaller set of paths. Testing a set of paths whose computed delay is within a small percentage of the clock period may be sufficient in most cases. Thus, our goal is to determine by testing whether the delay along any of the paths in the set exceeds the clock period.

II.BASIC APPROACH

This path delay testing method is applicable to FPGA’s in which the basic logic elements are implemented by LUTs. The goal of this work is to test a set of paths, called target paths, to determine whether the maximum delay along any of them exceeds the clock period of the circuit. These paths are selected based on static timing analysis using nominal delay values and actual routing information. Circuitry for applying test patterns and observing results is configured using parts of the FPGA that are not under test.

INTRODUCTION TO APPROACH The delay of a path segment usually depends on the direction of the signal transition in it. The direction of the signal transition in any segment is determined by that of the transition at the source and the inversions along the partial path leading to the particular segment. A test to determine whether the maximum delay along a path is greater than the clock period must propagate a transition along the path and produce a combination of side-input values that maximizes the path delay. This approach is not usually feasible because of the difficulty of determining the inversions that maximize the path delay and the necessary primary input values to produce them. Instead, we propose to test each target path for all combinations of inversions along it, guaranteeing that the worst case will also be included. Although the number of combinations is exponential in the number of LUTs along the path, the method is feasible because application of each test requires only a few cycles of the rated clock. However, the results may be pessimistic in that a path that fails a test may operate correctly in the actual circuit, because the combination of inversions in the failing test may not occur during normal operation. The method of testing a single path in a circuit is to reprogram the FPGA to isolate each target path from the rest of the circuit and make inversions along the path controllable by an on-chip test controller. Every LUT along the path is re-programmed based on its original function. If it is positive unate in the on-path input, the LUT output is made equal to the on-path input independent of its side inputs. Similarly, negative unate functions are replaced by inverters. If the original function is binate in the on-path input, the LUT is re-programmed to implement the exclusive-OR (XOR) of the on-path input and one of its side inputs, which we shall call its controlling side-input.
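The re-programming rule described above can be sketched as follows for a LUT whose function is given as a truth table; this is an illustration of the rule, not the tool used in this work, and the truth-table encoding is an assumption.

def unateness(tt, on_path):
    """Classify a LUT function as 'positive', 'negative' or 'binate' in input on_path.

    tt is the truth table: tt[m] is the output bit for input minterm m, where
    bit j of m is the value applied to LUT input j.
    """
    rises = falls = False
    for m in range(len(tt)):
        if not (m >> on_path) & 1:                  # compare the pair x_i = 0 / x_i = 1
            lo, hi = tt[m], tt[m | (1 << on_path)]
            rises |= lo < hi
            falls |= lo > hi
    if rises and falls:
        return "binate"
    return "negative" if falls else "positive"


def reprogram(tt, on_path, side):
    """Replacement truth table used while the path through input on_path is tested."""
    kind = unateness(tt, on_path)
    new_tt = []
    for m in range(len(tt)):
        a = (m >> on_path) & 1                      # on-path input
        s = (m >> side) & 1                         # controlling side-input (binate case only)
        if kind == "positive":
            new_tt.append(a)                        # buffer
        elif kind == "negative":
            new_tt.append(1 - a)                    # inverter
        else:
            new_tt.append(a ^ s)                    # XOR with the controlling side-input
    return new_tt


if __name__ == "__main__":
    # A 4-input LUT that implements the AND of inputs 0 and 1 (inputs 2, 3 unused).
    and2 = [1 if (m & 1) and (m >> 1) & 1 else 0 for m in range(16)]
    print(unateness(and2, 0))           # 'positive' -> replaced by a buffer
    print(reprogram(and2, 0, 1)[:4])    # [0, 1, 0, 1]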



As mentioned earlier, this change of functionality does not affect the delay of the path under test because the delay through an LUT is unaffected by the function implemented. Inversions along the path are controlled by the signal values on the controlling side inputs. For each combination of values on the controlling side inputs we apply a signal transition at the source of the path and observe the signal value at the destination after one clock period. The absence of a signal transition will indicate that the delay along the tested path exceeds the clock period for the particular combination of inversions. The basic method described above can be implemented by the circuitry shown in Fig. 1, consisting of a sequence generator, a response analyzer and a counter that generates all combinations of values in some arbitrary order. A linear feedback shift register modified to include the all-0's output may be used as the counter. The controlling side inputs are connected to the counter. The controller and the circuitry for applying tests and observing results are also formed during configuration in parts of the FPGA that do not affect the behavior of the path(s) under test. The sequence generator produces a sequence of alternating zeros and ones, with period equal to 6T, where T is the operational clock period. The response analyzer checks for an output transition for every test, and sets an error flip-flop if no transition is observed at the end of a test. The flip-flop is reset only at the beginning of the test session, and will indicate an error if and only if no transition is produced in some test. The counter has as many bits as the number of binate LUTs along the tested path. The test for a path for each direction of signal transition consists of two parts, an initialization part and a propagation part, each of duration 3T. A path is tested in time 6T by overlapping the initialization part of each test with the propagation part of the preceding test. In addition, the change of counter state for testing a path for a new combination of inversions is also done during the initialization phase of rising transition tests. Fig. 2 shows the timing of the signals during the application of a test sequence. It can be seen from the figure that the source s of the test path toggles every three clock cycles. For correct operation, the input transition occurring at 3T must reach the destination within time T (i.e., before 3T+T). On the following clock edge at 3T+T, the result of the transition is clocked into the destination flip-flop at d. A change must be observed at the destination for every test, otherwise a flip-flop is set to indicate an error. In Fig. 2, a test for the rising edge starts at time 3T, with the source s steady at zero for the preceding three clock cycles. A test for the falling transition starts at 6T, with the input steady at one for the preceding three clock cycles. Results are sampled at d at time 4T (for the rising-edge s transition) and 7T (for the falling-edge s transition), respectively. Thus, both rising and falling transitions are applied at the source for each combination of inversions in time 6T. As the falling transition is applied at 6T, the enable input E of the counter is set to 1. This action starts a state (counter) change at 7T to test the path for the next combination of inversions. A counter change at this time point allows 2T of settling time before the following transition occurs at the source s. By ensuring that the counter reaches its final value within T and propagates to the path destination d within an additional T, d is ensured to be stable before the following source transition. Thus, the destination will reach the correct stable value corresponding to the new combination of inversions if no path from the counter to the destination has a delay greater than 2T. This delay explains the need for a 3T period between s transitions (1T to perform the test, 1T for possible counter state changes, and 1T for subsequent propagation of the counter changes to d).
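A software sketch of such a counter is given below; the 3-bit tap positions are only an example, and the NOR-based fix-up that inserts the all-zeros state is the standard de Bruijn modification assumed here.

def complete_lfsr(n, taps, seed=1):
    """Generate all 2**n states of an n-bit LFSR modified to include the all-0 state.

    `taps` lists the bit positions XORed to form the feedback of a maximal-length
    left-shifting Fibonacci LFSR (e.g. taps=[2, 1] for n=3, x^3 + x^2 + 1).
    XORing the feedback with the NOR of bits n-2..0 inserts the all-zeros state,
    so the counter steps through every combination of controlling side-input
    values in a fixed but arbitrary order.
    """
    state, mask = seed, (1 << n) - 1
    for _ in range(1 << n):
        yield state
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        if state & (mask >> 1) == 0:    # bits n-2 .. 0 all zero -> de Bruijn fix-up
            fb ^= 1
        state = ((state << 1) | fb) & mask


if __name__ == "__main__":
    seq = list(complete_lfsr(3, [2, 1]))
    print(seq, len(set(seq)))           # 8 distinct states, including 0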

III. TEST STRATEGY The method described in the preceding section requires the test control circuitry to be reconfigured for every path to be tested. The total time for testing a set of target paths in a circuit consists of the test application time and the reconfiguration time. Our goal is to reduce both components of the total time for testing a specified set of paths. Since the time needed for configuring the test structure is usually larger than that for applying test patterns generated on chip, we shall focus on reducing the number of test configurations needed by testing as many paths as possible in each configuration. Two approaches to maximize the number of paths tested in a test configuration suggest themselves. First, we can try to select a set of target paths that can be tested simultaneously. This will also have the effect of reducing test application time. Secondly, we can try to select a set of simultaneously testable sets that can be tested in sequence with the same configuration. In this case, the number of simultaneously tested paths may have to be reduced so as to maximize the total number of paths tested with the configuration. These two approaches will be elaborated in the next two sections, but first we define a few terms. The simultaneous application of a single rising or falling transition at the sources of one or more paths and observing the response at their destinations is called a test. The set of tests for both rising and falling transitions for all combinations of inversions along each path is called a test phase, or simply, a


phase. As mentioned earlier, a single path with k binate LUTs will have 2 · 2^k tests in a test phase.

The application of all test phases for all target paths in a configuration is called a test session.

A. Single Phase Method

This method attempts to maximize the number of simultaneously tested paths. A set of paths may be tested in parallel if it satisfies the following conditions: 1) No two paths in the set have a common destination. 2) No fanout from a path reaches another path in the set. The above conditions guarantee that signals propagating along paths in the set do not interfere with one another. Moreover, if the same input is applied to all paths in the set, two or more paths with a common initial segment will not interact if they do not re-converge after fanout. All LUTs on paths to be tested in a session are reprogrammed to implement inverters, direct connections or XORs as discussed in the preceding section. The LUTs with control inputs are levelized, and all control inputs at the same level are connected to the same counter output. The source flipflops of all paths to be tested in the session are connected to the same sequence generator, but a separate transition detector is used for each path. The transition detectors of all paths are then ORed together to produce an error indication if any of the paths is faulty. Alternatively, a separate error flip-flop can be used for each tested path, connected to form a scan chain and scanned out to identify the faulty path(s).
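A simplified sketch of this selection is shown below; it approximates conditions 1) and 2) by requiring fully disjoint resource sets, which is stricter than the rule stated above, and the path and resource names are taken from the example of Fig. 3 purely for illustration.

def select_parallel_paths(target_paths):
    """Greedily pick a set of paths that can be tested in one configuration.

    target_paths: dict mapping a path name to (destination, set_of_resources).
    Two paths are accepted together only if they have different destinations and
    share no resources -- a conservative approximation of conditions 1) and 2).
    """
    chosen, used_dests, used_res = [], set(), set()
    for name, (dest, resources) in target_paths.items():
        if dest not in used_dests and not (resources & used_res):
            chosen.append(name)
            used_dests.add(dest)
            used_res |= resources
    return chosen


if __name__ == "__main__":
    paths = {
        "dAEJLy": ("y", {"A", "E", "J", "L"}),
        "eAEJLy": ("y", {"A", "E", "J", "L"}),    # same destination -> not parallel
        "hCGKMz": ("z", {"C", "G", "K", "M"}),
        "nDGKMz": ("z", {"D", "G", "K", "M"}),    # overlaps hCGKMz -> not parallel
    }
    print(select_parallel_paths(paths))           # ['dAEJLy', 'hCGKMz']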

B. Multi-phase Method

The single phase method described above requires that all paths tested in a session be disjoint. The number of test sessions needed for a large target set is therefore likely to be very large. The multi-phase method attempts to reduce the number of test sessions needed by relaxing the requirement that all paths tested in a session be disjoint. This, however, increases the test application time, since paths that intersect cannot be tested simultaneously. Consider sets of target paths S1, S2, ..., Sp such that all paths in each set are disjoint except for common sources. Clearly, all paths in each set Si can be tested simultaneously, as in the single phase method, if each set can be selected and logically isolated from all other paths. This allows the testing of the sets Si in sequence, and is the basis of our multi-phase method. We also restrict the target paths for each session to simplify the control circuitry needed. We assume that the LUTs in the FPGA are 4-input LUTs, but the method can be easily modified to allow a larger number of inputs. Since each LUT may need up to two control inputs, one for path selection and the other for inversion control, at most two target paths may pass through any LUT. Target paths satisfying the following conditions can be tested in a single session. 1) There is a path to each target path destination, called the main path to the destination. 2) Main paths may not intersect, but they may have a common initial section. 3) Additional paths to each destination, called its side paths, must meet only the main path and continue to the destination along the main path. 4) Main and side paths may not intersect any other path except that two or more paths may have a common source. 5) No more than two target paths may pass through any LUT. 6) The number of target paths to all destinations must be the same. The above conditions allow us to select one path to each output and test all of them in parallel. The first two conditions guarantee that the signals propagating along main paths to different destinations will not interact. The main paths can therefore be tested in parallel. The restriction that a side path can meet only the main path to the same destination [condition 3)] allows a simple mechanism for propagating a signal through the main path or one of its side paths. Together with condition 4), it guarantees that a set of main paths or a set of side paths, one to each destination, can be tested in parallel. Condition 5) allows for two control signals to each LUT, one for controlling inversion, and the other for selecting the path for signal propagation. A single binary signal is sufficient for selecting one of the target paths that may pass through an LUT. The last condition is required to produce a signal change at every


destination for every test, simplifying the error detection logic. With the above restrictions, LUTs on target paths will have one or two target paths through them. These LUTs are called 1-path LUTs and 2-path LUTs, respectively. The inputs that are not on target paths will be called free inputs. The following procedure selects a set of target paths satisfying the conditions for multi-phase testing by selecting appropriate target paths for each set Si from the set of all target paths in the circuit. The union of these sets is the set of paths targeted in a test session. The procedure is then repeated for the remaining paths to obtain the target paths for subsequent test sessions until all paths are covered.

PROCEDURE 1 1) Select a path that does not intersect any already selected path, as the main path to each destination. 2) For each main path, select a side path such that a) It meets the main path and shares the rest of the path with it. b) No other path meets the main path at the same LUT c) It does not intersect any already selected target path (except for segments overlapping the main path). 3) Repeat Step 2 until no new side path can be found for any main path. 4) Find the number, n, of paths such that a) There are n target paths to each destination. b) The total number of paths is a maximum.

5) Select the main path and n − 1 side paths to each

destination as the target paths for the session. Figure 3 shows all the target paths in a circuit. The source and destination flip-flops are omitted for the sake of clarity. We start Procedure 1 by (arbitrarily) selecting dAEJLy and hCGKMz as the main paths to the destinations y and z. Adding paths eAEJLy, cEJLy and fBFJLy to the first path, and jCGKMz, nDGKMz and qHKMz to the second, we get the set of target paths shown in heavy lines. Since there are four paths to each destination, the eight target paths shown can be tested in a single four-phase session. The procedure can be repeated with the remaining paths to select sets of target paths for subsequent sessions. One possible set of test sessions is given in the following table, where the path(s) in the first row of each session were those chosen as the main path(s).

Session 1 - Destination y: dAEJLy, eAEJLy, cEJLy, fBFJLy; Destination z: hCGKMz, jCGKMz, nDGKMz, qHKMz
Session 2 - Destination y: gBEJLy, gFJLy; Destination z: gHKMz, kDGKMz
Session 3 - Destination y: gBFJLy; Destination z: mDGKMz
Session 4 - Destination y: hCFJLy, jCFJL, kDGLy
Session 5 - Destination y: nDGLy
Session 6 - Destination y: mDGLy


The set of sessions may not be unique and depends on the choices made. Also note that not all sessions obtained are multiphase sessions. Session 3, for example, became a single-phase session because no path qualified as a side path of mDGKMz, which was arbitrarily chosen as the main path. No paths could be concurrently tested with those in Sessions 4, 5, and 6 because all paths to z had already been targeted. The sets of target paths obtained by Procedure 1 are such that each 2-path LUT has a main path and a side path through it. Thus, a single binary signal is sufficient to select the input through which the signal is to be propagated. Since the side path continues along the main path, selecting the appropriate input at the 2-path LUT where it meets the main path is sufficient for selecting the side path for testing. By using the same path selection signal, one side path to each destination can be selected simultaneously and tested in parallel. The FPGA configuration for a test session is obtained by the following procedure:

PROCEDURE 2 1) Configure a sequence generator and connect its output to the sources of all target paths of the session. 2) Configure a counter to control inversion parity, with the number of bits equal to the largest number of binate LUTs along any target path for the test session. 3) Configure a path selector to select the set of paths tested in each test phase, with the number of bits equal to the number of side paths to a destination. 4) Designate a free input of each LUT as its inversion control input p, and connect it to the counter output corresponding to its level. 5) Designate another free input of each 2-path LUT as its selector input s, and connect it to the path selector. 6) Modify each 1-path LUT with on-path input a to implement f = a ⊕ p if the original function is binate in a; otherwise f = a if it is positive unate, or f = a' if it is negative unate, in a. 7) Modify each 2-path LUT to implement f = (s'·a + s·b) ⊕ p, where a and b are on the main path and a side path, respectively. The above modification for 2-path LUTs assumes that they are binate in both on-path inputs. If the output of a 2-path LUT is unate in a or b or both, a slightly different function f is needed. For example, if the LUT output is binate in a and negative unate in b, the modified LUT must implement f = s'·(a ⊕ p) + s·b'. Figure 4 shows the test structure for the circuit of Fig. 3. Only target paths that were selected for the first test session are shown, and all LUT functions are assumed to be binate in their inputs. The test circuitry consists of a sequence generator that produces a sequence of alternating 1's and 0's, a four-bit counter for inversion control and a path selector. The path selector is a shift register that produces the output sequence 000, 100, 010, 001 for the 4-phase test of the first session in our example. It can be verified from the figure that the main paths are selected when all selector outputs are 0. When any output is 1, exactly one side path to each destination is selected. Input transitions are applied to all paths simultaneously, but propagate only up to the first 2-path LUT on all paths except the selected ones. Thus, only one path to each destination will have transitions along its entire length. Since these paths are disjoint, no interaction can occur among them.
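The modified LUT functions of steps 6) and 7) can be sketched as follows, assuming the selection/inversion form given above for a 2-path LUT that is binate in both on-path inputs; the sketch is an illustration of that assumption, not the exact functions programmed by the tool.

def one_path_lut(a, p, unateness="binate"):
    """Replacement function for a 1-path LUT (Procedure 2, step 6)."""
    if unateness == "positive":
        return a             # buffer: f = a
    if unateness == "negative":
        return 1 - a         # inverter: f = a'
    return a ^ p             # binate: f = a XOR p, with p the inversion-control input


def two_path_lut(a, b, s, p):
    """Replacement function for a 2-path LUT (Procedure 2, step 7), assuming the LUT
    is binate in both on-path inputs: s selects the main path (a) or the side path (b),
    and p controls the inversion parity along the tested path."""
    return (b if s else a) ^ p


if __name__ == "__main__":
    # With s = 0 the main path a propagates (optionally inverted by p);
    # with s = 1 the side path b propagates instead.
    print([two_path_lut(a, 0, 0, 0) for a in (0, 1)])   # follows a
    print([two_path_lut(0, b, 1, 1) for b in (0, 1)])   # follows b, inverted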

IV. CONCLUSION In this paper, a new approach to testing selected sets of paths in FPGA-based circuits is presented. Our approach tests these paths for all combinations of inversions along them to guarantee that the maximum delays along the tested paths will not exceed the clock period during normal operation. While the test method requires reconfiguring the FPGA for testing, the tested paths use the same connection wires, multiplexers and internal logic connections as the original circuit, ensuring the validity of the tests. Following testing, the test circuitry is removed from the device and the original user circuit is programmed into the FPGA. Two methods have been presented for reducing the number of test configurations needed for a given set of paths. In one method, called the single-phase method, paths are selected so that all paths in each configuration can be tested in parallel. The second method, called the multi-phase method, attempts to test the paths in a configuration with a sequence of test phases, each of which tests a set of paths in parallel. Our experimental results with benchmark circuits show that these methods are viable, but the preferable method depends on the circuit structure. The use of other criteria, such as the total time for configuration and test application for each configuration, or better heuristics may lead to more efficient testing with the proposed approach.




VLSI Realisation Of SIMPPL Controller SoC For Design Reuse

Tressa Mary Baby John, II M.E VLSI Karunya University, Coimbatore, S.Sherine, Lecturer, ECE Dept., Karunya University, Coimbatore

email id:[email protected] contact no: 09994790024

Abstract- SoCs are defined as a collection of functional units on one chip that interact to perform a desired operation. The modules are typically of a coarse granularity to promote reuse of previously designed Intellectual Property (IP). The decreasing size of process technologies enables designers to implement increasingly complex SoCs using Field Programmable Gate Arrays (FPGAs). This reduces the impact of increased design time and costs for electronics as design complexity grows. The project describes how SoCs are designed around a Systems Integrating Modules with Predefined Physical Links (SIMPPL) controller. The design represents computing systems as a network of Computing Elements (CEs) interconnected with asynchronous queues. The strength of the SIMPPL model is the CE abstraction, which allows designers to decouple the functionality of a module from system-level communication and control via a programmable controller. This design aims at reducing design time by facilitating design reuse, system integration, and system verification. The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. Its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications.

The SIMPPL controller consists of the execute controller, the debug controller and the control sequencer.

The implementation of all the functional blocks of the SIMPPL controller has to be done, and a test bench will be created to prove the functionality with data from off-chip interfaces. Index Terms- Design reuse, Intellectual property, Computing Element.

I. INTRODUCTION

A. What Is SIMPPL?

The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. It processes instruction packets received from other CEs, and its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. SIMPPL uses Intellectual Property (IP) concepts with predefined modules, making use of design reuse, and it expedites system integration. Reusing IP is more challenging in hardware designs than reusing software functions. Software designers benefit from a fixed implementation platform with a highly abstracted programming interface, enabling them to focus on adapting the functionality to the new application. Hardware designers not only need to consider changes to the module's functionality, but also to the physical interface and communication protocols. SIMPPL is a system model with an abstraction for IP modules, called the computing element (CE), that facilitates SoC design for FPGAs. The processing element represents the datapath of the CE or the IP module, where an IP module implements a functional block having data ports and control and status signals. B. Why SIMPPL? For communication, the tasks performed are common. In a normal communication interface there is only a transfer of data; there is no processing of data, and if there is any error it is checked only after the data is received at the receiver end, hence a lot of time is wasted debugging the error. SIMPPL, however, has two controllers, namely the normal and debug controllers, so testing is done as and when an error is detected. Abstracting IP modules as computing elements (CEs) can reduce the complexities of adapting IP to new applications. The CE model separates the datapath of the IP from system-level control and communications.

Page 100: Proceedings of Vccc'08[1]

NCVCCC-‘08

77

A lightweight controller provides the system-level interface for the IP module and executes a program that dictates how the IP is used in the system. Localizing the control for the IP to this program simplifies any necessary redesign of the IP for other applications. Most of the applications are ready to use; hence, with slight modification, an IP module can be made compatible with other applications or more complicated architectures. C. Advantages of SIMPPL in FPGAs

The advantage of using SIMPPL in an FPGA is that the SIMPPL design is based on simple data flow and the design is split into the CE and the PE. The CE can be implemented with combinational circuits, and the PE can be easily arrived at with a simple FSM. Both of these favour high-speed applications because there are no complicated arithmetic operations or transformations such as sine and cosine transforms. The most important benefits of designing SoCs on an FPGA are that there is no need to finalize the partitioning of the design at the beginning of the design process or to create a complex co-simulation environment to model communication between hardware and software.

D. The SIMPPL System Model Fig. 1 below shows a SIMPPL system model, the SIMPPL SoC architecture of a network of CEs comprising the hardware and software modules in the system. I/O Communication Links are used to communicate with off-chip peripherals using the appropriate protocols. The Internal Communication Links are represented by arrows between the CEs. These are defined as point-to-point links to provide inter-CE communications, where the communication protocols are abstracted from the physical links and implemented by the CEs. SIMPPL is thus a point-to-point interconnection architecture for rapid system development. Communication between processing elements is achieved through SIMPPL. Several modules are connected on a point-to-point basis to form a generic computing system. A mechanism for the physical transfer of data across a link is provided so that the designer can focus on the meaning of the data transfer. SIMPPL greatly facilitates the speed and ease of hardware development. For our current investigation, we are using n-bit wide asynchronous first-in first-out queues (FIFOs) to implement the internal links in the SIMPPL model. Asynchronous FIFOs are used to connect the different CEs to create the system. SIMPPL thus represents computing systems as a network of CEs interconnected with asynchronous FIFOs. Asynchronous FIFOs isolate clocking domains to individual CEs, allowing them to transmit and receive at data rates independent of the other CEs in the system. This simplifies system-level design by

decoupling the processing rate of a CE from the inter-CE communication rate. For the purposes of this discussion, we assume a FIFO width of 33 bits, but leave the depth variable.


Fig. 1. Generic computing system described using the SIMPPL model.

II. SIMPPL CE ABSTRACTION A. Design Reuse With IP Modules And Adaptability IP reuse is one of the keys to SoC design productivity improvement. An IP core is a block of logic used in making an FPGA or ASIC for a product. IP cores are portable blocks of design information and are essential elements of design reuse. Design reuse makes it faster and cheaper to build a new product because the cores are not only designed but also already tested for reliability; in hardware, a reusable component is called an IP core. Software designers have a fixed implementation platform with a strongly abstracted programming interface that helps them adapt functionality to a new application, while hardware designers need to consider the adaptability of the physical interface and communication protocols along with the module's functionality. Amortizing the cost of design and verification across multiple designs is proven to increase productivity. The VSI Alliance has proposed the Open Core Protocol (OCP) to enable IP reuse by separating external core communications from the IP core's functionality, similar to the abstraction provided by SIMPPL. Both communication models are illustrated in Figure 2. The SIMPPL model targets the direct communication model using a defined, point-to-point interconnect structure for all on-chip communications. In contrast, OCP is used to provide a well-defined socket interface for IP that allows a designer to attach interface modules that act as adaptors to different bus standards, including point-to-point interconnect structures, as shown in Figure 2. This allows a designer to easily connect a core to all bus types supported by the standard. The SIMPPL model, however, has a fixed interface, supporting only point-to-point connections, with the objective of enabling designers to treat IP


modules as programmable coarse grained functional units. Designers can then reprogram the IP module’s usage in the system to adapt to the requirements of new applications.

Fig. 2. Standardizing the IP interface using (a) OCP for different bus standards and (b) SIMPPL for point-to-point communications.

The strength of the SIMPPL model is the CE abstraction, which allows designers to decouple the functionality of a module from system-level communication and control via a programmable controller. This design aims at reducing design time by facilitating design reuse, system integration, and system verification. The CE is an abstraction of software or hardware IP that facilitates design reuse by separating the datapath (computation), the inter-CE communication, and the control. Researchers have demonstrated some of the advantages of isolating independent control units for a shared datapath to support sequential procedural units in hardware. Similarly, when a CE is implemented as software on a processor (software CE), the software is designed with the communication protocols, the control sequence, and the computation as independent functions. Ideally, a controller customized to the datapath of each CE could be used as a generic system interface, optimized for that specific CE's datapath. To this end, we have created two versions of a fast, programmable, lightweight controller—an execution-only (execute) version and a run-time debugging (debug) version—that are both adaptable to the different types of computations suitable to SoC designs on field-programmable gate arrays (FPGAs). Fig. 3 illustrates how the control, communications and the datapath are decoupled in hardware CEs. The processing element (PE) represents the datapath of the CE or the IP module, where an IP module implements a functional block having data ports and control and status signals. It performs a specific function, be it a computation or communication with an off-chip

peripheral, and interacts with the rest of the system via the SIMPPL controller, which interfaces with the internal communication links to receive and transmit instruction packets. The SIMPPL Control Sequencer (SCS) module allows the designer to specify, or ‘‘program’’, how the PE is used in the SoC. It contains the sequence of instructions that are executed by the controller for a given application. The controller then manipulates the control bits of the PE based on the current instruction being executed by the controller and the status bits provided by the PE.

Fig. 3. Hardware CE abstraction.

III. SIMPPL CONTROLLER

The SIMPPL controller acts as the physical interface of the IP core to the rest of the system. Its instruction set is designed to facilitate controlling the core's operations and reprogramming the core's use for different applications. As noted above, we have to design two versions of the controller—an execution-only version and a run-time debugging version, in other words an execute controller and a debug controller. The execute controller has three variants, namely the consumer execute, producer execute and full execute controllers. The debug controller likewise has three variants: the consumer debug, producer debug and full debug controllers.

A. Instruction Packet Format

SIMPPL uses instruction packets to pass both control and data information over the internal communication links shown in Fig. 1. Fig. 4 provides a description of the generic instruction packet structure transmitted over an internal link. Although the current SIMPPL controller uses a 33-bit wide FIFO, the data word is only 32 bit. The remaining bit is used to indicate whether the transmitted word is an instruction or data. The instruction word is divided into the least significant byte, which is designated for the opcode, and the upper 3 bytes, which represents the number of data words (NDWs) sent or received in an instruction packet. The current instruction set uses only the five least significant bits (LSBs) of the opcode byte to represent the instruction. The remaining bits are



reserved for future extensions of the controller instruction set.
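A sketch of packing and unpacking such an instruction word is shown below; the position of the instruction/data flag within the 33-bit word and the example opcode value are assumptions, since only the field widths are specified above.

FLAG_INSTRUCTION = 1 << 32    # 33rd bit marks an instruction word (bit position assumed)


def encode_instruction(opcode, ndw):
    """Pack an instruction word: 5-bit opcode in the LS byte, NDW in the upper 3 bytes."""
    assert 0 <= opcode < 32 and 0 <= ndw < (1 << 24)
    return FLAG_INSTRUCTION | (ndw << 8) | opcode


def decode_word(word):
    """Split a 33-bit FIFO word into (is_instruction, opcode, ndw_or_data)."""
    is_instr = bool(word & FLAG_INSTRUCTION)
    payload = word & 0xFFFFFFFF
    if is_instr:
        return True, payload & 0x1F, payload >> 8   # opcode, number of data words
    return False, None, payload                     # a 32-bit data word


if __name__ == "__main__":
    w = encode_instruction(opcode=0x03, ndw=4)      # hypothetical opcode value
    print(decode_word(w))                           # (True, 3, 4)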

Fig. 4. An internal link's data packet format. Each instruction packet begins with an instruction word that the controller interprets to determine how the packet is used by the CE. Since the SIMPPL model uses point-to-point communications, each CE can transfer/receive instruction packets directly to/from the necessary system CEs to perform the appropriate application-specific computations. B. Controller Architecture Figure 5 illustrates the SIMPPL controller's datapath architecture. The controller executes instructions received via both the internal receive (Rx) link and the SCS. Instructions from the Rx link are sent by other CEs as a way to communicate control or status information from one CE to another CE, whereas instructions from the SCS implement local control. Instruction execution priority is determined by the value of the Cont Prog bit so that designers can vary the priority of program instructions depending on how a CE is used in an application. If this status bit is high, then the "program" (SCS) instructions have the highest priority, otherwise the Rx link instructions have the highest priority. Since the user must be able to properly order the arrival of instructions to the controller from two sources, allowing multiple instructions in the execution pipeline greatly complicates the synchronization required to ensure that the correct execution order is achieved. Therefore, the SIMPPL controller is designed as a single-issue architecture,

where only one instruction is in flight at a time, to reduce design complexity and to simplify program writing for the user. The SIMPPL controller also monitors the PE-specific status bits that are used to generate status bits for the SCS, which are used to determine the control flow of a program. The format of an output data packet sent via the internal transmit (Tx) link is dictated by the instruction currently being executed. The inputs multiplexed to the Tx link are the Executing Instruction Register (EX IR), an immediate address that is required in some instructions, the address stored in the address register a0 and any data that the hardware IP transmits. Data can only be received and transmitted via the internal links and cannot originate from the SCS. Furthermore, the controller can only send and receive discrete packets of data, which may not be sufficient for certain types of PEs requiring continuous data streaming. To solve this problem, the controller supports the use of optional asynchronous FIFOs to buffer the data transmissions between the controller and the PE.

Fig. 5. An overview of the SIMPPL controller datapath architecture.

C. Controller Instruction Set

Table 1 contains all the instructions currently supported by the SIMPPL controller. The objective is to provide a minimal instruction set to reduce the size of the controller, while still providing sufficient programmability such that the cores can be easily reconfigured for any potential application.

Page 103: Proceedings of Vccc'08[1]

NCVCCC-‘08

80

TABLE 1-Current Instruction Set Supported by SIMPPL Controller

Although some instructions required to fully

support the reconfigurability of some types of hardware PEs may be missing, the instructions in Table 1 support the hardware CEs that have been built to date. Furthermore, the controller supports the expansion of the instruction set to meet future requirements. The first column in Table 1 describes the operation being performed by the instruction. Columns 2 through 4 are used to indicate whether the different instruction types can be used to request data (Rd Req), receive data (Rx), or write data (Wr). The next two columns are used to denote whether each instruction may be issued from or executed from the SCS (S) or internal Receive Communication Link (R). Finally, the last two columns are used to denote whether the instruction requires an address field (Addr Field) or a data field (Data Field) in the packet transmission. The first instruction type described in Table 1 is the immediate data transfer instruction. It consists of one instruction word of the format shown in Figure 4, excluding the address field, where the two LSBs of the opcode indicates whether the data transfer is a read request, a write, or a receive. The immediate data plus immediate address instruction is similar to the immediate data transfer instruction except that an address field is required as part of the instruction packet.Designers can reduce the size of the controller by tailoring the instruction set to the PE. Although some CE’s receive and transmit data, thus requiring the full instruction set, others may only produce data or consume data. The Producer controller (Producer) is designed for CEs that only generate data. It does not support any instructions that may read data from a CE. The Consumer controller (Consumer) is designed for CEs that receive input data without generating output data. It does not support any instructions that try to write PE data to a Tx link.

IV. SIMPPL CONTROL SEQUENCER

The SIMPPL Control Sequencer provides the

local program that specifies how the PE is to be used by the system. The operation of a SIMPPL controller is analogous to a generic processor, where the controller’s instruction set is akin to assembly language. For a processor, programs consist of a series of instructions used to perform the designed operations. Execution order is dictated by the processor’s Program Counter (PC), which specifies

the address of the next instruction of the program to be fetched from memory. While a SIMPPL controller and program perform the equivalent operations to a program running on a generic processor, the controller uses a remote PC in the SCS to select the next instruction to be fetched. Figure 6 illustrates the SCS structure and its interface with the SIMPPL controller via six standardized signals. The 32-bit program word and the program control bit, which indicates if the program word is an instruction or address, are only valid when the valid instruction bit is high. The valid instruction signal is used by the SIMPPL controller in combination with the program instruction read to fetch an instruction from the Store Unit and update the PC. The continue program bit indicates whether the current program instruction has higher priority than the instructions received on the CE Rx link. It can be used in combination with PE-specific and controller status bits to help ensure the correct execution order of instructions.
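The SCS-to-controller interface can be summarized as a small signal bundle. The sketch below simply groups the signals named in the text; widths other than the 32-bit program word, and the polarity of the program control bit, are assumptions made for illustration.

#include <cstdint>

// Sketch of the SCS <-> SIMPPL controller interface described above.
// Field names follow the prose; exact widths and polarities beyond the
// 32-bit program word are assumptions.
struct ScsInterface {
    uint32_t program_word;             // instruction or address supplied by the SCS
    bool     program_control_bit;      // distinguishes instruction vs. address (polarity assumed)
    bool     valid_instruction;        // program_word and control bit are valid only when high
    bool     continue_program;         // gives SCS instructions priority over Rx-link instructions
    bool     program_instruction_read; // driven by the controller to fetch and advance the remote PC
};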

Fig. 6. Standard SIMPPL control sequencer structure and interface to the SIMPPL controller.

A. Consumer Controller

There are four interfacing blocks for communication with the Consumer controller: the Master, the Slave, the Processing Element, and the Programmable Interface. The Consumer writes data to the Master, and the Slave is the block from which the Consumer reads data. The signals of the Master block are Master clock, Master write, Master data, Master control and Master full. The signals of the Slave block are Slave clock, Slave data, Slave control, Slave read and Slave exist. Two further signals are generated from the Processing Element to the Consumer: can_write_data and can_write_addr. The signals generated from the Programmable Interface to the Consumer are the program control bit, program valid instruction, cont_program, and program instruction; the signal generated from the Consumer to the Programmable Interface is prog_instruction_read. The input signals of these blocks are given to the Consumer controller and the output


signals are directed to the blocks from the Consumer controller. When the process begins, the controller first checks whether the instruction is valid. If not, the instruction is not executed, since the valid instruction bit is not set high. On receiving a valid instruction, the valid instruction bit goes high and the instruction is then identified by the control bit; either data or an instruction may be received. When data is received from the Slave, the Consumer reads the data and stores it in the Processing Element. When the Slave read pin becomes '1', the Slave data is transferred. Once this data is received, the Processing Element checks whether it is ready to set the can_write_data or can_write_addr pin. This is known once the data is sent to the Consumer, and hence can_write_data is set. The corresponding acknowledge signals are then sent, and once the data transfer is ensured, the can_write_addr pin is set to '1' by the Processing Element. Once this write address is received, the data in the Slave is transferred to the Processing Element. When the Consumer communicates with the Master, all the data is transferred to the Master. The Master block deals with pure data transfer; hence, on receiving pure data instead of an instruction, the Slave_data is stored as Master_data. The address at which to store this Master_data is governed by the Consumer controller.

The two important items dealt with here are the program instruction and the Slave data. The Slave data for this module is a fixed value, while the program instruction is given an arbitrary value; it contains the instruction and the size of the data packet, that is, the number of data words. These data words are in a continuous format and are generated by a counter.

V RESULTS

VI FUTURE WORK

The CE abstraction facilitates verification of the PE’s functionality. Hence a debug controller will be introduced based on the execute SIMPPL controller that allows the detection of low-level programming and integration errors. For secure data transfer, encryption and

decryption of data will be done at the Producer and Consumer controller ends, respectively, as an enhancement to this project.

REFERENCES
[1] M. Keating and P. Bricaud, Reuse Methodology Manual for System-on-a-Chip Designs. Norwell, MA: Kluwer Academic, 1998.
[2] H. Chang, L. Cooke, M. Hung, G. Martin, A. J. McNelly, and L. Todd, Surviving the SOC Revolution: A Guide to Platform-Based Design. Norwell, MA: Kluwer Academic, 1999.
[3] L. Shannon and P. Chow, "Maximizing system performance: Using reconfigurability to monitor system communications," in Proc. IEEE Int. Conf. on Field-Programm. Technol., Dec. 2004, pp. 231–238.
[4] ——, "Simplifying the integration of processing elements in computing systems using a programmable controller," in Proc. IEEE Symp. on Field-Programm. Custom Comput. Mach., Apr. 2005, pp. 63–72.
[5] E. Lee and T. Parks, "Dataflow process networks," Proc. IEEE, vol. 83, no. 5, pp. 471–475, May 1995.
[6] K. Jasrotia and J. Zhu, "Stacked FSMD: A power efficient micro-architecture for high level synthesis," in Proc. Int. Symp. on Quality Electronic Des., Mar. 2004, pp. 425–430.


Clock Period Minimization of Edge Triggered Circuit
1D. Jackuline Moni, 2S. Arumugam, 1Anitha A.,

1ECE Department, Karunya University, 2Chief Executive, Bannari Amman Educational Trust

Abstract--In a sequential VLSI circuit, due to differences in interconnect delays on the clock distribution network, clock signals do not arrive at all of the flip-flops (FF) at the same time. Thus there is a skew between the clock arrival times at different latches. Among the various objectives in the development of sequential circuits, clock period minimization is one of the most important. Clock skew can be exploited as a manageable resource to improve circuit performance. However, due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization. This paper presents the clock period minimization of edge-triggered circuits. The objective here is not only to optimize the clock period but also to minimize the inserted delay required for resolving the race conditions. This is done using ModelSim XE II 5.8c.

I. INTRODUCTION

Most integrated circuits of sufficient complexity utilize a clock signal in order to synchronize different parts of the circuit and to account for propagation delays. As ICs become more complex, the problem of supplying accurate and synchronized clocks to all the circuits becomes difficult. One example of such a complex chip is the microprocessor, the central component of modern computers. A clock signal might also be gated or combined with a controlling signal that enables or disables the clock for a certain part of a circuit. In a synchronous circuit, the clock signal is used to coordinate the actions of two or more circuits. A clock signal oscillates between a high and a low state and is usually in the form of a square wave. Circuits using the clock signal for synchronization may become active at the rising edge, the falling edge, or both edges of the clock. A synchronous circuit is one in which all the parts are synchronized by a clock. In an ideal synchronous circuit, every change in the logical levels of its storage components is simultaneous; these transitions follow the level change of a special signal called the clock. Ideally, the input to each storage element has reached its final value before the next clock edge occurs, so the behavior of the whole circuit can be predicted exactly. In practice, some delay is required for each logical operation, resulting in a maximum speed at which each synchronous system can run. To make these circuits work correctly, a great deal of care is needed in the design of the clock distribution network. This paper deals with the clock period minimization of edge-triggered circuits. Clock skew is a phenomenon in synchronous circuits in which the clock signal arrives at different components at different times. This can be due to wire-interconnect length, temperature variations, capacitive coupling, material imperfections, etc. As design complexity and clock frequency continue to

increase, more techniques are developed for clock period minimization. An application of optimal clock skew scheduling to enhance the speed characteristics of functional blocks of an industrial chip was demonstrated in [1].

II.PROJECT DESCRIPTION

This paper deals with the clock period minimization of edge-triggered circuits. Edge-triggered circuits are sequential circuits that use the edge-triggered clocking scheme. Such a circuit consists of registers and combinational logic gates with wires connecting them. Each logic gate has one output pin and one or more input pins. A timing arc is used to denote the signal propagation from an input pin to the output pin, and a suitable delay value for each timing arc is also taken into account. In the design of an edge-triggered circuit, if the clock edge arrives at each register exactly simultaneously, the clock period cannot be shorter than the longest path delay. If the circuit has timing violations caused by long paths, an improvement can be made by an optimization step. There are two approaches to resolve the timing violations of long paths. One is to apply logic optimization techniques for reducing the delays of long paths; the other is to apply sequential timing optimization techniques, such as clock skew scheduling [7] and retiming transformation [5], [8], to adjust the timing slacks among the data paths. Logic optimization techniques are applied first. For those long paths whose delays are difficult to reduce further, sequential timing optimization techniques are necessary.

It is well known that the clock period of a nonzero clock skew circuit can be shorter than the longest path delay if the clock arrival times of the registers are properly scheduled. The optimal clock skew scheduling problem can be formulated on a constraint graph and solved by polynomial-time algorithms such as the cycle detection method [6], binary search algorithms, shortest path algorithms [2], etc. Given a circuit graph G, the optimal clock skew scheduling problem is to determine the smallest feasible clock period and find an optimal clock skew schedule, which specifies the clock arrival times of the registers for this circuit to work with the smallest feasible clock period. Due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization; thus, a combination of optimal clock skew scheduling and delay insertion may lead to further clock period reduction. For this purpose, the circuit graph shown below is taken for analysis. This approach of combining optimal clock skew scheduling and delay insertion for the synthesis of nonzero clock skew circuits is carried out using the Delay Insertion and Nonzero Skew (DIANA) algorithm.

The DIANA algorithm is an iteration process between the construction of an effective delay-inserted circuit graph and the construction of an irredundant delay-inserted circuit graph. The iteration process repeats until


the clock period cannot be further reduced. The delay component is then applied to the edge triggered circuit that we have taken.

III. METHOD 1

A. LOWER BOUND OF SEQUENTIAL TIMING OPTIMIZATION

Fig 1 shows an edge-triggered flip-flop circuit. It consists of registers and combinational logic gates with wires connecting them. The circuit has four registers and eight logic gates. Each logic gate has one or more input pins and one output pin. A timing arc is defined to denote the signal propagation from input to output. The delays of the timing arcs in the edge-triggered circuit are initialized as shown in the table below. A data path from register Ri to register Rj, denoted Ri → Rj, includes the combinational logic from Ri to Rj. The circuit can also be modeled as a circuit graph G(V, E) for timing analysis, where V is the set of vertices and E is the set of directed edges. Each vertex represents a register, and a special vertex called host is used to synchronize the input and output. A directed edge (Ri, Rj) represents a data path Ri → Rj and is associated with a weight which represents the minimum and maximum propagation delays of the data path. The circuit graph of the edge-triggered flip-flop circuit is shown in fig 2. From the graph it is clear that the maximum propagation delay path is TPD3,4(max) and is 6 time units (tu).

Fig. 1. Edge triggered flipflop

Table 1. Delays of timing arcs

The delay-to-register ratio of a directed cycle C is given by the maximum delay of C divided by the number of registers in C; its maximum over all cycles gives the lower bound of sequential timing optimization. From the circuit graph it is clear that the maximum delay-to-register ratio [9] of the directed cycles is 4 tu. The waveform of the edge triggered circuit is shown in fig 3.

Fig.2. Circuit Graph

Fig. 3. Waveform of edge triggered flipflop

METHOD 2

B. OPTIMAL CLOCK SKEW SCHEDULING

This section introduces the circuit graph and the constraint graph used to model the optimum clock skew scheduling problem, which can be formulated as a graph-theoretic problem [4]. Let TCi denote the clock arrival time of register Ri. TCi is defined relative to a global time reference, so it may be negative. For a data path Ri → Rj, there are two types of clocking hazards: double clocking, where the same clock pulse triggers the same data in two adjacent registers, and zero clocking, where the data reaches a register too late relative to the following clock pulse. To prevent double clocking, the clock skew must satisfy TCj - TCi ≤ TPDi,j(min). To prevent zero clocking, the clock skew must satisfy TCi - TCj ≤ P - TPDi,j(max), where P is the clock period. Together, the two inequalities define a permissible clock skew range for a data path. Thus, given a circuit graph G and clock period P, the clocking-hazard constraints can be modeled by a constraint graph Gcg(G, P), where each vertex represents a register and each directed edge corresponds to one type of constraint. Each directed edge (Ri, Rj) in the circuit graph G has a D-edge and a Z-edge in the corresponding constraint graph Gcg(G, P). The D-edge corresponds to the double clocking constraint, is in the direction of signal propagation, and is associated with weight TPDi,j(min). The Z-edge corresponds to the zero clocking constraint, is against the direction of signal propagation, and is associated with weight P - TPDi,j(max). Using the circuit graph shown in fig 2, the corresponding constraint graph is shown in fig 4(a) with the clock period as P. A


circuit graph works with the clock period P only if the clock skew schedule satisfies the clocking constraints. The optimal clock skew scheduling problem is to determine the smallest feasible clock period of a circuit graph and find the corresponding clock skew schedule for the circuit graph to work with the smallest feasible clock period.
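Combining the two hazard constraints stated above gives the permissible skew range of a data path, written here as a single expression in LaTeX notation:

% Permissible clock skew range for data path R_i -> R_j, obtained by combining
% the double-clocking and zero-clocking constraints above.
\[
  T_{PD_{i,j}}(\mathrm{max}) - P \;\le\; T_{C_j} - T_{C_i} \;\le\; T_{PD_{i,j}}(\mathrm{min})
\]

A clock period P is feasible exactly when a set of arrival times satisfying all such ranges exists, which is what the constraint graph encodes.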

The optimum clock skew scheduling problem is solved by applying a binary search approach. At each step of the binary search [3], for a fixed value of the clock period P, a check for a negative cycle is performed. The binary search is repeated until the smallest feasible clock period is attained. After applying this approach, we obtain the smallest feasible clock period as 5 tu; the corresponding constraint graph is shown in fig 4(b). When the clock period is 5 tu, there exists a critical cycle R3 → R4 → R3 in the constraint graph. If the clock period were less than 5 tu, this cycle would become a negative cycle. From fig 4(b), optimum clock skew scheduling is limited by the critical cycle R3 → R4 → R3, which is not a critical Z-cycle. This critical cycle has a critical D-edge (R3, R4), whose weight is the minimum delay from register R3 to register R4. Thus, if we increase this minimum delay, the cycle becomes non-critical. The optimal clock skew schedule is taken as Thost = 0 tu, TC1 = 2 tu, TC2 = 2 tu, TC3 = 2 tu and TC4 = 3 tu. The corresponding waveform representation is shown in fig 5.
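As an illustration of this binary search with negative-cycle checking, a minimal sketch is given below. The edge weights follow the D-edge/Z-edge construction defined above, while the data structures, search bounds and tolerance are assumptions made for the example.

#include <vector>

// Sketch: optimal clock skew scheduling by binary search on the clock period P
// with Bellman-Ford negative-cycle detection on the constraint graph.
// Data paths are given as (i, j, tpd_min, tpd_max); one vertex can act as host.
struct DataPath { int i, j; double tpd_min, tpd_max; };

static bool feasible(int n, const std::vector<DataPath>& paths, double P) {
    struct Edge { int u, v; double w; };
    std::vector<Edge> e;
    for (const auto& p : paths) {
        e.push_back({p.i, p.j, p.tpd_min});      // D-edge: T_Cj - T_Ci <= TPD(min)
        e.push_back({p.j, p.i, P - p.tpd_max});  // Z-edge: T_Ci - T_Cj <= P - TPD(max)
    }
    std::vector<double> d(n, 0.0);               // zero initial potentials (virtual source)
    for (int k = 0; k < n - 1; ++k)
        for (const auto& ed : e)
            if (d[ed.u] + ed.w < d[ed.v]) d[ed.v] = d[ed.u] + ed.w;
    for (const auto& ed : e)                     // further relaxation => negative cycle
        if (d[ed.u] + ed.w < d[ed.v] - 1e-9) return false;
    return true;                                 // d[] is then a feasible arrival-time schedule
}

// Returns the smallest feasible clock period within [lo, hi], assuming hi is feasible.
double smallest_period(int n, const std::vector<DataPath>& paths,
                       double lo, double hi, double tol = 1e-3) {
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        if (feasible(n, paths, mid)) hi = mid; else lo = mid;
    }
    return hi;
}

When the candidate period is feasible, the distance labels d[] themselves form one valid clock skew schedule.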

However, due to the limitation of race conditions, optimal clock skew scheduling often does not achieve the lower bound of sequential timing optimization, and different clock skew schedules have different race conditions. Therefore, delay insertion [10] is taken into account in determining the clock skew schedule.

Fig 4. (a) Constraint Graph Gcg (ex1, P). (b) Constraint Graph Gcg(ex1, 5).

Fig. 5. Waveform of OCSS

C. DELAY INSERTED CIRCUIT GRAPH

The delay-inserted circuit graph models the increase of the minimum delay of every data path during the clock skew scheduling stage. This is a two-step process by which the lower bound of sequential timing optimization can be achieved. In the first step, a clock

skew schedule is derived by taking the zero clocking constraints into account. In the second step, delay insertion is applied to resolve the race conditions. Consider the circuit graph shown in fig 6(a). Here the lower bound of sequential timing optimization is 3 tu, and there is no negative cycle in the constraint graph of fig 6(b). The clock skew schedule is taken as Thost = 0 tu, TC1 = 0 tu, TC2 = 0 tu and TC3 = 1 tu. Here the lower bound is achieved without any delay insertion.

Fig.6 (a) Circuit Graph ex2 (b) Constraint Graph Gcg(ex2, 3).

On the other hand, fig 7 shows the two-step process for obtaining a delay-inserted circuit graph which works with a clock period P = 3 tu. In the first step, since only the zero clocking constraints are considered, the clock skew schedule is taken as Thost = 0 tu, TC1 = 2 tu, TC2 = 2 tu and TC3 = 3 tu; this is shown in fig 7(a). Then, in the second step, delay insertion is applied to resolve the race conditions. Here the required increase of the minimum delay from host to R2 is 1 tu and the required increase of the minimum delay from host to R3 is 2 tu; fig 7(b) shows this process. The two-step process results in extra delay insertion. The corresponding waveform is shown in fig 8.

Fig. 8. Waveform of the two step process

Fig.7. Two step process (a) First Step


(b) Second step

IV.DESIGN METHODOLOGY

The proposed approach combines optimum clock skew scheduling and delay insertion using an algorithm known as the Delay Insertion and Nonzero Skew Algorithm (DIANA). This is a three-step iteration process involving delay insertion, optimum clock skew scheduling, and delay minimization. The input to the algorithm is an edge-triggered circuit, and the output is also an edge-triggered circuit that works with a clock period under a given clock skew schedule. The algorithm alternates between the construction of an effective delay-inserted circuit graph and the construction of an irredundant delay-inserted circuit graph, and the iteration is repeated until the clock period cannot be reduced further. The edge-triggered circuit shown in fig 1 is used for this approach. The pseudocode of the algorithm is shown below.

Procedure DIANA(Gin)
begin
  k = 0;
  GMin(k) = Gin;
  (SMin(k), PMin(k)) = OCSS(GMin(k));
  repeat
    k = k + 1;
    GIns(k) = Delay_Insertion(GMin(k-1), SMin(k-1), PMin(k-1));
    (SIns(k), PIns(k)) = OCSS(GIns(k));
    (GMin(k), SMin(k), PMin(k)) = Del_Min(GIns(k), PIns(k));
  until (PMin(k) = PMin(k-1));
  Gopt = GMin(k); Sopt = SMin(k); Popt = PMin(k);
  return (Gopt, Sopt, Popt)

end.

Initially, GMin(0) = Gin. The procedure OCSS performs optimal clock skew scheduling. The procedure Delay_Insertion is used to obtain the delay-inserted circuit graph GIns(k) by increasing the minimum delay of every critical minimum path with respect to the given clock period PMin(k-1) and clock skew schedule SMin(k-1). After delay insertion, the critical D-edges in the constraint graph become noncritical, so OCSS is used again to reduce the clock period further. The procedure Del_Min is used to obtain the irredundant delay-inserted circuit graph by minimizing the increased delay of each delay-inserted data path in the circuit graph with respect to the clock period. This process is repeated until the clock period cannot be further reduced.
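The pseudocode above can be mirrored by a compact loop. In the sketch below, Graph and Schedule are placeholder types and OCSS, Delay_Insertion and Del_Min are passed in as callables, since only their roles (not their internals) are described here.

#include <functional>
#include <utility>

// Skeleton of the DIANA iteration. Graph and Schedule are placeholders and the
// three procedures are supplied by the caller.
struct Graph {};
struct Schedule {};
struct DianaResult { Graph g; Schedule s; double period; };

DianaResult diana(Graph gin,
                  std::function<std::pair<Schedule, double>(const Graph&)> ocss,
                  std::function<Graph(const Graph&, const Schedule&, double)> delay_insertion,
                  std::function<DianaResult(const Graph&, double)> del_min) {
    Graph gmin = gin;
    std::pair<Schedule, double> cur = ocss(gmin);   // initial schedule and period
    Schedule smin = cur.first;
    double pmin = cur.second;
    while (true) {
        Graph gins = delay_insertion(gmin, smin, pmin);  // effective delay-inserted graph
        double pins = ocss(gins).second;                 // re-run OCSS on the new graph
        DianaResult r = del_min(gins, pins);             // irredundant delay-inserted graph
        if (r.period >= pmin) break;                     // stop when the period stops improving
        gmin = r.g; smin = r.s; pmin = r.period;
    }
    DianaResult out; out.g = gmin; out.s = smin; out.period = pmin;
    return out;
}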

V. PROPOSED METHOD TO CIRCUIT GRAPH

Here the edge-triggered circuit shown in fig 1 is taken, and its circuit graph is shown in fig 2. From fig 4(b) the smallest clock period is 5 tu and the clock skew schedule is taken as (0, 2, 2, 3). As a first step, the delay-inserted circuit graph is constructed for fig 2. Two loop iterations are performed, one with a clock period of 4.5 tu and the other with a clock period of 4 tu. The data paths host → R1 and R3 → R4 are critical minimum paths. The feasible value for the increase of the minimum delay from host to register R1 lies within the interval (0, 5), whereas for R3 to R4 it is (0, 6-1). Thus we take phost,1 as 5/2 = 2.5 and p3,4 as 5 tu. The effective delay-inserted circuit and the corresponding constraint graph are shown in fig 8(a) and (b).

Fig 8 (a) Effective delay inserted circuit graph (b) Corresponding constraint graph

The next step is the construction of the irredundant delay-inserted circuit graph. Here there are two delay-inserted data paths, host → R1 and R3 → R4. From fig 8(b) it is clear that the minimum value of phost,1 needed to work with a clock period of 4.5 tu is 0, and for p3,4 it is 0.5 tu. The corresponding circuit and constraint graphs under the given clock skew schedule are shown in fig 9(a) and (b).

Fig 9 (a) Irredundant delay inserted circuit graph (b) Corresponding constraint graph

In the second loop iteration we again construct the effective delay-inserted circuit graph of fig 9(a) and, as before, first find the critical minimum paths and then the feasible value for the increase of the minimum delay. After finding the smallest clock period, the irredundant delay-inserted graph is constructed. Once a critical Z-cycle appears, the clock period cannot be further reduced; the process is repeated until this stage is reached. The clock period obtained through the DIANA algorithm gives the lower bound of sequential timing optimization for the edge-triggered circuit. The waveform representations of the above approach for clock periods of 4.5 tu and 4 tu are shown in fig 10(a) and (b).


Fig 10(a) waveform for the clock period 4.5tu

Fig 10(b) waveform for the clock period 4tu

VI BENCHMARK RESULTS OF DIANA ALGORITHM

The DIANA algorithm is applied to a series of benchmark circuits. The results obtained are as follows.

Circuit   Clock Period   Gate Delay
B01       2.212          6.788
B03       6.23           5.32
B04       17.85          5.942

VII. CONCLUSION

This paper describes the clock period minimization of edge-triggered circuits using the delay insertion and nonzero skew (DIANA) algorithm to optimize the clock period. Experimental results for the various sections of this work are shown above, and it is clear that the clock period is minimized more effectively with this algorithm than with the other approaches considered. The algorithm was also applied to a series of benchmark circuits, and the results are shown above.

REFERENCES

[1] Adler V, Baez F, Friedman E. G, Kourtev I. S, Tang K.T and Velenis, D “Demonstration of speed enhancements on an industrial circuit through application of non-zero clock skew scheduling,” in Proc. IEEE Int. Conf. Electronics, Circuits and Systems, St. Julians, Malta, (2001),vol. 2, pp. 1021–1025. [2] Albrecht.C, Korte.B , Schietke.J and Vygen.J, “Cycle time and slack optimization for VLSI chips,” in Proc. IEEE/ACM Int. Conf. Computer Aided Design, San Jose, CA, (1999), pp. 232–238. [3] Cormen.T.H , Leiserson C. E, and Rivest R. L., Introduction to Algorithms. New York: McGraw-Hill(1990).

[4] Deokar R. B. and Sapatnekar S. S., "A graph-theoretic approach to clock skew optimization," in Proc. IEEE Int. Symp. Circuits and Systems, London, U.K. (1994), vol. 1, pp. 407–410. [5] Friedman E. G., Liu X., and Papaefthymiou M. C., "Retiming and clock scheduling for digital circuit optimization," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 21, no. 2, pp. 184–203, Feb. (2002). [6] S. M. Burns, "Performance analysis and optimization of asynchronous circuits," Ph.D. dissertation, Dept. Comput. Sci., California Inst. Technol., Pasadena. [7] John P. Fishburn, "Clock skew optimization," IEEE Trans. Comput., vol. 39, no. 7, July (1990). [8] Leiserson C. E. and Saxe J. B., "Retiming Synchronous Circuitry". [9] Papaefthymiou M. C., "Understanding retiming through maximum average-delay cycles," Math. Syst. Theory, vol. 27, no. 1, pp. 65–84, Jan./Feb. (1994). [10] Shenoy N. V., Brayton R. K. and Sangiovanni-Vincentelli A. L., "Clock Skew Scheduling with Delay Padding for Prescribed Skew Domains," in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Santa Clara, CA (1993), pp. 156–1


VLSI Floor Planning Based On Hybrid Particle Swarm Optimization 1D.Jackuline Moni, 2S.Arumugam

1Associate Professor, ECE Department, Karunya University, 2Chief Executive, Bannari Amman Educational Trust

Abstract- Floorplanning is important in very large scale integrated circuit (VLSI) design automation as it determines the performance, size and reliability of VLSI chips. This paper presents a floorplanning method based on hybrid Particle Swarm Optimization (HPSO). The B*-tree floorplan structure is adopted to generate an initial floorplan without any overlap, and then HPSO is applied to find the optimal solution. HPSO has been implemented and tested on the popular MCNC and GSRC benchmark problems for non-slicing and hard-module VLSI floorplanning. Experimental results show that HPSO can quickly produce optimal or nearly optimal solutions for all popular benchmark circuits.

I.INTRODUCTION

As technology advances, design complexity is increasing and circuit sizes are getting larger. To cope with the increasing design complexity, hierarchical design and IP modules are widely used. This trend makes module floorplanning more critical to the quality of a VLSI design than ever. Given a set of circuit components, or "modules", and a netlist specifying interconnections between the modules, the goal of VLSI floorplanning is to find a floorplan for the modules such that no module overlaps with another and the area of the floorplan and the interconnections between the modules are minimized. A fundamental problem in floorplanning lies in the representation of the geometric relationship among modules. The representation profoundly affects the operations on modules and the complexity of the floorplan design process. It is thus desirable to use an efficient, flexible, and effective representation of the geometric relationship for floorplan designs. Existing floorplan representations can be classified into two categories, namely 1) "slicing representations" and 2) "non-slicing representations". Slicing floorplans are those that can be recursively bisected by horizontal and vertical cut lines down to single blocks; they can be encoded by slicing trees or Polish expressions [9, 18]. For non-slicing floorplans, researchers have proposed several representations such as the sequence pair [12, 13, 16, 17], bounded slicing grid [14], O-tree [4], B*-tree [2], Transitive Closure Graph (TCG) [10, 11], Corner Block List (CBL) [5, 21], and twin binary sequence [20]. Since the B*-tree representation [2] is an efficient, flexible, and effective data structure, we use the B*-tree floorplan to generate an initial floorplan without any overlap. Existing approaches [6, 19] use simulated annealing because it allows the objective function to be modified for different applications; the drawback of adopting SA is that the system must be close to equilibrium throughout the process, which demands careful adjustment of the annealing schedule parameters.

In this paper, we adopt the non-slicing B*-tree representation with a Hybrid Particle Swarm Optimization (HPSO) algorithm. HPSO [1] utilizes the basic mechanism of PSO [7, 8] together with the natural selection method usually employed by evolutionary computation (EC) methods such as the genetic algorithm (GA). Since the search procedure of PSO depends strongly on pbest and gbest, the search area may be limited by them; by introducing natural selection, a broader search of the solution space can be realized. The remainder of this paper is organized as follows. Section 2 describes the PSO and HPSO methodology. Section 3 presents the B*-tree representation and our proposed floorplanning method. The experimental results are reported in Section 4. Finally, the conclusion is given in Section 5.

II. METHODOLOGY

A. Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based stochastic optimization technique developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by the social behavior of bird flocking. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. All particles have fitness values, which are evaluated by the fitness function to be optimized, and velocities, which direct their flight. PSO is initialized with a group of random particles (solutions) and then searches for optima by updating generations. In every iteration, each particle is updated by following two "best" values, pbest and gbest. When a particle takes only a part of the population, its topological neighbors, into account, the best value is a local best and is called lbest. Let s denote the dimension of the problem. In general, each particle in the search space has three attributes that present its features: the current position xi, the current velocity vi and the local best position yi. Each particle in the swarm is iteratively updated according to these attributes [3, 7]. Each agent tries to modify its position using the following information: the current position, the current velocity, the distance between the current position and pbest, and the distance between the current position and gbest. Assuming that the function f is to be minimized and that the swarm consists of n particles, the new velocity of every particle is updated by (1):

vi,j(t+1) = w vi,j(t) + c1 r1,i(t)[yi,j(t) - xi,j(t)] + c2 r2,i(t)[ŷj(t) - xi,j(t)]   (1)


where vi,j is the velocity of the particle in the jth dimension for all j in 1…s, w is the inertia weight, c1 and c2 denote the acceleration coefficients, r1 and r2 are elements from two uniform random sequences in the range (0, 1), ŷ is the global best position, and t is the number of generations. The new position of the particle is calculated as follows:

xi(t+1) = xi(t) + vi(t+1)   (2)

The local best position of each particle is updated by (3):

yi(t+1) = yi(t)       if f(xi(t+1)) ≥ f(yi(t))
yi(t+1) = xi(t+1)     if f(xi(t+1)) < f(yi(t))   (3)

The global best position ŷ found from all particles during the previous three steps is defined as

ŷ(t+1) = arg min over yi of f(yi(t+1)),   1 ≤ i ≤ n   (4)

B. Hybrid particle swarm optimization (HPSO)

The structure of the hybrid model is illustrated below:

begin
  initialize
  while (not terminate-condition) do
  begin
    evaluate
    calculate new velocity vectors
    move
    natural selection
  end
end

Breeding is done by first determining which of the particles should breed. This is done by iterating through all the particles and, with probability pi, marking a given particle for breeding; note that fitness is not used when selecting particles for breeding. From the pool of marked particles, two random particles are selected for breeding, and this is repeated until the pool of marked particles is empty. The parent particles are replaced by their offspring particles, thereby keeping the population size fixed, where pi is a uniformly distributed random value between 0 and 1. The velocity vector of each offspring is calculated as the sum of the velocity vectors of the parents, normalized to the original length of each parent velocity vector. The flow chart of HPSO is shown in figure 1.

Figure 1: Flow chart of HPSO

C. Steps of Hybrid Particle Swarm Optimization

Step 1: Generation of the initial condition of each agent. Initial searching points (si0) and velocities (vi0) of each agent are usually generated randomly within the allowable range. The current searching point is set to pbest for each agent. The best evaluated value of pbest is set to gbest, and the agent number with the best value is stored.
Step 2: Evaluation of the searching point of each agent. The objective function value is calculated for each agent. If the value is better than the current pbest of the agent, the pbest value is replaced by the current value. If the best value of pbest is better than the current gbest, gbest is replaced by the best value and the agent number with the best value is stored.
Step 3: Natural selection using the evaluation value of each searching point.
Step 4: Modification of each searching point. The current searching point of each agent is changed using (1) and (2).
Step 5: Checking the exit condition. If the current iteration number reaches the predetermined maximum iteration number, then exit; otherwise go to Step 2.
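A minimal sketch of an HPSO loop corresponding to these steps is given below. The sphere objective, dimension, iteration count and random seed are placeholders, and the natural-selection step is written in the common form that copies the better half of the swarm over the worse half (as in Angeline [1]); the breeding variant described earlier is omitted. The parameter values w = 0.4, c1 = c2 = 1.4 and the swarm size of twenty follow the experimental settings reported later.

#include <algorithm>
#include <random>
#include <vector>

// Minimal HPSO sketch: evaluate, natural selection, then velocity/position update.
struct Particle {
    std::vector<double> x, v, pbest;
    double pbest_f = 1e300;
};

int main() {
    const int dim = 8, swarm = 20, iters = 200;          // dim and iters are assumptions
    const double w = 0.4, c1 = 1.4, c2 = 1.4;            // parameters from the text
    auto f = [](const std::vector<double>& x) {          // placeholder objective (sphere)
        double s = 0; for (double xi : x) s += xi * xi; return s;
    };

    std::mt19937 rng(42);
    std::uniform_real_distribution<double> u01(0.0, 1.0), init(-5.0, 5.0);

    std::vector<Particle> p(swarm);
    std::vector<double> gbest;
    double gbest_f = 1e300;
    for (auto& pi : p) {                                  // Step 1: initialization
        pi.x.resize(dim); pi.v.assign(dim, 0.0);
        for (double& xi : pi.x) xi = init(rng);
        pi.pbest = pi.x; pi.pbest_f = f(pi.x);
        if (pi.pbest_f < gbest_f) { gbest_f = pi.pbest_f; gbest = pi.pbest; }
    }

    for (int t = 0; t < iters; ++t) {
        for (auto& pi : p) {                              // Step 2: evaluate and update bests
            double fx = f(pi.x);
            if (fx < pi.pbest_f) { pi.pbest_f = fx; pi.pbest = pi.x; }
            if (fx < gbest_f)    { gbest_f = fx;    gbest = pi.x; }
        }
        // Step 3: natural selection - copy positions/velocities of the better half
        // onto the worse half (each particle keeps its own pbest)
        std::sort(p.begin(), p.end(), [&](const Particle& a, const Particle& b) {
            return f(a.x) < f(b.x);
        });
        for (int i = 0; i < swarm / 2; ++i) {
            p[swarm / 2 + i].x = p[i].x;
            p[swarm / 2 + i].v = p[i].v;
        }
        for (auto& pi : p) {                              // Step 4: update per Eqs. (1)-(2)
            for (int j = 0; j < dim; ++j) {
                pi.v[j] = w * pi.v[j] + c1 * u01(rng) * (pi.pbest[j] - pi.x[j])
                                      + c2 * u01(rng) * (gbest[j] - pi.x[j]);
                pi.x[j] += pi.v[j];
            }
        }
    }
    return 0;
}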

III. B*-TREE REPRESENTATION

Given an admissible placement P, we can represent it by a unique (horizontal) B*-tree T. Fig 2(b) gives an example of a B*-tree representing the placement of Fig 2(a). A B*-tree is an ordered binary tree whose root corresponds to the module on the bottom-left corner. Similar to a DFS procedure, we construct the B*-tree T for an admissible placement P in a recursive fashion: starting from the root, we first recursively construct the left subtree and then the right subtree. Let Ri denote the set of modules located on the right-hand side and adjacent to bi. The left child of the node ni corresponds to the lowest module in Ri that is unvisited. The right child of the node ni represents the lowest module located above bi and with its x coordinate equal to that of bi. Following the above-mentioned DFS procedure and definitions, we can guarantee the one-to-one correspondence between an admissible placement and its induced B*-tree.



Fig 2: (a) An admissible placement (b) The (horizontal) B*-tree representing the placement

As shown in fig 2, module a is made the root of T since module a is on the bottom-left corner. Constructing the left subtree of na recursively, nh is made the left child of na. Since the left child of nh does not exist, the right subtree of nh (which is rooted at ni) is then constructed. The construction is performed recursively in DFS order. After completing the left subtree of na

the same procedure applies to the right subtree of na. The resulting B*-tree for the placement of fig 2(a) is shown in fig 2(b). The construction takes only linear time.

Given a B*-tree T, we compute the x and y coordinates for each module associated with a node in the tree. The x and y coordinates of the module associated with the root are (xroot, yroot) = (0, 0), since the root of T represents the bottom-left module. The B*-tree keeps the geometric relationship between two modules as follows. If node nj is the left child of node ni, module bj must be located on the right-hand side and adjacent to module bi in the admissible placement; i.e., xj = xi + wi. Besides, if node nj is the right child of node ni, module bj must be located above bi, with the x coordinate of bj equal to that of bi, i.e., xj = xi. Therefore, given a B*-tree, the x coordinates of all modules can be determined by traversing the tree once. The contour data structure is adopted to efficiently compute the y coordinates from a B*-tree. Overall, given a B*-tree, we can determine the

corresponding packing (i.e., compute the x and y coordinates for all modules) in amortized linear time.

B*-tree Perturbations

Given an initial B*-tree, we perturb the B*-tree into another using the following three operations.

• Op1: Rotate a module
• Op2: Move a module to another place
• Op3: Swap two modules

Op1 rotates a module, and the B*-tree structure is not changed. Op2 deletes and inserts a node. Op2 and Op3 both apply the deletion and insertion operations for removing a node from and inserting a node into a B*-tree.
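Returning to the coordinate computation described above (left child: xj = xi + wi; right child: xj = xi), a minimal sketch of the x-coordinate assignment by a single traversal is shown below. The node layout is illustrative, and the contour structure needed for the y coordinates is omitted.

#include <vector>

// Sketch: x-coordinate assignment on a B*-tree by one preorder traversal.
// Nodes are stored in a vector; -1 marks a missing child.
struct BStarNode {
    double width = 0, x = 0;
    int left = -1, right = -1;   // left: right-adjacent module, right: module stacked above
};

void assign_x(std::vector<BStarNode>& t, int node, double x) {
    if (node < 0) return;
    t[node].x = x;
    assign_x(t, t[node].left,  x + t[node].width);  // xj = xi + wi
    assign_x(t, t[node].right, x);                  // xj = xi
}
// Usage: assign_x(tree, rootIndex, 0.0);  // the root module sits at the bottom-left corner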

A. Floorplanning using B*-tree

The input benchmark circuit is read and the B*-tree representation is constructed. Random floorplanning is then performed to obtain initial solutions, which are assigned to different particles. The velocity is then computed for each particle and, depending on the velocity of each particle, the B*-tree is perturbed; after each perturbation a new solution is obtained. Then gbest and lbest are found, natural selection using the evaluation value of each searching point is performed, and the same process is repeated until the termination condition is reached.

IV. EXPERIMENTAL RESULTS

The experiments in this study employed the GSRC and MCNC benchmarks [22] for the proposed floorplanner, and the results were compared with [2]. The simulation programs were written in C++, compiled using Microsoft Visual C++, and the results were obtained on a Pentium 4 at 2 GHz with 256 MB RAM. The PSO parameters w, c1 and c2 were initialized to 0.4, 1.4 and 1.4, respectively. For HPSO, the probability of selection was chosen as 0.6 and the particle number was set to twenty. The floorplanner was run 10 times and average values of chip area and run time were taken. The results are shown in Table 1. Compared with [2], our method can find a better placement solution in even less computation time. Under the same tree structure, our approach has higher efficiency and better solution-searching ability for floorplanning.

V. CONCLUSION AND FUTURE WORK

In this paper, we proposed a floorplanner based on HPSO with the B*-tree structure for placing blocks. HPSO exhibits the ability to search the solution space more efficiently than SA. The experimental results show that the proposed HPSO method leads to more optimal and reasonable solutions for the hard IP module placement problem. Our future work is to deal with soft IP modules and also to include constraints such as alignment and performance constraints.

Table 1. Results of Hard Modules using B*-tree based HPSO

Circuit   #of blocks   With B*-tree                 Our method
                       Area (mm2)   Time (sec)      Area (mm2)   Time (sec)
Apte      9            46.92        7               46.829       1.31
Xerox     10           20.06        25              19.704       3.69
Ami33     33           1.27         3417            1.26         4.44


REFERENCES

[1] P.J. Angeline “Using Selection to Improve Particle Swarm Optimization.” In Proceedings of the IEEE Congress on Evolutionary Computation, 1998 pages 84-89 IEEE Press. [2] Y.-C. Chang, Y.-W. Chang, G.-M. Wu and S.-W.Wu, “B *-trees: A New representation for Non-Slicing Floorplans,” DAC 2000, pp.458-463. [3]R.C.Eberhart and J.kennedy “A New Optimizer using Particle Swarm Theory.” In Proceedings of the Sixth International Symposium on Micromachine and Human Science, 1995 ,pages 39-43. [4] P.-N. Guo, C.-K. Cheng and T. Yoshimura, “An O-tree Representation of Non-Slicing Floorplan,” DAC ‘99, pp. 268-273. [5] X. Hong et al., “Corner Block List: An Effective and Efficient Topological Representation of Non-Slicing Floorplan,” ICCAD 2000, pp. 8-13. [6] A. B. Kahng, “Classical floorplanning harmful?” ISPD 2000, pp. 207-213. [7] J.Kennedy and R.C.Eberhart ‘Particle Swarm Optimization.’ In Proceedings of the IEEE International Joint Conference on Neural Networks, (1995) pages 1942-1948.IEEE Press [8] J.Kennedy ‘The Particle Swarm: Social Adaptation of Knowledge.’ In Proceedings of the IEEE International Conference on Evolutionary Computation, 1997, pages 303-308. [9]M.Lai and D. Wong,“SlicingTree Is a Complete FloorplanRepresentation,” DATE 2001, pp. 228–232. [10] J.-M. Lin and Y.-W Chang, “TCG: A Transitive Closure Graph-Based Representation for Non-Slicing Floorplans,” DAC 2001, pp. 764–769. [11] J.-M. Lin and Y.-W. Chang, “TCG-S: Orthogonal Coupling of P*-admissible Representations for General Floorplans,” DAC 2002, pp. 842–847. [12] H. Murata, K. Fujiyoshi, S. Nakatake and, “VLSI Module Placement Based on Rectangle-Packing by the Sequence Pair,” IEEE Trans. on CAD 15(12), pp. 1518-1524, 1996. [13] H. Murata and E. S. Kuh, “Sequence-Pair Based Placement Methods for Hard/Soft/Pre-placed Modules”, ISPD 1998, pp. 167-172. [14] S.Nakatake, K.Fujiyoshi, H.Murata,and Y.Kajitani, “Module placement on BSG structure and IC Layout Applications,” Proc.ICCAD,pp.484-491,1998. [15] K.E. Parsopoulos and M.N. Vrahatis, “Recent Approaches to Global ptimization Problems through Particle Swarm Optimization.” Natural Computing, 2002, 1(2-3):235-306. [16] X. Tang, R. Tian and D. F. Wong, “Fast Evaluation of Sequence Pair in Block Placement by Longest Common Subsequence Computation,” DATE 2000, pp. 106-111. [17] X. Tang and D. F.Wong, “FAST-SP: A Fast Algorithm for Block Placement Based on Sequence Pair,” ASPDAC 2001, pp. 521-526. [18] D.F.Wong and C.L.Liu, “A New Algorithm For Floorplan Design,” DAC 1986,PP.101-107. [19] B. Yao et al., “Floorplan Representations: Complexity and Connections,” ACM Trans. on Design Autom. of Electronic Systems 8(1), pp. 55–80, 2003.

[20] E.F.Y. Young, C.C.N. Chu and Z.C. Shen, “Twin Binary Sequences: A Nonredundant Representation for General Nonslicing Floorplan,” IEEE Trans. on CAD 22(4), pp. 457–469, 2003. [21] S. Zhou, S. Dong, C.-K. Cheng and J. Gu, “ECBL: An Extended Corner Block List with Solution Space including Optimum Placement,” ISPD 2001, pp. 150-155. [22] http://www.cse.ucsc.edu/research/surf/GSRC/progress.html


Development Of An EDA Tool For Configuration Management Of FPGA Designs

Anju M I 1, F. Agi Lydia Prizzi 2, K.T. Oommen Tharakan3

1PG Scholar, School of Electrical Sciences, Karunya University, Karunya Nagar, Coimbatore - 641 114

2Lecturer, School of Electrical Sciences, Karunya University, Karunya Nagar, Coimbatore - 641114

3Manager-IED, Avionics, VSSC, ISRO P.O. Thiruvanathapuram

Abstract- This paper describes the development of an EDA tool for configuration management of various FPGA designs. As FPGA designs evolve with respect to additional functionality, design races, etc., it has become very important to use only the right design for the application. In this project we propose to solve the problem for the case of VHDL. The FPGA VHDL codes will be encoded according to the various constructs, the number of pins used, the pin checksum, the fuse checksum, the manufacturer, the design, the device part number and the device resources used. This will help in fusing the right VHDL file to the FPGA.

I. INTRODUCTION

a) EDA tools:

Electronic design automation (EDA) is the category of tools for designing and producing electronic systems ranging from printed circuit boards (PCBs) to integrated circuits. This is sometimes referred to as ECAD (electronic computer-aided design) or just CAD. This usage probably originates in the IEEE Design Automation Technical Committee. EDA for electronics has rapidly increased in importance with the continuous scaling of semiconductor technology. EDA tools are used for programming design functionality into FPGAs.

b) Configuration Management:

Configuration Management (CM) is a documentation system for tracking the work. Configuration management involves the collection and maintenance of data concerning the hardware and software of the computer systems that are being used. CM embodies a set of techniques used to help define, communicate and control the evolution of a product or system through its concept, development, implementation and maintenance phases. It is also a set of systematic controls to keep information up to date and accurate. A configuration item is a collection of hardware, software, and/or firmware which satisfies an end-use function and is designated for configuration management.

Configuration management is a discipline applying technical and administrative direction and surveillance to identify and document the functional and physical characteristics of a configuration item, control changes to those characteristics, record and report change processing and implementation status, and verify compliance with specified requirements. (IEEE STD 610.12, 1990).

The IEEE's definition has three keywords: technical, administrative and surveillance. This definition fits the CM concept into the organization. CM not only can help the technical staff to track their work, but also can help the administrator to create a clear view of the target, the problem and the current status. Furthermore, CM supplies an assessment framework to track the whole progress.

c) Checksum:

It is important to be able to verify the correctness of files that are moved between different computing systems. The way that this is traditionally handled is to compute a number which depends in some clever way on all of the characters in the file, and which will change, with high probability, if any character in the file is changed. Such a number is called a checksum.
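As a concrete illustration only, a toy additive checksum over the bytes of a file might look like the following sketch; practical tools typically use stronger functions such as CRCs.

#include <cstdint>
#include <fstream>
#include <iostream>

// Toy additive checksum: sums all bytes of a file modulo 2^32. Any change to a
// byte is likely (though not guaranteed) to change the resulting value.
uint32_t file_checksum(const char* path) {
    std::ifstream in(path, std::ios::binary);
    uint32_t sum = 0;
    char c;
    while (in.get(c)) sum += static_cast<unsigned char>(c);
    return sum;
}

int main(int argc, char** argv) {
    if (argc > 1) std::cout << std::hex << file_checksum(argv[1]) << "\n";
    return 0;
}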

This paper presents the development of an EDA tool for the configuration management of FPGA designs, a tool which converts VHDL code into a unique code. The tool is implemented in the C language by assigning values to some of the constructs present in the HDL. It is very important to use only the right design for the application, and this tool helps in fusing the right VHDL file to the FPGA. The conversion of a VHDL file into a unique code is done by assigning values to some of the HDL constructs; the code or number thus obtained is unique. In developing this tool we consider not only the HDL constructs (IEEE 1164 logic) but also the file name, the fuse checksum and the pin checksum.


II.BLOCK DIAGRAM

III. OVERVIEW

For a totally hardware-oriented design (e.g. FPGAs) the development time is prohibitive in bringing fresh and affordable products to the market. Equally restrictive is a totally software-based solution, which will perform slowly due to the use of generalised computing. This is where designing for a hybrid of hardware- and software-based implementation can be of particular advantage. This paper describes the development of an EDA tool for configuration management of FPGA designs; the tool helps in selecting the right file for the application. A design can be written in many modelling styles (dataflow, structural, behavioural and mixed-level modelling), but whatever the modelling style, the right design must be downloaded to the FPGA kit; otherwise time, money and energy are wasted. In developing this tool, some of the constructs present in VHDL are considered and weights are assigned to the constructs that are considered. A .vhd file (after writing a VHDL program, the file is saved with the extension .vhd) is taken as the input, and a unique code or number is produced as the output. With the help of the C language the .vhd file is converted into a unique code or number. If a file is converted into a number and saved, it will be helpful while coding the modules of a big project. Consider an example: in a big project, one of the modules may require a CPU or a RAM. In such a situation, the programmer has to write the code for the CPU or RAM, depending on the need. The programmer will directly copy and use that code if it is already available, but to make it error free, he has to check whether the code is working or not, which is time consuming. If it is coded and saved as a number, it is easier for the programmer to call that particular program for his project; he can simply call that code for downloading to the FPGA kit. In this paper we have considered five different VHDL programs (.vhd files); these files are converted into unique codes, as shown in the outputs.

EDA TOOL DEVELOPMENT: Steps taken into consideration

The following steps are taken into consideration.

1. The various constructs used in VHDL. (Quantity and location of the constructs).

2. Develop a unique number for the above, giving weights to various constructs.

3. This shall be mathematically or logically operated with respective FPGA manufacturer. Eg. Actel.

4. Further it shall again be operated with respect to the Actel actual device number.

There are a total of 97 constructs in HDL (IEEE 1164 logic). Here we have assigned different values to some of these constructs; the assigned values may be decimal, hexadecimal or octal. The algorithm used for this is given below:

IV. ALGORITHM

Step 1: begin
Step 2: read .vhd file
Step 3: assign values to constructs
Step 4: weighted file name * [weighted construct * position of the construct] = a number  // for a single construct
Step 5: total no. of similar constructs + step 4 = a number
Step 6: repeat step 4 and step 5  // for all constructs
Step 7: product no. * step 6 = a number
Step 8: product version no. + step 7 = a number
Step 9: [fuse checksum + pin checksum] + step 8 = a number

4. a) STEP DETAILS
INPUT: .vhd file
Step 3: file name ==> assign a value. For eg: wt assigned for the file = 1
Step 4: wtd file name * [wtd construct * line no.] = a number or a code (this is for a single construct). For eg: case statement; wt assigned for case ==> 8; case is in line no. 30; then 1*[8*30] = 240 (this is for a single construct)
Step 5: add similar constructs. For eg: total no. of case statements = 90; then add the single construct value and the total no., i.e. 240+90 = 330
Step 6: repeat steps 4 and 5 for all constructs. For eg: 'if' statement; wt assigned for 'if' = 10

suppose 'if' is in line no. 45; then 1*[10*45] = 450; total no. of if statements = 15; construct value + total no. = 450 + 15 = 465; so step 6 = 330 + 465 = 795
Step 7: 795 * product no. For eg: product no. = 77; then 795 * 77 = 61215
Step 8: 61215 + version no. For eg: version no. = 3; 61215 + 3 = 61218


Step 9: pin checksum + fuse checksum + 61218

OUTPUT: a code or a number
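The arithmetic of the algorithm can be sketched as follows; the construct weights, product and version numbers and checksum values are illustrative placeholders, and, following the worked example, one representative line number is used per construct. (The paper's tool is written in C; this C++ sketch only mirrors the computation.)

#include <cstdint>
#include <vector>

// Sketch of the weighting arithmetic from the algorithm above.
struct ConstructInfo {
    uint64_t weight;  // assigned weight of the construct (e.g. case = 8, if = 10)
    uint64_t line;    // position (line number) used in step 4
    uint64_t count;   // total number of occurrences, added in step 5
};

uint64_t design_code(const std::vector<ConstructInfo>& constructs,
                     uint64_t file_weight, uint64_t product_no, uint64_t version_no,
                     uint64_t fuse_checksum, uint64_t pin_checksum) {
    uint64_t sum = 0;
    for (const auto& c : constructs)
        sum += file_weight * c.weight * c.line + c.count;   // steps 4-6
    sum = sum * product_no + version_no;                    // steps 7-8
    return sum + fuse_checksum + pin_checksum;              // step 9
}
// Example from the text: {{8,30,90},{10,45,15}} with file weight 1, product no. 77
// and version no. 3 gives 61218 before the checksums are added.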

V. OUTPUTS:

1) 16 bit adder

2) ALU

3) 8 bit counter

4) Fibonacci series

5) 32 bit memory module

VI. CONCLUSION

This paper presents the various steps required for the development of an EDA tool for configuration management of FPGA designs, expressed in the form of an algorithm. The tool has been implemented using C, and the experimental results for five .vhd files are shown above.


REFERENCES

[1] W. Ecker, M. Heuchling, J. Mades, C. Schneider, T. Schneider, A. Windisch, Ke Yang, and Zarabaldi, "HDL2HYPER - a highly flexible hypertext generator for VHDL models", IEEE, Oct. 1999, pp. 57-62.
[2] "The verilog to html converter", www.burbleland.com/v2html/v2html.html
[3] "Quartus II Tcl Example: Increment Version Number in File", http://www.altera.com/
[4] "Method and program product for protecting information in EDA tool design views", http://www.freepatentsonline.com/20070124717.html
[5] "Intusoft makes HDL development model easy", http://www.intusoft.com/
[6] Arie Komarnitzky, Nadav Ben-Ezer, Eugene Lyubinsky - AAI (Avnet ASIC Israel Ltd.), Tel Mond, Israel, "Unique Approach to Verification of Complex SoC Designs".
[7] Matthew F. Parkinson, Paul M. Taylor, and Sri Parameswaran, "C to VHDL Converter in a Codesign Environment", VHDL International Users Forum, Spring Conference, 1994.


A BIST for Low Power Dissipation
Mr. Rohit Lorenzo, PG Scholar & Mr. Amir Anton Jone, M.E, Lecturer, ECE Dept., Karunya University, Coimbatore

Abstract - In this paper we propose a new scheme for built-in self-test (BIST). We propose different architectures that reduce power dissipation; the architectures are designed with techniques that reduce the power dissipated during test. The BIST with these techniques decreases the transitions that occur at the scan inputs during scan shift operations and hence reduces power dissipation in the CUT. We compare the different BIST architectures. In this paper we fix the values at the inputs of the BIST architecture and, at the output, we restructure the scan chain to obtain optimized results. Experimental results of the proposed technique show that the power dissipation is reduced significantly compared to existing work.

I.INTRODUCTION

Circuit power dissipation in test mode is much higher than the power dissipation in functional mode [21]. High power consumption in BIST mode is an especially serious concern because of at-speed testing, and low power BIST techniques are gaining attention in recent publications [11]. The first advantage of low power BIST is avoiding the risk of damaging the circuit under test (CUT); low power BIST techniques also save the cost of expensive packages or external cooling devices for testing. Power dissipation in BIST mode is made up of three major components: the combinational logic power, the sequential circuit power, and the clock power. In the clock power reduction category, disabling or gating the clocks of scan chains has been proposed [2]. By modifying the clock tree design, these techniques effectively reduce the clock power consumption, which is shown to be a significant component of the test power [23]. However, clock trees are sensitive to changes in timing; even small modifications can sometimes cause serious failure of the whole chip. Modifying the clocks, therefore, not only increases the risk of skew problems but also imposes constraints on test pattern generation. The low transition random test pattern generator (LT-RTPG) has been proposed to reduce the number of toggles of the scan input patterns. In the 3-weight weighted random technique, transitions are fixed at the inputs, and in this way power is reduced in the 3-weight WRBIST. Switching activity in a circuit can be significantly higher during BIST than during its normal operation. Finite-state machines are often implemented in such a manner that vectors representing successive states are highly correlated, to reduce power dissipation [16]. The use of scan allows patterns that cannot appear during normal operation to be applied to the state inputs of the CUT during test application. Furthermore, the values applied at the state inputs of the CUT during scan shift operations represent shifted values of test vectors and circuit responses and have no particular temporal correlation. Excessive switching activity due to low correlation between consecutive test patterns can cause several problems [14]. Since heat dissipation in a CMOS circuit is proportional to switching activity, a CUT can be permanently damaged due to excessive heat dissipation if switching activity in the circuit during test application is much higher than that during its normal operation. Heat dissipated during test application is already influencing the design of test methodologies for practical circuits [14].

II-MINIMIZING POWER DISSIPATION BY REDUCING SWITCHING ACTIVITY

The BIST TPG proposed in this paper reduces switching activity in the CUT during BIST by reducing the number of transitions at the scan input during scan shift cycles (fig 1). If the scan input is assigned a value at one shift cycle and assigned the opposite value at the next shift cycle, then a transition occurs at the scan input. A transition that occurs at the scan input can propagate into internal circuit lines, causing more transitions. During scan shift cycles, the response to the previous scan test pattern is also scanned out of the scan chain; hence, transitions at scan inputs can be caused by both test patterns and responses. Since it is very difficult for a random pattern generator to generate test patterns that cause a minimal number of transitions while they are scanned into the scan chain and whose responses also cause a minimal number of transitions while they are scanned out, we focus on minimizing the number of transitions caused only by the test patterns that are scanned in. Even so, our extensive experiments show that the proposed TPG can still reduce switching activity significantly during BIST, since circuit responses typically have higher correlation among neighboring scan cells than test patterns and therefore cause fewer transitions while being scanned out. A transition at the input of the scan chain at a given scan shift cycle, which is caused by scanning in a value that is opposite to the value scanned in at the previous shift cycle, continuously causes transitions at scan cells while that value travels through the scan chain during the following shift cycles. Fig. 1 describes scanning a test pattern 01100 into a scan chain that has five scan flip-flops. Since a 0 is scanned into the scan chain first, the 1 that is scanned in at the next cycle causes a transition at the input of the scan chain and continuously causes transitions at the scan flip-flops it passes through until it arrives at its final destination.


In contrast, the 1 that is scanned into the scan chain at the next cycle causes no transition at the input of the scan chain and arrives at its final destination without causing any transitions at the scan flip-flops it passes through [14]. This shows that the transitions that occur in the entire scan chain can be reduced by reducing transitions at the input of the scan chain. Since transitions at scan inputs propagate into internal circuit lines causing more transitions, reducing transitions at the input of the scan chain can eventually reduce switching activity in the entire circuit.
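As a rough illustration of this effect (not from the original paper), the following Python sketch counts scan-flip-flop transitions while a pattern is shifted into a scan chain; the chain model and the assumption that the chain initially holds all zeros are illustrative only.

def shift_transitions(pattern, chain_length, initial=0):
    """Count scan flip-flop transitions while `pattern` (a string of '0'/'1',
    scanned in left to right) shifts through a chain of `chain_length` cells."""
    chain = [initial] * chain_length
    transitions = 0
    for bit in pattern:
        new_chain = [int(bit)] + chain[:-1]                 # one shift per scan cycle
        transitions += sum(a != b for a, b in zip(chain, new_chain))
        chain = new_chain
    return transitions

# Patterns with fewer transitions at the scan input cause fewer transitions overall.
print(shift_transitions("01100", 5))   # pattern discussed in the text
print(shift_transitions("00000", 5))   # low-transition pattern
print(shift_transitions("01010", 5))   # high-transition pattern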

Fig. 1. Transitions at scan chain input

III. ARCHITECTURE OF 3WT-WRBIST

Fig. 2. 3-weight WRBIST generators: (a) with toggle flip-flops TF0 and TF1 and (b) without toggle flip-flops.

Fig. 2 shows a set of generators, and Fig. 3 shows an implementation of the 3-weight WRBIST for the generators shown. The shift counter is a modulo-(m+1) counter, where m is the number of scan elements in the scan chain (since the generators shown are 9 bits wide, m = 9 here). When the content of the shift counter is k, where k = 0, 1, ..., 8, a value for input pk is scanned into the scan chain. The generator counter selects the appropriate generator; when the content of the generator counter is g, test patterns are generated using generator g. Pseudo-random pattern sequences generated by an LFSR are modified (fixed) by controlling the AND and OR gates with the overriding signals s0 and s1; fixing a random value to 0 is achieved by setting s0 to 1 and s1 to 1. The overriding signals s0 and s1 are driven by the T flip-flops TF0 and TF1, whose inputs are driven by D0 and D1, respectively, which are generated from the outputs of the shift counter and the generator counter. The shift counter is required by all scan-based BIST techniques and is not particular to the proposed 3-weight WRBIST scheme, and all BIST controllers need a pattern counter that counts the number of test patterns applied. The generator counter can be implemented with log2(G) flip-flops, where G is the number of generators, so no additional hardware is required; hardware overhead for implementing the 3-weight WRBIST is incurred only by the decoding logic and the fixing logic, which includes the two toggle flip-flops (T flip-flops), an AND gate, and an OR gate. Since the fixing logic can be implemented with very little hardware, the overall hardware overhead for implementing the serial fixing 3-weight WRBIST is determined by the hardware overhead of the decoding logic. In cycles when a scan value of pk is scanned in, both D0 and D1 are set to 0, so the T flip-flops hold their previous state; it is also assumed that TF0 is initialized to 1 and TF1 is initialized to 0. The flip-flops are placed in the scan chain in descending order of their subscript number; hence, the value of p0 is scanned first and p8 is scanned last. Random patterns generated by the LFSR could also be fixed by controlling the AND/OR gates directly from the decoding logic without the two T flip-flops; however, this scheme incurs larger hardware overhead for the decoding logic and also causes more transitions in the circuit under test (CUT) during BIST than the scheme with T flip-flops. The TF0, TF1, D0, and D1 values for the implemented scheme with T flip-flops are shown.
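To make the fixing behavior concrete, the following minimal Python sketch (an illustration, not the hardware of Fig. 3) models 3-weight weighted random generation: each scan-input position is either fixed to 0, fixed to 1, or left pseudorandom, corresponding to weights 0, 1, and 0.5.

import random

def three_weight_pattern(generator, rng=random):
    """Produce one scan pattern from a 3-weight generator description.
    `generator` is a string over {'0', '1', '-'}: '0'/'1' mean the position is
    fixed by the overriding logic, '-' means the pseudorandom LFSR bit is used."""
    return [int(c) if c in "01" else rng.randint(0, 1) for c in generator]

# Hypothetical 9-bit generator for inputs p0..p8 with a few fixed positions.
gen = "1-0--1---"
for _ in range(3):
    print(three_weight_pattern(gen))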

IV. ARCHITECTURE OF LT-RTPG BIST

The LT-RTPG reduces switching activity during BIST by reducing transitions at scan inputs during scan shift operations. An example LT-RTPG is shown in Fig. 4. The LT-RTPG is comprised of an m-stage LFSR, a k-input AND gate, and a toggle flip-flop (T flip-flop). Hence, it can be implemented with very little hardware. Each of the k inputs of the AND gate is connected to either a normal or an inverting output of an LFSR stage. If a large k is used, large sets of neighboring state inputs will be assigned identical values in most test patterns, resulting in decreased fault coverage or increased test sequence length. Hence, like [15], in this paper LT-RTPGs with only k = 2 or 3 are used. Since a T flip-flop holds its previous value until its input is assigned a 1, the same value v, where v ∈ {0, 1}, is repeatedly scanned into the scan chain until the value at the output of the AND gate becomes 1. Hence, adjacent scan flip-flops are assigned identical values in most test patterns and scan inputs have fewer transitions


during scan shift operations. Since most switching activity during scan BIST occurs during scan shift operations (a capture cycle occurs only once every m + 1 cycles), the LT-RTPG can reduce heat dissipation during overall scan testing. Various properties of the LT-RTPG have been studied and a detailed methodology for its design has been presented. It has been observed that many faults that escape random patterns are highly correlated with each other and can be detected by continuously complementing the values of a few inputs of a parent test vector. This observation is exploited in [22] to improve fault coverage for circuits that have large numbers of random pattern resistant faults (RPRFs). We have also observed that tests for faults that escape LT-RTPG test sequences share many common input

Fig. 4. LT-RTPG

assignments. This implies that RPRFs that escape LT-RTPG test sequences can be effectively detected by fixing selected inputs to binary values specified in deterministic test cubes for these RPRFs and applying random patterns to the rest of inputs. This technique is used in the 3-weight WRBIST to achieve high fault coverage for random pattern resistant circuits. In this paper we demonstrate that augmenting the LT-RTPG with the serial fixing 3-weight WRBIST proposed in [15] can attain high fault coverage without excessive switching activity or large area overhead even for circuits that have large numbers of RPRFs.
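A behavioral sketch of the LT-RTPG idea follows (a minimal Python model; the LFSR width, taps, and seed are assumptions of this sketch, not the exact circuit of Fig. 4).

def fibonacci_lfsr_step(state, taps, nbits):
    """One step of a Fibonacci LFSR; the feedback bit is the XOR of the tapped bits."""
    fb = 0
    for t in taps:
        fb ^= (state >> t) & 1
    return ((state << 1) | fb) & ((1 << nbits) - 1)

def lt_rtpg_stream(length, k=2, nbits=8, taps=(7, 5, 4, 3), seed=0b01001011):
    """LFSR outputs feed a k-input AND gate; the AND output toggles a T flip-flop
    whose output is the scan-in value, so the scan-in value changes only rarely."""
    state, t_ff, scan_in = seed, 0, []
    for _ in range(length):
        and_out = int(all((state >> i) & 1 for i in range(k)))  # AND of k LFSR outputs
        t_ff ^= and_out                      # T flip-flop holds until the AND output is 1
        scan_in.append(t_ff)
        state = fibonacci_lfsr_step(state, taps, nbits)
    return scan_in

bits = lt_rtpg_stream(40)
print(bits)
print("scan-input transitions:", sum(a != b for a, b in zip(bits, bits[1:])))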

V. CONCLUSION

This paper presents a low hardware overhead TPG for scan-based BIST that can reduce switching activity in CUTs during BIST. The main objective of most recent BIST techniques has been the design of TPGs that achieve low power dissipation. Since the correlation between consecutive patterns applied to a circuit during BIST is significantly lower than during its normal operation, switching activity in the circuit can be significantly higher during BIST than during normal operation.

REFERENCES
[1] Z. Barzilai, D. Coppersmith, and A. L. Rosenberg, "Exhaustive generation of bit patterns with applications to VLSI self-testing," IEEE Trans. Comput., vol. C-32, no. 2, pp. 190-194, Feb. 1983.
[2] L. T. Wang and E. J. McCluskey, "Circuits for pseudo-exhaustive test pattern generation," in Proc. IEEE Int. Test Conf., 1986, pp. 25-37.
[3] W. Daehn and J. Mucha, "Hardware test pattern generators for built-in test," in Proc. IEEE Int. Test Conf., 1981, pp. 110-113.
[4] S. Hellebrand, S. Tarnick, and J. Rajski, "Generation of vector patterns through reseeding of multiple-polynomial linear feedback shift registers," in Proc. IEEE Int. Test Conf., 1992, pp. 120-129.
[5] N. A. Touba and E. J. McCluskey, "Altering a pseudo-random bit sequence for scan-based BIST," in Proc. IEEE Int. Test Conf., 1996, pp. 167-175.
[6] M. Chatterjee and D. K. Pradhan, "A new pattern biasing technique for BIST," in Proc. VLSITS, 1995, pp. 417-425.
[7] N. Tamarapalli and J. Rajski, "Constructive multi-phase test point insertion for scan-based BIST," in Proc. IEEE Int. Test Conf., 1996, pp. 649-658.
[8] Y. Savaria, B. Lague, and B. Kaminska, "A pragmatic approach to the design of self-testing circuits," in Proc. IEEE Int. Test Conf., 1989, pp. 745-754.
[9] J. Hartmann and G. Kemnitz, "How to do weighted random testing for BIST," in Proc. IEEE Int. Conf. Comput.-Aided Design, 1993, pp. 568-571.
[10] J. Waicukauski, E. Lindbloom, E. Eichelberger, and O. Forlenza, "A method for generating weighted random test patterns," IEEE Trans. Comput., vol. 33, no. 2, pp. 149-161, Mar. 1989.
[11] H.-C. Tsai, K.-T. Cheng, C.-J. Lin, and S. Bhawmik, "Efficient test point selection for scan-based BIST," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 4, pp. 667-676, Dec. 1998.
[12] W. Li, C. Yu, S. M. Reddy, and I. Pomeranz, "A scan BIST generation method using a Markov source and partial BIST bit-fixing," in Proc. IEEE-ACM Design Autom. Conf., 2003, pp. 554-559.
[13] N. Z. Basturkmen, S. M. Reddy, and I. Pomeranz, "Pseudo random patterns using Markov sources for scan BIST," in Proc. IEEE Int. Test Conf., 2002, pp. 1013-1021.
[14] S. B. Akers, C. Joseph, and B. Krishnamurthy, "On the role of independent fault sets in the generation of minimal test sets," in Proc. IEEE Int. Test Conf., 1987, pp. 1100-1107.
[15] S. W. Golomb, Shift Register Sequences. Laguna Hills, CA: Aegean Park, 1982.
[16] C.-Y. Tsui, M. Pedram, C.-A. Chen, and A. M. Despain, "Low power state assignment targeting two- and multi-level logic implementation," in Proc. IEEE Int. Conf. Comput.-Aided Des., 1994, pp. 82-87.
[17] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A test vector inhibiting technique


for low energy BIST design," in Proc. VLSI Test Symp., 1999, pp. 407-412.
[18] J. A. Waicukauski, E. B. Eichelberger, D. O. Forlenza, E. Lindbloom, and T. McCarthy, "Fault simulation for structured VLSI," VLSI Syst. Design, pp. 20-32, Dec. 1985.
[19] R. M. Chou, K. K. Saluja, and V. D. Agrawal, "Scheduling tests for VLSI systems under power constraints," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 5, no. 2, pp. 175-185, Jun. 1997.
[20] T. Schuele and A. P. Stroele, "Test scheduling for minimal energy consumption under power constraints," in Proc. VLSI Test Symp., 2001, pp. 312-318.
[21] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed. Reading, MA: Addison-Wesley, 1992.
[22] B. Pouya and A. L. Crouch, "Optimization trade-offs for vector volume and test power," in Proc. Int. Test Conf., 2000, pp. 873-881.
[23] Y. Bonhomme, P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A gated clock scheme for low power scan testing of logic ICs or embedded cores," in Proc. 10th Asian Test Symp., 2001, pp. 253-258.
[24] Y. Zorian, "A distributed BIST control scheme for complex VLSI design," in Proc. 11th IEEE VLSI Test Symp., 1993, pp. 4-9.
[25] P. Girard, "Survey of low-power testing of VLSI circuits," IEEE Design and Test of Computers, May-June 2002, pp. 82-92.


Test Pattern Generation for Power Reduction Using BIST Architecture

Anu Merya Philip, II M.E. (VLSI), and D.S. Shylu, M.Tech., Sr. Lecturer, ECE Dept., Karunya University, Coimbatore, Tamil Nadu

Abstract--Advances in built-in self-test (BIST) techniques have enabled IC testing using a combination of external automated test equipment and a BIST controller on the chip. A new low power test pattern generator using a linear feedback shift register (LFSR), called LP-TPG, is presented to reduce the average and peak power of a circuit during test. The correlation between the test patterns generated by the LP-TPG is higher than that of a conventional LFSR. The LP-TPG inserts intermediate patterns between the random patterns. The goal of having intermediate patterns is to reduce the transition activity at the primary inputs, which eventually reduces the switching activity inside the circuit under test, and hence, the power consumption. The random nature of the test patterns is kept intact.
Keywords—LP-LFSR, R-injection, test patterns

I. INTRODUCTION

The Linear Feedback Shift Register (LFSR) is commonly used as a test pattern generator in low overhead built-in self-test (BIST). This is due to the fact that an LFSR can be built with little area overhead and used not only as a TPG, which attains high fault coverage for a large class of circuits, but also as an output response analyzer. An LFSR TPG requires an unacceptably long test sequence to attain high fault coverage for circuits that have a large number of random pattern resistant faults. The main objective of most recent BIST techniques has been the design of TPGs that achieve high fault coverage at acceptable test lengths. Another objective is to reduce the heat dissipation during test application.

A significant correlation exists between consecutive vectors applied to a circuit during its normal operation. This fact has motivated several architectural concepts, such as cache memories, and is also exploited in high-speed circuits that process digital audio and video signals. In contrast, the consecutive vectors of a sequence generated by an LFSR are proven to have low correlation. Since the correlation between consecutive test vectors applied to a circuit during BIST is significantly lower, the switching activity in the circuit can be significantly higher during BIST than during its normal operation.

Excessive switching activity during test can cause several problems. Firstly, since heat dissipation in a CMOS circuit is proportional to switching activity, a circuit under test (CUT) can be permanently damaged due to excessive heat dissipation if the switching activity in the circuit during test application is much higher than that during its normal operation. The seriousness of excessive heat dissipation during test application is worsened by trends such as circuit miniaturization for portability and high performance. These objectives are typically achieved by using circuit designs that decrease power dissipation and reducing the package size to aggressively match the average heat dissipation during the

circuit’s normal operation. In order to ensure non-destructive testing of such a circuit, it is necessary to either apply test vectors which cause a switching activity that is comparable to that during normal circuit operation or remove any excessive heat generated during test using special cooling equipment. The use of special cooling equipment to remove excessive heat dissipated during test application becomes increasingly difficult and costly as tests are applied at higher levels of circuit integration, such as BIST at board and system levels. Elevated temperature and current density caused by excessive switching activity during test application will severely decrease the reliability of circuits under test due to metal migration or electro-migration.

In the past, tests were typically applied at rates much lower than a circuit's normal clock rate. Circuits are now tested at higher clock rates, possibly at the circuit's normal clock rate (at-speed testing). Consequently, heat dissipation during test application is on the rise and is fast becoming a problem. A new low power test pattern generator using a linear feedback shift register, called LP-TPG, is presented to reduce the power consumption of a circuit during test. The original patterns are generated by an LFSR, and the proposed technique generates and inserts intermediate patterns between each pair of patterns to reduce the primary input (PI) activity.

II. LOW POWER TEST PATTERN GENERATION

The basic idea behind low power BIST is to reduce the PI activities. Here we propose a new test pattern generation technique which generates three intermediate test patterns between each two consecutive random patterns generated by a conventional LFSR. The proposed test pattern generation method does not decrease the random nature of the test patterns. This technique reduces the PI activities and eventually the switching activities in the CUT.

Assume that Ti and Ti+1 are two consecutive test patterns generated by a pseudorandom pattern generator. Suppose the two vectors are

Ti = (t1^i, t2^i, ..., tn^i) and Ti+1 = (t1^(i+1), t2^(i+1), ..., tn^(i+1)),

where n is the number of bits in the test patterns, which is equal to the number of PIs in the circuit under test.

Assume that Tk1, Tk2, and Tk3 are the intermediate patterns between Ti and Ti+1. Tk2 is generated as

Tk2 = (t1^i, ..., t(n/2)^i, t(n/2+1)^(i+1), ..., tn^(i+1)).

Tk2 is generated using one half of each of the two random patterns Ti and Ti+1. Tk2 is also a random pattern because it is generated using two random patterns. The other two patterns are generated using Tk2. Tk1 is generated between Ti and Tk2, and Tk3 is generated between Tk2 and Ti+1.

Tk1 is obtained by

tj^(k1) = tj^i   if tj^i = tj^(k2)
        = R      if tj^i ≠ tj^(k2)


where j = 1, 2, ..., n and R is a random bit. This method of generating Tk1 and Tk3 is called R-injection. If two corresponding bits in Ti and Ti+1 are the same, the same bit is placed in the corresponding bit of Tk1; otherwise a random bit (R) is placed. R can come from the output of the random generator. In this method, the sum of the PI activities between Ti and Tk1 (Ntrans^(i,k1)), Tk1 and Tk2 (Ntrans^(k1,k2)), Tk2 and Tk3 (Ntrans^(k2,k3)), and Tk3 and Ti+1 (Ntrans^(k3,i+1)) is equal to the activity between Ti and Ti+1 (Ntrans^(i,i+1)):

Ntrans^(i,k1) + Ntrans^(k1,k2) + Ntrans^(k2,k3) + Ntrans^(k3,i+1) = Ntrans^(i,i+1)
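The R-injection construction above can be summarized in a few lines of Python (an illustrative sketch of the equations, not the hardware): Tk2 is built from the two halves of Ti and Ti+1, and Tk1 and Tk3 are derived bit by bit.

import random

def r_injection(a, b, rng=random):
    """Keep bits where a and b agree; inject a random bit R where they differ."""
    return [ai if ai == bi else rng.randint(0, 1) for ai, bi in zip(a, b)]

def intermediate_patterns(ti, ti1, rng=random):
    """Build the three intermediate patterns Tk1, Tk2, Tk3 between Ti and Ti+1."""
    n = len(ti)
    tk2 = ti[:n // 2] + ti1[n // 2:]     # first half of Ti, second half of Ti+1
    tk1 = r_injection(ti, tk2, rng)      # between Ti and Tk2
    tk3 = r_injection(tk2, ti1, rng)     # between Tk2 and Ti+1
    return tk1, tk2, tk3

ti  = [1, 0, 1, 0, 0, 0, 0, 1]
ti1 = [0, 1, 0, 1, 0, 1, 0, 1]
print(intermediate_patterns(ti, ti1))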

III. LP-TPG

The proposed technique is designed into the LFSR architecture to create the LP-TPG. Figure 1 shows the LP-TPG with the added circuitry to generate intermediate test patterns.

Fig. 1. Proposed LP-TPG

The LFSR used in the LP-TPG is an external-XOR LFSR. The R-injection circuit taps the present state (pattern Ti) and the next state (pattern Ti+1) of the LFSR. The R-injection circuit includes one AND gate, one OR gate, and one 2x1 MUX. When tj^i and tj^(i+1) are equal, both the AND and OR gates generate the same bit and, regardless of R, that bit is transferred to the MUX output. When they are not equal, the random bit R is sent to the output.

The LP-TPG is activated by two non-overlapping enable signals: en1 and en2. Each enable signal activates one half of the LFSR.

When en1en2=10, first half of the LFSR is active and second half is in idle mode.

When en1en2=01, first half is in idle mode and second half is in active mode.

The middle flip-flop between the (n/2)th and (n/2+1)th flip-flops is used to store the (n/2)th bit of the LFSR when en1en2 = 10, and that bit is used for the second half when en1en2 = 01. A small finite state machine (FSM) controls the pattern generation process.

Step 1: en1en2 = 10, sel1sel2 = 11. The first half of the LFSR is active and the second half is in idle mode. With sel1sel2 = 11, both halves of the LFSR are sent to the outputs O1 to On. Here Ti is generated.

Step 2: en1en2 = 00, sel1sel2 = 10. Both halves of the LFSR are in idle mode. The first half of the LFSR is sent to the outputs O1 to On/2, but the injector circuit outputs are sent to the outputs On/2+1 to On. Tk1 is generated.

Step 3: en1en2 = 01, sel1sel2 = 11. The second half of the LFSR is active and the first half is in idle mode. Both halves are transferred to the outputs O1 to On, and Tk2 is generated.

Step 4: en1en2 = 00, sel1sel2 = 01. Both halves of the LFSR are in idle mode. From the first half, the injector outputs are sent to the outputs O1 to On/2, and the second half sends the exact bits in the LFSR to the outputs On/2+1 to On. Thus Tk3 is generated.

Step 5: The process continues by returning to Step 1 to generate Ti+1.

The LP-TPG with the R-injection circuit keeps the random nature of the test patterns intact. The FSM controls the test pattern generation throughout the steps, and it is independent of the LFSR size and polynomial. The clk and test_en signals are the inputs of the FSM.

When test_en = 1, the FSM starts with Step 1 by setting en1en2 = 10 and sel1sel2 = 11. It continues the process by going through Steps 1 to 4. One pattern is generated in each clock cycle. The size of the FSM is very small and fixed. The FSM can be part of the BIST controller used in the circuit to control the test process.
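The four-step control sequence can be summarized by a small table-driven model; this Python sketch (an illustration, not the synthesized FSM) cycles through the (en1 en2, sel1 sel2) settings that produce Ti, Tk1, Tk2, and Tk3 in turn while test_en is asserted.

from itertools import cycle

# (pattern produced, en1 en2, sel1 sel2) for the four FSM steps described above.
FSM_STEPS = [
    ("Ti",  "10", "11"),
    ("Tk1", "00", "10"),
    ("Tk2", "01", "11"),
    ("Tk3", "00", "01"),
]

def control_sequence(num_cycles):
    """Yield the control settings applied on each clock cycle while test_en = 1."""
    steps = cycle(FSM_STEPS)
    for _ in range(num_cycles):
        yield next(steps)

for label, en, sel in control_sequence(8):
    print(label + ": en1en2=" + en + ", sel1sel2=" + sel)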

IV. EXAMPLE OF AN 8-bit LP-TPG

Figure 2 shows an example of 8-bit test pattern generation between T1 and T2, assuming R = 0.

Fig. 2. 8-bit pattern generation: pattern T1, intermediate patterns Tk1, Tk2, Tk3, and pattern T2 (R = 0).


The example shows an LP-TPG using an 8-bit LFSR with polynomial x^8 + x + 1 and seed 01001011. Two consecutive patterns T1 and T2 and three intermediate patterns are generated.

The first and second halves of Tk2 are equal to the corresponding halves of T1 and T2, respectively. Tk1 and Tk3 are generated using R-injection (R = 0 is injected into the corresponding bits of Tk1 and Tk3). Ntrans^(1,2) = 7, Ntrans^(1,k1) = 2, Ntrans^(k1,k2) = 1, Ntrans^(k2,k3) = 2, and Ntrans^(k3,2) = 2. This reduction of PI activity reduces the switching activity inside the circuit and eventually the power consumption. Having three intermediate patterns between each pair of consecutive patterns may seem to prolong the test session by a factor of 3. However, empirically, many of the intermediate patterns perform as well as conventional LFSR patterns in terms of fault detection.
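For reference, here is a compact Python model of an 8-bit external-XOR (Fibonacci-style) LFSR with feedback polynomial x^8 + x + 1 and seed 01001011, as assumed in this example; the exact tap/bit ordering is an assumption of this sketch.

def lfsr8_sequence(count, state=0b01001011):
    """Generate `count` successive states of an 8-bit external-XOR LFSR.
    Feedback for x^8 + x + 1 is taken here as the XOR of bit 7 and bit 0."""
    states = []
    for _ in range(count):
        states.append(state)
        feedback = ((state >> 7) ^ state) & 1        # XOR of the two tapped bits
        state = ((state << 1) | feedback) & 0xFF     # shift left, insert feedback bit
    return states

for s in lfsr8_sequence(5):
    print(format(s, "08b"))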

Fig. 3. Block diagram of the 8-bit LP-TPG.

Figure 3 shows the block diagram of an LP-TPG using an 8-bit LFSR.

V. POWER ANALYSIS

The power comparison between a conventional LFSR and the low power LP-TPG is performed. The power reports obtained during simulation are shown below.

A. Power report of conventional LFSR

Release 6.3i - XPower SoftwareVersion: G.35
Copyright (c) 1995-2004 Xilinx, Inc. All rights reserved.
Design: lfsr4_9   Preferences: lfsr4_9.pcf   Part: 2s15cs144-6
Data version: PRELIMINARY, v1.0, 07-31-02

Power summary:                        I(mA)   P(mW)
Total estimated power consumption:              9
Vccint 2.50V:                           1       3
Vcco33 3.30V:                           2       7
Clocks:                                 0       0
Inputs:                                 0       0
Logic:                                  0       0
Outputs: Vcco33                         0       0
Signals:                                0       0
Quiescent Vccint 2.50V:                 1       3
Quiescent Vcco33 3.30V:                 2       7

Thermal summary:
Estimated junction temperature: 25C   Ambient temp: 25C   Case temp: 25C   Theta J-A: 34C/W

Decoupling Network Summary: Cap Range (uF), #
Capacitor Recommendations:
Total for Vccint: 8   (470.0-1000.0: 1, 0.0470-0.2200: 1, 0.0100-0.0470: 2, 0.0010-0.0047: 4)
Total for Vcco33: 1   (470.0-1000.0: 1)

Analysis completed: Fri Jan 25 11:01:38 2008

The power report shows that the conventional LFSR exhibits a total power consumption of 9 mW.

B. Power report of low power LP-TPG

Release 6.3i - XPower SoftwareVersion: G.35
Copyright (c) 1995-2004 Xilinx, Inc. All rights reserved.
Design: lp_lfsr   Preferences: lp_lfsr.pcf   Part: 2s15cs144-6
Data version: PRELIMINARY, v1.0, 07-31-02

Power summary:                        I(mA)   P(mW)
Total estimated power consumption:              7
Vccint 2.50V:                           0       0
Vcco33 3.30V:                           2       7
Clocks:                                 0       0
Inputs:                                 0       0
Logic:                                  0       0
Outputs: Vcco33                         0       0
Signals:                                0       0
Quiescent Vcco33 3.30V:                 2       7

Thermal summary:
Estimated junction temperature: 25C   Ambient temp: 25C   Case temp: 25C   Theta J-A: 34C/W

Decoupling Network Summary: Cap Range (uF), #
Capacitor Recommendations:
Total for Vccint: 8   (470.0-1000.0: 1, 0.0470-0.2200: 1, 0.0100-0.0470: 2, 0.0010-0.0047: 4)
Total for Vcco33: 3   (470.0-1000.0: 1, 0.0010-0.0047: 2)

Analysis completed: Fri Jan 25 11:00:00 2008

The power report of the low power LP-TPG shows a total power consumption of 7 mW. This shows a considerable reduction in power for the LP-TPG compared to a conventional LFSR.

VI. RESULTS

The LP-LFSR was simulated using Xilinx software. The conventional LFSR consumed a total power of 9 mW, whereas the LP-TPG has a much reduced power of 7 mW. The output waveform is shown in Figure 4.

Fig. 4. Waveform of the LP-LFSR

VII. PAPER OUTLINE

The proposed technique increases the correlation between consecutive test patterns. Original patterns are generated by an LFSR, and the proposed technique generates and inserts intermediate patterns between each pair of patterns to reduce the primary input (PI) activity, which reduces the switching activity inside the CUT and hence the power consumption. Adding the intermediate patterns does not prolong the overall test length; hence, the test application time remains the same. The technique of R-injection is embedded into a conventional LFSR to create the LP-TPG.



Test Pattern Generation for Microprocessors Using Satisfiability Format Automatically and Testing It Using Design for Testability

Cynthia Hubert, II ME, and Grace Jency J., Lecturer, Karunya University

Abstract—This paper addresses the testing of microprocessors. A satisfiability-based framework for automatically generating test programs that target gate-level stuck-at faults in microprocessors is demonstrated. The micro-architectural description of a processor is translated into RTL for test analysis. Test generation involves extraction of propagation paths from a module's input-output ports to primary I/O ports. A solver is then used to find the valid paths that justify the precomputed vectors at primary input ports and propagate the good/faulty responses to primary output ports. Here the test program is constructed in a deterministic fashion from the micro-architectural description of a processor and targets stuck-at faults. This is done using ModelSim.
Index Terms—microprocessor, satisfiability, test generation, test program.

I.INTRODUCTION

For high-speed devices such as microprocessors, a satisfiability-based register transfer level test generator that automatically generates test programs and detects gate-level stuck-at faults is demonstrated. Test generation at the RTL can be broadly classified into two categories: 1) constraint-based test generation and 2) precomputed test set-based approaches. Constraint-based test generation relies on the fact that a module can be tested by abstracting the RTL environment in which it is embedded as constraints. The extracted constraints, together with the embedded module, present the gate-level automatic test pattern generation (ATPG) tool with a circuit of significantly lower complexity than the original circuit. Precomputed test set-based approaches first precompute the test sets for different RTL modules and then attempt to determine functional paths through the circuit for symbolically justifying a test set and the corresponding responses. This symbolic analysis is followed by a value analysis phase, in which the actual tests are assembled using the symbolic test paths and the module-level precomputed test sets. The use of precomputed test sets enables RTL ATPG to focus its test effort on determining symbolic justification and propagation paths. However, symbolic analysis is effective only when: 1) a clear separation of controller and datapath in RTL circuits is available and 2) design for testability (DFT) support mechanisms, such as a test architecture, are provided to ease the bottlenecks presented by the controller/datapath interface. These issues were addressed by using a functional circuit representation based on assignment decision diagrams (ADDs).

II.RELATED WORK

In [1], test sequences are generated that target gate-level stuck-at faults without DFT; the approach is applicable to both RTL and mixed gate-level/RTL circuits and does not need to assume that the controller and datapath are separable, but it uses a limited set of transparency rules and a limited number of faulty responses during propagation analysis. In [2], an algorithm for generating test patterns that target stuck-at faults at the logic level is given, with a reduction in test generation time and improved fault coverage, but it cannot handle circuits with a multiple-clock functional RTL design. In [3], a technique for extracting functional information from RTL controller-datapath circuits is presented; it results in low area, delay, and power overheads, high fault coverage, and low test generation time, although some faults in the controller are sequentially untestable.

III.DESIGN FOR TESTABILITY

Design for test is used here. In Fig. 1, the automatic test program generator supplies the input test vectors to the test access ports (TAPs). The test access ports give the input sequences to the system under test, which performs the operations and passes the results to the signature analyzer; the analyzer produces an output that is compared with the expected output to tell whether the circuit is faulty or good. Suppose an 8-bit input is taken. It has 256 possible combinations, and if each of the 256 combinations is tested there is a wastage of time; this is what happens when test vectors are constructed in a pseudorandom fashion, and sometimes 100% fault coverage is still not reached. To overcome this wastage of time, the test programs are constructed in a deterministic fashion and only precomputed test vectors are used.

A. How to compute test vectors

This is done by automatic test pattern generation or test program generation. For microprocessors, the test vectors are determined manually and put into memory, and then each precomputed test vector is taken automatically from the memory and testing is done.

B. Unsatisfiable test vector

An unsatisfiable test vector is one that is not able to perform the particular function or is not able to provide 100% fault coverage. A test vector that overcomes this disadvantage is called satisfiable.


Fig.1 Design for Testability

A satisfiability (SAT)-based framework for automatically generating test programs that target gate-level stuck-at faults in microprocessors is used. In Fig. 2, the micro-architectural description of a processor is first translated into a unified register-transfer level (RTL) circuit description, called an assignment decision diagram (ADD), for test analysis. Test generation involves extraction of justification/propagation paths in the unified circuit representation from an embedded module's input-output (I/O) ports to primary I/O ports, abstraction of the RTL modules in the justification/propagation paths, and translation of these paths into Boolean clauses. Since the ADD is derived directly from a micro-architectural description, the generated test sequences correspond to a test program. If a given SAT instance is not satisfiable, then the Boolean implications (also known as the unsatisfiable segment) that are responsible for unsatisfiability are efficiently and accurately identified. We show that adding design for testability (DFT) elements is equivalent to modifying these clauses such that the unsatisfiable segment becomes satisfiable. The proposed approach constructs test programs in a deterministic fashion from the micro-architectural description of a processor. We develop a test framework in which test programs are generated automatically for microprocessors to target gate-level stuck-at faults. Test generation is performed on a

Fig.2 Test generation methodology

unified controller/datapath representation (ADD) derived from the micro-architectural description of the processor. The RTL modules are captured "as-is" in the ADD. In order to justify/propagate the precomputed test vectors/responses for an embedded module, we first derive all the potential justification/propagation paths from the I/O ports of the embedded module to primary I/O ports. The functionality of the RTL modules in these paths is abstracted by their equivalent I/O propagation rules. The generated paths are translated into Boolean clauses by expressing the functionality of modules in these paths in terms of Boolean clauses in conjunctive normal form (CNF). The precomputed test vectors/responses are also captured with the help of additional clauses. These clauses are then resolved using a SAT solver, resulting in valid test sequences that are guaranteed to detect the stuck-at faults in the embedded module targeted by the precomputed test vectors. Since the ADD represents the micro-architecture of the processor, the test sequences correspond to a test program. RTL test generation also imposes a large number of initial conditions corresponding to the initial state of flip-flops, precomputed test vectors, and propagation of faulty responses. These conditions are propagated through the circuit by the Boolean constraint propagation (BCP) engine in SAT solvers before searching through the sequential search space for a valid test sequence. This results in significant pruning of the sequential search space and thus a reduction in test generation time. The Boolean clauses capturing the test generation problem for some precomputed test vectors/responses are not satisfiable; the Boolean variables in these implications are targeted for DFT measures.
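As a small illustration of expressing module functionality as CNF clauses for a SAT solver, the following Python sketch encodes a single AND gate with a Tseitin-style translation and checks satisfiability by brute force; the encoding and the tiny solver are generic illustrations, not the tool flow used in this paper.

from itertools import product

def and_gate_cnf(a, b, y):
    """Tseitin-style clauses for y = a AND b; positive integers are variables,
    negative integers are their negations (DIMACS-style literals)."""
    return [[-a, -b, y], [a, -y], [b, -y]]

def brute_force_sat(clauses, num_vars):
    """Tiny exhaustive satisfiability check, sufficient for toy instances."""
    for bits in product([False, True], repeat=num_vars):
        value = lambda lit: bits[abs(lit) - 1] if lit > 0 else not bits[abs(lit) - 1]
        if all(any(value(l) for l in clause) for clause in clauses):
            return bits
    return None

# Variables: 1 = a, 2 = b, 3 = y.  Ask the solver to justify y = 1 at the AND output.
clauses = and_gate_cnf(1, 2, 3) + [[3]]
print(brute_force_sat(clauses, 3))   # (True, True, True): a = b = 1 justifies y = 1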


C. Advantages

1. The proposed approach constructs test programs in a deterministic fashion from the micro-architectural description of a processor and targets stuck-at faults.

2. Test generation is performed at the RTL, resulting in very low test generation times compared to a gate-level sequential test generator.

3. The proposed test generation-based DFT solution is both accurate and fast.

An ADD can be automatically generated from a functional or structural circuit description. As shown in Fig. 3, it consists of four types of nodes: READ, operation, WRITE, and assignment-decision. READ nodes represent the current contents of input ports, storage elements, and constants. WRITE nodes represent output ports and the values held by the storage elements in the next clock cycle. Operation nodes represent various arithmetic and logic operations, and the assignment-decision node implements the functionality of a multiplexer.

Fig.3 Assignment Decision Diagram
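A minimal data-structure sketch of these four ADD node kinds is given below (illustrative Python only; the actual ADD representation used by the referenced tools is richer, and the example wiring is hypothetical).

from dataclasses import dataclass, field
from enum import Enum, auto

class NodeKind(Enum):
    READ = auto()              # current contents of input ports, storage elements, constants
    WRITE = auto()             # output ports / next-cycle values of storage elements
    OPERATION = auto()         # arithmetic and logic operations
    ASSIGN_DECISION = auto()   # multiplexer-like assignment decision

@dataclass
class AddNode:
    kind: NodeKind
    name: str
    inputs: list = field(default_factory=list)   # predecessor AddNode objects

# Hypothetical fragment: an adder feeding an assignment-decision node that writes R3.
r3_read = AddNode(NodeKind.READ, "R3")
adder   = AddNode(NodeKind.OPERATION, "+", [AddNode(NodeKind.READ, "R1"), r3_read])
a7      = AddNode(NodeKind.ASSIGN_DECISION, "A7", [r3_read, adder])
r3_next = AddNode(NodeKind.WRITE, "R3", [a7])
print(r3_next)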

Fig. 4 below shows the RTL datapath of a simple microprocessor. pI1 and pI2 represent the primary inputs, R1, R2, and R3 are registers, + and - represent the adder and subtractor, MUL represents the multiplier, CMP represents the comparator, and PO1 represents the primary output. In Fig. 5, R1, R2, and R3 inside a square box represent read nodes, and R1, R2, and R3 inside a circle represent write nodes; +1, -1, mul, and cmp represent operation nodes, and A3, A4, A5, A6, and A7 represent assignment-decision nodes. M1, M2, L1, L2, and L3 are select signals. If M1 is 0, then the output of the adder is selected; if M1 is 1, then pI1 is selected. If L1 is 0, then R1 is selected; if L1 is 1, then the output of A4 is selected. If M2 is 0, then pI1 is selected; if M2 is 1, then the output of the

Fig.4 RTL datapath of simple microprocessor

Fig.5 Assignment decision diagram of simple microprocessor datapath

subtractor is selected. If L2 is 0, then R2 is selected; if L2 is 1, then the output of A5 is selected. If L3 is 0, then R3 is selected; if L3 is 1, then the output of the adder is selected. The mul node performs the multiplication of R2 and R3, and the cmp node compares the values of R2 and R1.
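The select-signal behavior just described can be captured in a short behavioral sketch (illustrative Python; the adder/subtractor operand connections and the register each mux feeds are not fully spelled out in the text, so they are passed in or assumed here).

def next_state(R1, R2, R3, pI1, adder_out, sub_out, M1, M2, L1, L2, L3):
    """One register update following the mux selections described above.
    adder_out and sub_out are supplied by the caller because their operand
    connections are not specified in the text."""
    A4 = pI1 if M1 else adder_out        # M1 = 0 -> adder output, M1 = 1 -> pI1
    A5 = sub_out if M2 else pI1          # M2 = 0 -> pI1, M2 = 1 -> subtractor output
    R1n = A4 if L1 else R1               # L1 = 1 -> A4 (assumed to feed R1), else hold
    R2n = A5 if L2 else R2               # L2 = 1 -> A5, else hold
    R3n = adder_out if L3 else R3        # L3 = 1 -> adder output, else hold
    mul_out = R2 * R3                    # MUL multiplies R2 and R3
    cmp_out = (R2 == R1)                 # CMP compares R2 and R1 (equality assumed)
    return R1n, R2n, R3n, mul_out, cmp_out

print(next_state(R1=3, R2=5, R3=2, pI1=7, adder_out=9, sub_out=1,
                 M1=1, M2=0, L1=1, L2=1, L3=0))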


V. PERFORMANCE EVALUATION

Fig. 6. Assignment decision diagram output of datapath.

Fig. 7. Assignment decision diagram for a new value of datapath.

Fig. 8. Assignment decision diagram for a new value of datapath.

Fig. 9. Assignment decision diagram for a new value of datapath.


Fig. 10. Assignment decision diagram for a new value of datapath.

Fig. 11. Assignment decision diagram for a new value of datapath.

VI. CONCLUSION

In this paper, we present a novel approach that extends SAT-based ATPG to generate test programs that detect gate-level stuck-at faults in microprocessors.

REFERENCES

[1] L. Lingappan, S. Ravi, and N. K. Jha, "Satisfiability-based test generation for nonseparable RTL controller-datapath circuits," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 25, no. 3, pp. 544-557, Mar. 2006.
[2] I. Ghosh and M. Fujita, "Automatic test pattern generation for functional register-transfer level circuits using assignment decision diagrams," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 20, no. 3, pp. 402-415, Mar. 2001.
[3] I. Ghosh, A. Raghunathan, and N. K. Jha, "A design for testability technique for register-transfer level circuits using control/data flow extraction," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 17, no. 8, pp. 706-723, Aug. 1998.
[4] A. Paschalis and D. Gizopoulos, "Effective software-based self-test strategies for on-line periodic testing of embedded processors," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 1, pp. 88-99, Jan. 2005.
[5] B. T. Murray and J. P. Hayes, "Hierarchical test generation using precomputed tests for modules," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 9, no. 6, pp. 594-603, Jun. 1990.
[6] S. Bhatia and N. K. Jha, "Integration of hierarchical test generation with behavioral synthesis of controller and data path circuits," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 6, no. 4, pp. 608-619, Dec. 1998.
[7] H. K. Lee and D. S. Ha, "HOPE: An efficient parallel fault simulator for synchronous sequential circuits," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 15, no. 9, pp. 1048-1058, Sep. 1996.
[8] L. Chen, S. Ravi, A. Raghunathan, and S. Dey, "A scalable software-based self-test methodology for programmable processors," in Proc. Design Autom. Conf., 2003, pp. 548-553.
[9] N. Kranitis, A. Paschalis, D. Gizopoulos, and G. Xenoulis, "Software-based self-testing of embedded processors," IEEE Trans. Comput., vol. 54, no. 4, pp. 461-475, Apr. 2005.


DFT Techniques for Detecting Resistive Opens in CMOS Latches and Flip-Flops

Reeba Rex.S and Mrs.G.Josemin Bala, Asst.Prof, Karunya University

Abstract--In this paper, a design-for-testability (DFT) technique is proposed to detect resistive opens in the conducting paths of the clocked inverter stage in CMOS latches and flip-flops. The main benefit of this technique is that it is able to detect a parametric range of resistive open defects. The testability of the added DFT circuitry is also addressed, and application to a large number of cells is considered. A comparison with other previously proposed testable latches is carried out. Circuits with the proposed technique have been simulated and verified using TSPICE.
Index Terms—Design-for-testability (DFT), flip-flop, latches, resistive open.

I.INTRODUCTION

Conventional tests cannot detect FET stuck-open faults in several CMOS latches and flip-flops. Stuck-open faults can change static latches and flip-flops into dynamic devices, a danger to circuits whose operation requires static memory, since undetected FET stuck-open faults can cause malfunctions. Designs have been given for several memory devices in which all single FET stuck-open faults are detectable. These memory devices include common latches, master-slave flip-flops, and scan-path flip-flops that can be used in applications requiring static memory elements whose operation can be reliably ascertained through conventional fault testing methods. Stuck-at faults occur due to thin oxide shorts (the n-transistor gate to Vss or the p-transistor gate to Vdd) and metal-to-metal shorts. A stuck-open or stuck-closed fault is due to a missing source, drain, or gate connection. An open or break at the drain or source of a MOSFET gives rise to a class of conventional failures called stuck-open faults. If a stuck-open fault exists, a test vector may not always guarantee a unique, repeatable logic value at the output because there is no conducting path from the output node to either Vdd or Vss. Undetectable opens may occur in some branches of CMOS latches and flip-flops. These undetectable opens occur in the clocked inverter stage (CIS) of the symmetric D-latch. This is because the input data is correctly written through the driver stage despite the defective stage. Opens in vias/contacts are likely to occur, and the number of vias/contacts is high in actual integrated circuits due to the many metal levels. In the damascene-copper process, vias and metal are patterned and etched prior to the additive metallization. The open density in copper shows a higher value than that found in aluminum. Random particle-induced contact defects are the main test target in production testing. In addition, silicided opens can occur due to excess anneal during manufacturing. Low-temperature screening techniques can detect cold delay defects such as silicided resistive opens. Memory elements like latches and flip-flops are widely used in the design of digital CMOS integrated circuits. Their application depends on the requirements of performance, gate count, power dissipation, area, etc. Resistive opens affecting certain branches of fully static CMOS memory elements are undetected by logic and delay testing. For these opens the input data is correctly written and memorized. However, for highly resistive opens the latch may fail to retain the information after some time in the presence of leakage or noise. Testable latches have been proposed for making detectable the stuck-open faults in these otherwise undetectable branches. Reddy has proposed a testable latch where an additional controllable input is added to the last stage of the latch; a proper sequence of vectors is then generated for testing these opens, and the delay is penalized due to the added series transistors. Rubio has proposed a testable latch with a lower number of test vectors than that proposed by Reddy; one additional input is required, and the delay is also penalized due to the added series transistors. In this paper, a design-for-testability (DFT) technique for testing full and resistive opens in undetectable branches of fully static CMOS memory elements is proposed. This is the first testable latch able to cover both full opens and parametric resistive opens in otherwise undetectable faulty branches. Design considerations for the DFT circuitry are stated, and the results are compared with previously reported testable structures. Here a fault-free circuit is taken and simulated; then a faulty circuit is taken, the DFT circuitry is added, and it is simulated. The two simulation results are compared and the fault is located.

II.DESIGN FLOW

Fig.1. DFT design flow


III.METHODOLOGY

The methodology used here is a DFT circuitry that detects faults in the undetectable branches of CMOS latches and flip-flops; these opens cannot be detected by delay and logic testing. The approach considers not only stuck-open faults but also resistive opens in the CIS branches. Opens are modeled with a lumped resistance which can take a continuous range of values. The proposed testable CMOS latch cell requires four additional transistors and only one control signal. The network under test (NMOS or PMOS) is selected by proper initialization of the latch state.

IV.DFT PROPOSAL

In this paper, a symmetric CMOS D-latch cell (see Fig. 2) has been considered, and possible open locations affecting the conductive paths of the clocked inverter stage (CIS) are taken into account. The approach covers resistive opens in the CIS branches; opens are modeled with a lumped resistance which can take a continuous range of values. The proposed testable CMOS latch cell requires four additional transistors and only one control signal. The network under test (NMOS or PMOS) is selected by proper initialization of the latch state. Resistive opens in the NMOS (PMOS) network are tested as follows:
• initialize the latch to the 1 (0) state;
• in the memory phase, activate transistors MTP and MTN;
• deactivate both transistors MTP and MTN;
• observe the output of the CMOS latch.
The detectability of the open defects is determined by the voltages imposed by the DFT circuitry during the memory phase and the latch input/output characteristics.
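A purely behavioral sketch of this test decision is given below (illustrative Python only; actual detectability depends on transistor sizing and the analog voltages discussed next, and the 45 kΩ threshold is just a placeholder taken from the example in Fig. 5).

def dft_test_detects_defect(rop_ohms, r_detect_min_ohms=45_000, init_state=1):
    """Behavioral model of the test sequence for one CIS branch: initialize the
    latch, pulse MTP/MTN during the memory phase, release them, observe the output.
    A defect-free branch holds the state; a sufficiently resistive open loses the
    contention during the pulse and the latch flips, which reveals the defect."""
    latch = init_state                            # step 1: initialize the latch
    if rop_ohms >= r_detect_min_ohms:             # steps 2-3: weakened branch flips the cell
        latch = 1 - init_state
    return latch != init_state                    # step 4: a flipped state means defect detected

print(dft_test_detects_defect(0))        # defect-free branch: no flip, nothing detected
print(dft_test_detects_defect(60_000))   # resistive open above the threshold: detected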

Fig. 2. Symmetrical CMOS latch with undetected opens

The voltage values imposed during the memorizing phase are determined by the transistor sizes of the DFT circuitry and the latch memorizing circuitry. Let us consider a resistive open in the NMOS network. The latch output is initialized to the one state. When the two DFT transistors MTP and MTN are activated, there is a competition between three networks: the NMOS branch under test, the MTP transistor, and the MTN transistor. Due to the resistive open, the strength of the NMOS branch of the CIS decreases. Hence, different voltage values at Q and Qbar appear for the defect-free and the defective cases. The voltage at Q (Qbar) for the defective latch is higher (lower) than for the defect-free latch. When the transistors MTP and MTN are deactivated, the cell evolves to a stable quiescent state. The transistors MTP and MTN are sized such that the defective latch flips its state but the state of the defect-free latch remains unchanged. Let Vpg be the PMOS gate voltage and Vng the NMOS gate voltage; L and W correspond to the length and width of the transistors, and Rop corresponds to the resistive open. Based on the values of Vpg, Vng, L, and W, we get a different detectable resistive open value.

Fig. 3. Proposed testable latch with one control signal

Let Wp be the width of the PMOS DFT transistor and Wn the width of the NMOS DFT transistor. RminP is the minimum detectable resistance for the PMOS network and RminN is the minimum detectable resistance for the NMOS network. Based on the Wp/Wn ratio, the minimum detectable resistance for PMOS and NMOS varies.

Fig.4. Waveform for fault free symmetrical latch

Fig. 5. Timing diagram for the latch with one control signal. Resistive open R11 = 45 kΩ.


V. TESTABILITY OF THE DFT CIRCUITRY

The testability of the added DFT circuitry is now addressed. The DFT circuitry is composed of the transistors MTP and MTN and the inverter (see Fig. 3). Let us focus on the transistors MTP and MTN; defects affecting the DFT inverter can be analyzed in a similar way. Stuck-open faults, resistive open defects, and stuck-on faults are considered. Resistive opens located in the conducting paths of the two DFT transistors can be tested using the same procedure as for opens affecting undetectable branches of the latch. For a stuck-open fault at the NMOS DFT transistor (see Fig. 3), the latch is initialized to logic one. When the two DFT transistors are activated in the memory phase, the voltage at Qbar (Q) increases (decreases). The voltage at Qbar (Q) tends to a higher (lower) value than for the defect-free case because the NMOS transistor is off. After the two DFT transistors are deactivated, the defect-free (defective) latch maintains (changes) the initialized state. Hence, the defect is detected. Resistive opens are tested in a similar way, and low values of resistive opens can be detected: for the latch topology used, resistive opens as low as 5 kΩ are detectable.

Fig. 6. Output of the DFT circuitry for the case of Rop = 45 kΩ.

A. Waveform Description

Fig. 4 corresponds to the waveform of the symmetrical latch initialized to d = 1 (5 V). In the fault-free condition we get Q = 1 and Qbar = 0. Fig. 6 corresponds to the waveform of the output of the DFT circuitry. The DFT transistors are activated when the control signal is low. When the DFT transistors are activated, the faulty latch voltage at Qbar (Q) tends to increase (decrease). Here Vp = 2.2 V and Vn = 1.5 V, where Vp and Vn are the voltages at the gates of the transmission gate.

VI. APPLICATION TO A LARGE NUMBER OF CELLS

Fig.7.Skewing activation of scan cell by blocks.

Let us assume a scan design. Using the proposed technique, a current pulse appears on the power buses during the activation of the DFT circuitry. When the DFT circuitries of the flip-flops in the scan chain are activated simultaneously, the current drawn from the power supply can be significant. Due to the high current density, mass transport due to the momentum transfer between conducting electrons and diffusing metal atoms can occur. This phenomenon is known as electromigration. As a consequence, the metal lines can be degraded and even an open failure can occur. The activation of the DFT circuitries for blocks of scan cells can be skewed to minimize stressing of the power buses during test mode. This is implemented by inserting delay circuitries in the path of the control signal of blocks of scan cells (see Fig. 7). In this way, the activation of the DFT circuitries of each block of scan cells is time skewed. Hence, at a given time there is a stressing current pulse due to only one block of flip-flops. For comparison purposes, the current drawn from the power supply for 4 symmetrical flip-flop cells activated simultaneously and time skewed is shown in Fig. 8 and Fig. 9. In this example, the scan chain has been divided into three blocks of 4 cells each, and a delay circuitry composed of 4 inverters has been implemented.

Fig.8.Current consumption with delay circuitry

Fig.9.Current consumption without delay circuitry

From the waveforms, the current consumption with the delay circuitry is 9 µA, while the current consumption without the delay circuitry is 14 µA.


VII. COMPARISON WITH OTHER TESTABLE LATCHES

Table 1. Comparison with other testable latches

Technique        Add. inputs   Add. transistors   RDET
[2]              1             4                  R∞
[3]              2             4                  R∞
This proposal    1             4                  > 40 kΩ to ∞

Table 1 shows a comparison between our proposal and other testable latch structures [2], [3]. This proposal requires one additional input; the number of additional inputs for the previously reported proposals is also given. In this proposal, the number of additional transistors per cell is smaller than for the other techniques. The delay penalization using our proposal is significantly small. This technique requires eight vectors for testing both CIS branches of the latch. For testing one branch, the first vector writes the desired state into the latch, the second vector memorizes this state, then the third vector activates the DFT circuitry and the fourth vector deactivates the DFT circuitry. A similar sequence is required for the complementary branch. The main benefit of this proposal is that it can detect a parametric range of the resistance of the open, whereas the other proposals only detect a line that is completely open (an infinite resistive open).

VIII. CONCLUSION

A DFT technique to test resistive opens in otherwise undetectable branches of fully static CMOS latches and flip-flops has been proposed. The main benefit of this proposal is that it is able to detect a parametric range of resistive opens with reduced performance degradation. This DFT technique can also be applied to other flip-flops.

REFERENCES

[1] Antonio Zenteno Ramirez, Guillermo Espinosa, and Victor Champac, "Design-for-test techniques for opens in undetected branches in CMOS latches and flip-flops," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 5, May 2007.
[2] M. K. Reddy and S. M. Reddy, "Detecting FET stuck-open faults in CMOS latches and flip-flops," IEEE Design Test, vol. 3, no. 5, pp. 17-26, Oct. 1986.
[3] A. Rubio, S. Kajihara, and K. Kinoshita, "Class of undetectable stuck-open branches in CMOS memory elements," Proc. Inst. Elect. Eng.-G, vol. 139, no. 4, pp. 503-506, 1992.
[4] C.-W. Tseng, E. J. McCluskey, X. Shao, and D. M. Wu, "Cold delay defect screening," in Proc. 18th IEEE VLSI Test Symp., 2000, pp. 183-188.
[5] Afzel Noore, "Reliable detection of CMOS stuck open faults due to variable internal delays," IEICE Electronics Express, vol. 2, no. 8, pp. 292-297.
[6] S. M. Samsom, K. Baker, and A. P. Thijssen, "A comparative analysis of the coverage of voltage and tests of realistic faults in a CMOS flip-flop," in Proc. ESSCIRC 20th Eur. Solid-State Circuits Conf., 1994, pp. 228-231.
[7] K. Banerjee, A. Amerasekera, N. Cheung, and C. Hu, "High-current failure model for VLSI interconnects under short-pulse stress conditions," IEEE Electron Devices Lett., vol. 18, no. 9, pp. 405-407, Sep. 1997.


2-D fractal array design for 4-D Ultrasound Imaging Ms. Alice John, Mrs.C.Kezi Selva Vijila

M.E. Applied electronics, HOD-Asst. Professor Dept. of Electronics and Communication Engineering

Karunya University, Coimbatore.

Abstract- One of the most promising techniques for limiting complexity in real-time 3-D ultrasound systems is to use sparse 2-D layouts. For a given number of channels, optimization of performance is desirable to ensure high-quality volume images. To find optimal layouts, several approaches have been followed with varying success. The most promising designs proposed are Vernier arrays, but these also suffer from high peaks in the side lobe region compared with a dense array. In this work, we propose a new method based on the principle of suppression of grating lobes. The proposed method extends the concept of the fractal layout. Our design has simplicity in construction, flexibility in the number of active elements, and the possibility of suppression of grating lobes.

Index Terms- 4-D ultrasound imaging, sparse 2-D array, fractal layout, Sierpinski carpet layout.

1. INTRODUCTION

The new medical imaging modality, volumetric imaging, can be used for several applications including diagnostics, research, and non-invasive surgery. Existing 3-D ultrasound systems are based on mechanically moving 1-D arrays for data collection and on preprocessing of the data to achieve 3-D images. The main aim is to minimize the number of channels without compromising image quality and to suppress the side lobes. New generations of ultrasound systems will have the possibility to collect and visualize data in near real time. To develop the full potential of such a system, an ultrasound probe with a 2-D transducer array is needed.

Current systems use linear arrays with more than 100 elements. A 2-D transducer array will contain between 1500 and 10,000 elements. Such arrays represent a technological challenge because of the high channel count [1]. To overcome this challenge, undersampling the 2-D array by connecting only some of all the possible elements [2] is a suitable solution. For a given set of constraints, the problem is to choose those elements that give the most appropriate beam pattern or image. The analysis of such sparse array beam patterns has a long history; a short review of some of these works can be found in [3].

Several methods for finding sparse array layouts for 4-D ultrasound imaging have been reported. Random approaches have been suggested by Turnbull et al. [4], [5], and this work has been followed up at Duke University [6]-[7]. Weber et al. have suggested using genetic algorithms. Similar layouts have been found by Holm et al. using linear programming and by Trucco using simulated annealing.

Sparse arrays can be divided into 3 categories, random, fractal, periodic. One of the promising category is sparse periodic arrays [8]. These are based on the principal of different transmit and receive layouts, where the grating lobes in the transmit array response are suppressed by receive array response and vice versa. Periodic arrays utilize partial cancellation of transmit and receive grating lobes. Sparse periodic arrays have a few disadvantages; one is the use of overlapping elements, another is the strict geometry which fixes the number of elements. An element in a 2-D array will occupy a small area compared to an element in a 1-D. The sparse periodic array is having high resolution but there is frequent occurrence of side lobes.

In sparse random arrays, each element is chosen at random according to a chosen distribution function. Due to this randomness, the layouts are very easy to find. Sparse random arrays have low resolution, but the suppression of sidelobes is strong. By exploiting the properties of both sparse random and sparse periodic arrays, we turn to fractal arrays. With fractal arrays, we can obtain high resolution with a low sidelobe level by combining the advantages of both periodic and random arrays.

To simplify future integration of electronics into the probe, the sparse transmit and receive layouts should be chosen to be non-overlapping. This means that some elements should be dedicated to transmit while others should be used to receive. To increase system performance, future 2-D arrays should possibly include pre-amplifiers directly connected to the receive elements.

The paper is organized in the following manner. Section II describes the fractal array design, starting with the Sierpinski fractal and the carpet fractal, and then the pulse-echo response. Section III describes the simulation and performance of different designs obtained by adjusting the kerf value. In Section IV, we summarize the paper.

II. FRACTAL ARRAY LAYOUTS

A fractal is generally a rough or fragmented geometric shape that can be subdivided into parts, each of which is (at least approximately) a reduced-size copy of the whole, a property called self-similarity. The Fractal component model has the following important features:

• Recursivity: components can be nested in composite components.

• Reflectivity: components have full introspection and intercession capabilities.


• Component sharing: a given component instance can be included (or shared) by more than one component.

• Binding components: a single abstraction for component connections, called bindings. Bindings can embed any communication semantics, from synchronous method calls to remote procedure calls.

• Execution model independence: no execution model is imposed. Components can thus be run within execution models other than the classical thread-based model, such as event-based models.

• Open: extra-functional services associated with a component can be customized through the notion of a control membrane.

A. Sierpinski Fractal

For the Sierpinski fractal we have considered mainly two types:

• Sierpinski triangle

• Sierpinski carpet

B. Sierpinski Triangle

The Sierpinski triangle is also called the Sierpinski gasket and the Sierpinski sieve.

• Start with a single triangle. This is the only triangle in this direction; all the others will be drawn upside down.

• Inside the first triangle, we have drawn a smaller upside-down triangle. Its corners should be exactly at the centers of the sides of the large triangle.

C. Sierpinski Carpet

In this paper we mainly consider the carpet layout because we are working with a 2-D array.

• Transmitter array: the transmit array is drawn using a matrix M consisting of both ones and zeros. These arrays have been constructed by considering a large array of elements surrounded by a small matrix. In the carpet fractal array we first draw a square at the middle, and this small square occupies one third of the original large array. Surrounding this square, we construct smaller squares. (A construction sketch in code is given after this list.)

• Receiver array: in the sparse 2-D array layout, to avoid overlapping we select different receiver and transmitter arrays. For the receiver array we have taken those elements which never cause an overlap with the transmitter array.
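As an illustration, the standard Sierpinski-carpet construction (subdivide into a 3x3 grid and drop the centre block at every iteration) can be generated in a few lines of Python. This is a hedged sketch of the textbook construction, which differs slightly from the square-placement procedure described in the bullet above; the function name and iteration count are illustrative only.

    import numpy as np

    def sierpinski_carpet(iterations):
        """Build a 0/1 element mask for a Sierpinski-carpet sparse array.

        Starts from a single active element and, at each iteration, tiles the
        current mask into a 3x3 grid with the centre block left empty.
        """
        mask = np.ones((1, 1), dtype=int)
        for _ in range(iterations):
            n = mask.shape[0]
            grown = np.zeros((3 * n, 3 * n), dtype=int)
            for i in range(3):
                for j in range(3):
                    if (i, j) != (1, 1):          # drop the centre block
                        grown[i*n:(i+1)*n, j*n:(j+1)*n] = mask
            mask = grown
        return mask

    # Three iterations give a 27x27 mask with 512 active elements.
    tx_mask = sierpinski_carpet(3)

The receive mask can then be chosen from the complementary (zero) positions of the transmit mask so that the two layouts never overlap.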

D. Pulse-Echo Response

The layout should have optimal pulse-echo performance, i.e., the pulse-echo radiation pattern should have as low a sidelobe level as possible for a specified mainlobe width at all angles and depths of interest. Computing the pulse-echo response for a given transmit and receive layout is time consuming. A commonly used simplification is to evaluate the radiation properties in continuous-wave mode in the far field. An optimal set of layouts for continuous waves does not necessarily give optimal pulse-echo responses. To ensure reasonable pulse-echo performance, additional criteria which ensure a uniform distribution of elements could be introduced. This will limit the interference in the sidelobe region between pulses transmitted from different elements and reduce the sidelobe level.

Fig. 1. Pulse-echo response of a Sierpinski carpet layout

III. RESULTS AND DISCUSSION

The fractal layout exploits the advantages of both periodic and random arrays. Our main aim is to suppress the sidelobes and to narrow the mainlobe. First, we created the transmit and receive array layouts. Both layouts were constructed in such a way that they do not overlap each other. The transmit array is designed using a matrix M, with up to three iterations used in its construction. The intensity distributions were used to find the spread of the sidelobes and the mainlobe.

In our paper we have taken into consideration different specifications such as the speed of sound (1540 m/s), the initial frequency, a sampling frequency of 100 x 10^6 Hz, the width and height of the array, and the kerf, i.e., the spacing between the elements of the array.
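To make the role of the kerf concrete, the continuous-wave far-field array factor of a sparse layout can be evaluated directly from the active element positions. The sketch below is only an illustrative approximation (it is not the simulator used for the results in this paper); the 3 MHz centre frequency, the dense placeholder mask and the half-wavelength pitch are assumptions.

    import numpy as np

    def array_factor_db(mask, pitch, wavelength, theta, phi=0.0):
        """Normalised far-field CW array factor (in dB) of a sparse 2-D layout.

        mask       : 0/1 matrix of active elements
        pitch      : centre-to-centre element spacing = element width + kerf (m)
        wavelength : acoustic wavelength c / f0 (m)
        theta      : elevation angles (rad) at which to evaluate, at azimuth phi
        """
        k = 2 * np.pi / wavelength
        rows, cols = np.nonzero(mask)
        x, y = cols * pitch, rows * pitch
        ux = np.sin(theta) * np.cos(phi)          # direction cosines of the cut
        uy = np.sin(theta) * np.sin(phi)
        phase = k * (np.outer(ux, x) + np.outer(uy, y))
        af = np.abs(np.exp(1j * phase).sum(axis=1))
        return 20 * np.log10(af / af.max())

    c, f0 = 1540.0, 3e6                           # sound speed; assumed 3 MHz centre frequency
    lam = c / f0
    theta = np.linspace(-np.pi / 2, np.pi / 2, 721)
    mask = np.ones((16, 16), dtype=int)           # placeholder; substitute a sparse fractal mask
    level_db = array_factor_db(mask, pitch=lam / 2, wavelength=lam, theta=theta)

Increasing the pitch (element width plus kerf) beyond half a wavelength moves grating lobes into the visible region, which is consistent with the trend observed in the four kerf cases below.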


A. Case I: kerf = 0

We have simulated the transmitter and receiver layouts for this case. Since the kerf value, i.e., the distance between the elements, is zero, there is no spacing between the elements. From the pulse-echo response we conclude that in this case the mainlobe is not sharp but the sidelobe level is highly suppressed. Fig. 2(a)-(b) shows the transmitter and receiver layouts, Fig. 2(c) shows the pulse-echo response, and Fig. 2(d) shows the intensity distribution, from which we can see that the sidelobe level is reduced.

B. Case II: kerf = lambda/2

In the second case the kerf value is taken as lambda/2, so there is a lambda/2 spacing between the elements of the transmitter and receiver arrays. Fig. 3(a)-(b) shows the layouts. Fig. 3(c) shows the pulse-echo response, in which the mainlobe is now sharp but the sidelobes are not highly suppressed. Fig. 3(d) shows the intensity distribution, where the sidelobe level is high compared to that of Case I.

C. Case III: kerf = lambda/4

In the third case the kerf value is taken as lambda/4. Fig. 4(a)-(b) shows the array layouts. Fig. 4(c) shows the pulse-echo response, in which the mainlobe is sharp but the sidelobe level is high. The intensity distribution also shows that the sidelobe spread is high compared to Case II.

D. Case IV: kerf = lambda

In the last case the kerf value is taken as lambda, which gives a spacing of lambda between the elements of the array. Fig. 5(a)-(b) shows the transmitter and receiver layouts. Fig. 5(c) shows the pulse-echo response; here the mainlobe is very sharp but the sidelobe level has started spreading towards both sides. Fig. 5(d) shows the intensity distribution, which clearly shows the spreading of the sidelobes. The sidelobe level in this case is the highest of all the cases.

Fig. 2. (a) Transmitter array, (b) receiver array, (c) pulse-echo response, and (d) intensity distribution for kerf = 0.

Fig. 3. (a) Transmitter array, (b) receiver array, (c) pulse-echo response, and (d) intensity distribution for kerf = lambda/2.

Fig. 4. (a) Transmitter array, (b) receiver array, (c) pulse-echo response, and (d) intensity distribution for kerf = lambda/4.

Fig. 5. (a) Transmitter array, (b) receiver array, (c) pulse-echo response, and (d) intensity distribution for kerf = lambda.

IV. CONCLUSION

To construct a 2-D array for 4-D ultrasound imaging we need to meet many constraints, an important one concerning the mainlobe and sidelobe levels. We evaluate this through the pulse-echo response. We have shown that it is possible to suppress unwanted sidelobe levels by adjusting different parameters of the array layout. We have also shown how the intensity distribution changes as the spacing between array elements is adjusted. As future work, we will calculate the mainlobe beamwidth, the integrated sidelobe ratio (ISLR) and the sidelobe peak value in order to select the best fractal layout, since these parameters affect the image quality.

REFERENCES

[1] B. A. J. Angelsen, H. Torp, S. Holm, K. Kristoffersen, and T. A. Whittingham, "Which transducer array is best?," Eur. J. Ultrasound, vol. 2, no. 2, pp. 151-164, 1995.

[2] S. Holm, "Medical ultrasound transducers and beamforming," in Proc. Int. Cong. Acoust., pp. 339-342, Jun. 1995.

[3] R. M. Leahy and B. D. Jeffs, "On the design of maximally sparse beamforming arrays," IEEE Trans. Antennas Propagat., vol. AP-39, pp. 1178-1187, Aug. 1991.

[4] D. H. Turnbull and F. S. Foster, "Beam steering with pulsed two-dimensional transducer arrays," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 38, no. 4, pp. 320-333, 1991.

[5] D. H. Turnbull, "Simulation of B-scan images from two-dimensional transducer arrays: Part II—Comparisons between linear and two-dimensional phased arrays," Ultrason. Imag., vol. 14, no. 4, pp. 334-353, Oct. 1992.


Secured Digital Image Transmission over Network Using Efficient Watermarking Techniques on Proxy Server

Jose Anand, M. Biju, U. Arun Kumar JAYA Engineering College, Thiruninravur, Near Avadi, Chennai 602024.

Email:- [email protected], [email protected]

Abstract: With the rapid growth of Internet technologies and the wide availability of multimedia computing facilities, the enforcement of multimedia copyright protection has become an important issue. Digital watermarking is viewed as an effective way to deter content users from illegal distribution. The watermark can be used to authenticate the data file and for tamper detection. This is of great value in the use and exchange of digital media, such as audio and video, on emerging handheld devices. However, watermarking is computationally expensive and adds to the drain on the available energy in handheld devices. This paper analyzes the energy, average power and execution time of various watermarking algorithms. We also propose a new approach in which the watermarking algorithm is partitioned for embedding and extraction by migrating some tasks to a proxy server. Security is provided through the DWT, which leads to lower energy consumption on the handheld device without compromising the security of the watermarking process. The proposed approach shows that partitioning the watermarking tasks between the proxy and the handheld device reduces the total energy consumed by a significant factor and improves performance by two orders of magnitude compared to running the application only on the handheld device. Keywords:- energy consumption, mobile computing, proxy server, security, watermarking.

I. INTRODUCTION

Watermarking is used to provide copyright protection for digital content. A distributor embeds a mark into a digital object so that ownership of this digital object can be proved. This mark is usually a secret message that contains the distributor's copyright information. The mark is normally embedded into the digital object by exploiting its inherent information redundancy.

The problem arises when a dishonest user tries to delete the mark in the digital object before redistribution in order to claim ownership. In consequence, the strength of watermarking schemes must be based on the difficulty of locating and changing the mark. There are many watermarking approaches that try to protect the intellectual property of multimedia objects, especially images, but unfortunately very little attention has been given to software watermarking.

There are two kinds of digital watermarking: visible and invisible. A visible watermark contains visible information, such as a company logo, to indicate the ownership of the multimedia. Visible watermarking causes distortion of the cover image, and hence invisible watermarking is more practical. In invisible watermarking, as the name suggests, the watermark is imperceptible in the watermarked image. Invisible watermarking can be classified into three types: robust, fragile and semi-fragile.

A popular application of watermarking techniques is to provide proof of ownership of digital data by embedding copyright statements into video or image products. Automatic monitoring and tracking of copyrighted material on the Web, automatic auditing of radio transmissions, data augmentation, and fingerprinting are examples of applications, and watermarking can be applied to all kinds of data such as audio, images, video, formatted text, 3-D models and model animation parameters.

To allow the architecture to use a public-key security model on the network while keeping the devices themselves simple, we create a software proxy for each device. All objects in the system, e.g., appliance, wearable gadgets, software agents, and users have associated trusted software proxies that either run on an embedded processor on the appliance, or on a trusted computer.

In the case of the proxy running on an embedded processor on the appliance, we assume that device to proxy communication is inherently secure. If the device has minimal computational power and communicates to its proxy through a wired or wireless network, we force the communication to adhere to a device to proxy protocol. The proxy is software that runs on a network-visible computer.

The proxy's primary function is to make access-control decisions on behalf of the device it represents. It may also perform secondary functions such as running scripted actions on behalf of the device and interfacing with a directory service. The device-to-proxy protocol varies for different types of devices. In particular, we consider lightweight devices with low-bandwidth wireless network connections and slow CPUs, and heavyweight devices with higher-bandwidth connections and faster CPUs.

It was assumed that heavyweight devices are capable of running proxy software locally. With a local proxy, a sophisticated protocol for secure device to proxy communication is unnecessary, assuming critical parts of the device are tamper resistant. For lightweight devices, the proxy must run elsewhere.

The proxy and device communicate through a secure channel that encrypts and authenticates all messages. Different algorithms are used for authentication and encryption, and symmetric keys may be used. In this paper the energy profiles of various watermarking algorithms are analyzed, along with the impact of security and image quality on energy consumption.

We then present a task-partitioning scheme for wavelet-based image watermarking algorithms in which computationally expensive portions of the watermarking are offloaded to a proxy server. The proxy server acts as an agent between the content server and the handheld device and is also used for various other tasks such as data transcoding and load management. The partitioning scheme can be used to reduce the energy consumption associated with watermarking on the handheld without compromising the security of the watermarking process.

II. WATERMARKING

The increasing computational capability and availability of broadband in emerging handheld devices have made them true endpoints of the internet. They enable users to download and exchange a wide variety of media such as e-books, images, etc. Digital watermarking has been proposed as a technique for protecting intellectual property of digital data.

It is the process of embedding a signature/watermark into a digital media file so that it is hidden from view, but can be extracted on demand to verify the authenticity of the media file. The watermark can be binary data, a logo, or a seed value to a pseudorandom number generator used to produce a sequence of numbers with a certain distribution.

Watermarking can be used to combat fraudulent use of wireless voice communications, to authenticate the identity of cell phones and transmission stations, and to secure the delivery of music and other audio content. Watermarking bears a large potential in securing such applications, for example, e-fax for owner verification, customer authentication in service delivery, and customer support.

Watermarking algorithms are designed for maximum security with little or no consideration for other system constraints such as computational complexity and energy availability. Handheld devices such as PDAs and cell phones have a limited battery life that is directly affected by the amount of computational burden placed by the application. Digital watermarking tasks place an additional burden on the available energy in these devices.

Watermarking, like steganography, seeks to hide information inside another object. Therefore, it should be resilient to intentional or unintentional manipulations and resistant to watermark attacks. Although several techniques have been proposed for remote task execution for power management, these do not account for the application security during the partitioning process.

Figure 1 Architecture of target system

Figure 1 shows our implementation of a watermarking system in which multimedia content is streamed to a handheld device via a proxy server. This system consists of three components: mobile devices, proxy servers, and content servers.

A mobile or handheld device refers to any type of networked resource; it could be a handheld (PDA), a gaming device, or a wireless security camera. Content servers store multimedia and database content and stream data (images) to a client upon request. All communication between the mobile devices and the servers is relayed through the proxy servers.

Proxy servers are powerful servers that can, among other things, compress/decompress images, transcode video in real-time, access/provide directory services, and provide services based on a rule base for specific devices. Figure 2 shows the general process of watermarking image data, where the original image (host image) is modified using a signature to create the watermarked image.

In this process, some error or distortion is introduced. To ensure transparency of the embedded data, the amount of image distortion due to the watermark embedding process has to be small. There are three basic tasks in the watermarking process with respect to an image, as shown in Figure 2. A watermark is embedded either in the spatial domain or in the frequency domain. Detection and extraction refer to determining whether an image contains a watermark and extracting the full watermark from it. Authentication refers to comparing the extracted watermark with the original watermark.

Figure 2 Watermarking process (a) watermark generation and embedding (b) watermark extraction and authentication

Watermarks are used to detect unauthorized modifications of data and for ownership authentication. Watermarking techniques for images and video differ in that watermarking in video streams takes advantage of the temporal relation between frames to embed watermarks.

Figure 3 Digital Signal Generations


A simple approach for embedding data into images is to set the least significant bit (LSB) of some pixels to zero. Data is then embedded into the image by assigning 1's to the LSBs in a specific pattern which is known only to the owner. This method satisfies the perceptual transparency property, since only the least significant bit of an 8-bit value is altered.
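A minimal sketch of such an LSB scheme is given below; the function names and the choice of secret pixel positions are purely illustrative.

    import numpy as np

    def embed_lsb(image, bits, positions):
        """Embed a bit string into the LSBs of selected pixels.

        image     : 2-D uint8 array (grayscale host image)
        bits      : iterable of 0/1 watermark bits
        positions : list of (row, col) positions known only to the owner
        """
        marked = image.copy()
        for bit, (r, c) in zip(bits, positions):
            marked[r, c] = (marked[r, c] & 0xFE) | bit   # clear the LSB, then set it
        return marked

    def extract_lsb(marked, positions):
        """Recover the embedded bits from the same secret positions."""
        return [int(marked[r, c] & 1) for r, c in positions]

    # Toy usage: hide 8 bits in a random 64x64 "image"
    rng = np.random.default_rng(0)
    host = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    secret_positions = [(i, 2 * i + 1) for i in range(8)]
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    marked = embed_lsb(host, bits, secret_positions)
    assert extract_lsb(marked, secret_positions) == bits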

In DCT-based watermarking, the original image is divided into 8 x 8 blocks of pixels, and the two-dimensional (2-D) DCT is applied independently to each block. The watermark is then embedded into the image by modifying the relationship between the middle-frequency DCT coefficients of neighboring blocks of the original image.
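The block-pair coefficient-relationship idea can be illustrated with SciPy's DCT as follows; the coefficient position (3, 4) and the margin value are assumptions of the sketch, not parameters of any specific published algorithm.

    import numpy as np
    from scipy.fftpack import dct, idct

    def dct2(block):
        return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

    def idct2(coeffs):
        return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

    def embed_bit(block_a, block_b, bit, pos=(3, 4), margin=2.0):
        """Embed one bit by enforcing an order relation between the same
        mid-frequency DCT coefficient of two neighbouring 8x8 blocks."""
        A, B = dct2(block_a), dct2(block_b)
        if bit == 1 and A[pos] <= B[pos]:
            A[pos], B[pos] = B[pos] + margin, A[pos]     # force A > B
        elif bit == 0 and A[pos] >= B[pos]:
            A[pos], B[pos] = B[pos], A[pos] + margin     # force A < B
        return idct2(A), idct2(B)

    def extract_bit(block_a, block_b, pos=(3, 4)):
        return int(dct2(block_a)[pos] > dct2(block_b)[pos])

    # Round-trip check on two random 8x8 blocks
    rng = np.random.default_rng(0)
    a, b = rng.random((8, 8)) * 255, rng.random((8, 8)) * 255
    a_m, b_m = embed_bit(a, b, bit=1)
    assert extract_bit(a_m, b_m) == 1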

The spatial- and frequency-domain watermarking techniques used for still images are extended to the temporal domain for video streams. Here, one can take advantage of the fact that in MPEG video streams the predicted and bi-directional frames are derived from reference frames using motion estimation. Wavelet-based watermarking is one of the most popular approaches due to its robustness against malicious attacks.

Wavelet-based image watermark embedding consists of three phases: 1) watermark preprocessing; 2) image preprocessing; and 3) watermark embedding, as shown in Figure 4. First, each bit in each pixel of both the image and the watermark is assigned to a bit plane. There are 8 bit planes corresponding to the gray-level resolution of the image/watermark.

Then DWT coefficients are obtained for each bit plane by carrying out DWT on a plane-by-plane basis. The DWT coefficients of the watermark are encrypted using a public key. The watermark embedding algorithm then uses the coefficients of the original image and those of the encrypted watermark to generate the watermarked image. A similar reverse process is used for watermark extraction and authentication.
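As a simplified illustration of wavelet-domain embedding (omitting the bit-plane decomposition and public-key encryption steps described above), the sketch below additively embeds a pattern into one detail subband using PyWavelets; the strength alpha, the Haar wavelet and the non-blind extraction are assumptions of the sketch.

    import numpy as np
    import pywt

    def embed_wavelet_watermark(image, watermark, alpha=0.05, wavelet='haar'):
        """Additively embed a watermark pattern into the horizontal detail
        subband of a single-level 2-D DWT of the host image."""
        cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), wavelet)
        wm = np.resize(watermark.astype(float), cH.shape)   # fit pattern to subband size
        return pywt.idwt2((cA, (cH + alpha * wm, cV, cD)), wavelet)

    def extract_wavelet_watermark(marked, original, alpha=0.05, wavelet='haar'):
        """Non-blind extraction: difference the detail subbands of the marked
        and original images and undo the embedding strength."""
        _, (cH_m, _, _) = pywt.dwt2(marked.astype(float), wavelet)
        _, (cH_o, _, _) = pywt.dwt2(original.astype(float), wavelet)
        return (cH_m - cH_o) / alpha

    rng = np.random.default_rng(0)
    host = rng.random((128, 128)) * 255
    pattern = rng.integers(0, 2, size=(64, 64)).astype(float)
    marked = embed_wavelet_watermark(host, pattern)
    recovered = extract_wavelet_watermark(marked, host)     # approximately equals the pattern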

First, the encrypted coefficients of the image and the watermark are extracted from the image. Then a secret private key is used to decrypt the coefficients of the watermark and an inverse DWT is applied and so on, till the original watermark is obtained. DWT uses filters with different cutoff frequencies to analyze a signal at different resolutions.

The signal is passed through a series of high-pass filters, also known as wavelet functions, to analyze the high frequencies, and through a series of low-pass filters, also known as scaling functions, to analyze the low frequencies. We present two partitioning schemes: the first gives priority to reducing energy consumption, while a second, more secure scheme is described below.

This watermark process migration is applicable in office environments where a trusted proxy can act as an “agent” or representative for the mobile device and can take care of authentication and quality of service negotiation with the content server. A more secure partitioning scheme for both watermark embedding and extraction requires some participation from the device in the watermarking process.

During watermark embedding, we migrate the following tasks to the proxy: bit decomposition, coefficient calculation using DWT, and watermark coefficient encryption using the public key. So, the handheld first sends the image and the watermark to the proxy. The proxy processes them and sends the image and watermark coefficients back to the handheld.

The handheld then embeds the watermark coefficients into the image using a unique coefficient relationship to generate the watermarked image. This is a secure approach as the proxy does not know the coefficient relationship used to embed the watermark coefficients in the image. During watermark extraction, the handheld extracts the image, and watermark coefficients from the watermarked image and uses its private secure key to decrypt the image and watermark coefficients.

The handheld sends the image coefficients to the proxy for processing, such as carrying out the inverse DWT; on the other hand, it processes the coefficients of the watermark itself to generate the watermark. It then authenticates the watermark against the original watermark. The fact that the watermark is not sent to the proxy makes this scheme secure against any potential malicious attack by the proxy, as shown in Figures 4 and 5, respectively.

Figure 4 Embedding and Extraction

Figure 5 Partitioning of the image watermark embedding and extraction process

III. EXPERIMENTAL SETUP

Our experimental setup is shown in Figure 6. All measurements were made using a Sharp Zaurus PDA with an Intel 400-MHz XScale processor, 64-MB ROM and 32-MB SDRAM. A National Instruments PCI-6040E data acquisition (DAQ) board samples the voltage drop across a sense resistor (to calculate the current) at 1000 samples/s. The DAQ has a resolution of 16 bits.

IV. ENERGY CONSUMPTION ANALYSIS

We calculated the instantaneous power consumption corresponding to each sample, and the total energy, using the following equations:

P(i) = (V_R(i) / R) x V_Z,        E = T x sum_i P(i)

where V_R(i) is the instantaneous voltage drop (in volts) across the sense resistor of resistance R, V_Z is the voltage across the Zaurus PDA (i.e., the supply voltage), and T is the sampling period. The energy is the sum of all instantaneous power samples over the duration of the application run, multiplied by the sampling period. Average power is calculated as the ratio of total energy to total execution time.
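A small helper of this kind (hypothetical, with made-up sample values) shows how the energy and average power follow from the sampled voltage drop:

    import numpy as np

    def energy_from_samples(v_r, v_supply, r_sense, fs):
        """Per-sample power, total energy and average power from DAQ samples.

        v_r      : array of voltage drops across the sense resistor (V)
        v_supply : supply voltage across the device (V)
        r_sense  : sense resistor value (ohms)
        fs       : sampling rate (samples/s), so the sampling period is 1/fs
        """
        power = (v_r / r_sense) * v_supply        # P(i) = (V_R(i) / R) * V_Z
        energy = power.sum() / fs                 # E = T * sum_i P(i)
        avg_power = energy / (len(v_r) / fs)      # average power over the run
        return energy, avg_power

    samples = np.full(13460, 0.0275)              # hypothetical V_R trace at 1000 samples/s
    E, P_avg = energy_from_samples(samples, v_supply=4.0, r_sense=1.0, fs=1000)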


TABLE I
Embedding Energy, Power and Execution Time Analysis

Algorithm       Energy (J)   Avg. Power (W)   Exec. Time (s)
Bruyndonckx        1.47          0.11             13.46
Corvi             83.20          0.61            136.15
Cox              126.00          1.10            115.23
Dugad             68.70          0.50            136.64
Fridrich         196.00          1.15            171.00
Kim               73.50          0.52            140.81
Koch               2.19          0.17             12.64
Wang              85.80          0.61            140.20
Xia               90.00          0.67            133.82
Xie              154.80          1.05            147.07
Zhu              163.30          1.14            143.74

Table I lists the energy usage, average power (energy/execution time), and execution time for watermark embedding by the various watermarking algorithms when they are executed on the handheld device. Calculating wavelet and inverse-wavelet transforms is computationally expensive and, thus, also power hungry.

The large variation in the power consumption of the different algorithms can be in part attributed to the difference in the type of instructions executed in each case. The instruction sequence executed is largely dependent on algorithmic properties which enable certain optimizations such as vectorization and on the code generated by the compiler.

We present the energy, power, and execution time analysis of watermark extraction in Table II. Watermark extraction is more expensive than watermark embedding. During extraction, the transform is carried out on both the input image and the output image, and the corresponding coefficients are normalized.

The correlation between the normalized coefficients of the input and output is used as a measure of the fidelity of the watermarked image. The overhead of computing band wise correlation and image normalization accounts for the higher energy consumption.

In Table III, we list the energy, power, and execution time for watermark authentication. This task is computationally inexpensive, since it involves a simple comparison of the extracted watermark and the original watermark.

TABLE II
Extraction Energy, Power and Execution Time Analysis

Algorithm       Energy (J)   Avg. Power (W)   Exec. Time (s)
Bruyndonckx        0.22          0.79              0.28
Corvi             70.30          0.47            150.77
Cox              121.00          0.95            128.02
Dugad             38.40          0.49             79.00
Fridrich         191.00          1.10            173.60
Kim               91.30          0.55            166.57
Koch               0.61          0.61              1.00
Wang              88.00          0.59            147.90
Xia               82.70          0.57            144.51
Xie               74.06          1.00             73.88
Zhu              158.80          1.16            137.38

TABLE III
Authentication Energy, Power and Execution Time Analysis

Algorithm       Energy (J)   Avg. Power (W)   Exec. Time (s)
Bruyndonckx        0.02          0.59             0.034
Corvi              0.10          0.73             0.138
Cox                0.05          1.35             0.037
Dugad              0.03          0.97             0.031
Fridrich           0.18          1.36             0.132
Kim                0.10          0.76             0.131
Koch               0.04          1.25             0.032
Wang               0.08          1.36             0.059
Xia                0.08          1.40             0.057
Xie                0.04          1.00             0.039
Zhu                0.06          1.20             0.050

V. CONCLUSION

In this paper the energy characteristics of several wavelet-based image watermarking algorithms are analyzed, and a proxy-based partitioning technique for energy-efficient watermarking on mobile devices is designed. The energy consumed by watermarking tasks on the handheld device can be minimized by offloading tasks to the proxy server with sufficient security, so this approach maximizes energy savings while ensuring security. These approaches can be enhanced by adding error-correction codes during the embedding and extraction stages.

REFERENCES

[1] A. Fox and S. D. Gribble, "Security on the move: Indirect authentication using Kerberos," in Proc. Mobile Computing Networking, White Plains, NY, 1996, pp. 155–164.

[2] B. Zenel, A Proxy Based Filtering Mechanism for the Mobile Environment, Comput. Sci. Dept., Columbia University, New York, 1995, Tech. Rep. CUCS-0-95.

[3] A. Rudenko, P. Reiher, G. J. Popek, and G. H. Kuenning, "The remote processing framework for portable computer power saving," in Proc. 1999 ACM Symp. Appl. Comput., 1999, pp. 365–372.

[4] U. Kremer, J. Hicks, and J. Rehg, Compiler-Directed Remote Task Execution for Power Management: A Case Study, Compaq Cambridge Research Laboratory (CRL), Cambridge, MA, 2000, Tech. Rep. 2000-2.

[5] P. Rong and M. Pedram, "Extending the lifetime of a network of battery-powered mobile devices by remote processing: A Markovian decision-based approach," in Proc. 40th Conf. Des. Automat., 2003, pp. 906–911.

[6] F. Hartung, J. K. Su, and B. Girod, "Spread spectrum watermarking: Malicious attacks and counterattacks," in Security and Watermarking of Multimedia Contents, 1999, pp. 147–158.

[7] W. Diffie and M. E. Hellman, "New directions in cryptography," IEEE Trans. Inform. Theory, vol. IT-22, no. 6, pp. 644–654, Nov. 1976.

[8] I. Cox, J. Kilian, T. Leighton, and T. Shamoon, "Secure spread spectrum watermarking for multimedia," IEEE Trans. Image Process., vol. 6, no. 12, pp. 1673–1687, 1997.

[9] S. Voloshynovskiy, S. Pereira, and T. Pun, "Watermark attacks," in Proc. Erlangen Watermarking Workshop, 1999.


Significance of Digital Signature and Implementation through RSA Algorithm

R. VIJAYA ARJUNAN, M.E., Member, ISTE: LM-51366, Senior Lecturer, Department of Electronics and Communication & Biomedical Engineering, Aarupadai Veedu Institute of Technology, Vinayaka Missions University, Old Mahabalipuram Road, Chennai. [email protected] & [email protected]

Abstract- Internet-enabled wireless devices continue to proliferate and are expected to surpass traditional Internet clients in the near future. This has opened up exciting new opportunities in the mobile e-commerce market. However, data security and privacy remain major concerns in the current generation of "wireless Web" offerings. All such offerings today use a security architecture that lacks end-to-end security. This unfortunate choice is driven by perceived inadequacies of standard Internet security protocols such as SSL on less capable CPUs and low-bandwidth wireless links. This article presents our experiences in implementing and using standard security mechanisms and protocols on small wireless devices. We have created new classes for the Java 2 Micro Edition (J2ME) platform that offer fundamental cryptographic operations such as message digests and ciphers, as well as higher-level security protocols, providing a solution for ensuring end-to-end security of wireless Internet transactions even within today's technological constraints.

I. CRYPTOGRAPHY

Cryptography is the science of using mathematics to encrypt and decrypt data. Cryptography enables you to store sensitive information or transmit it across insecure networks (like the Internet) so that it cannot be read by anyone except the intended recipient. While cryptography is the science of securing data, cryptanalysis is the science of analyzing and breaking secure communication. Classical cryptanalysis involves an interesting combination of analytical reasoning, application of mathematical tools, pattern finding, patience, determination, and luck. Cryptanalysts are also called attackers. Cryptology embraces both cryptography and cryptanalysis.

Cryptography can be strong or weak, and PGP is concerned with strong cryptography. Cryptographic strength is measured by the time and resources required to recover the plaintext. The result of strong cryptography is ciphertext that is very difficult to decipher without possession of the appropriate decoding tool. How difficult? Given all of today's computing power and available time (even a billion computers doing a billion checks a second), it is not possible to decipher the result of strong cryptography before the end of the universe. One would think, then, that strong cryptography would hold up rather well against even an extremely determined cryptanalyst. Who's really to say? No one has proven that the strongest encryption obtainable today will hold up under tomorrow's computing power. However, the strong cryptography employed by PGP is the best available today.

II.CONVENTIONAL CRYPTOGRAPHY

In conventional cryptography, also called secret-key or symmetric-key encryption, one key is used for both encryption and decryption. The Data Encryption Standard (DES) is an example of a conventional cryptosystem that is widely employed by the Federal Government. The figure illustrates the conventional encryption process.

Key Management and Conventional Encryption

Conventional encryption has benefits. It is very fast, and it is especially useful for encrypting data that is not going anywhere. However, conventional encryption alone as a means for transmitting secure data can be quite expensive, simply due to the difficulty of secure key distribution. Recall a character from your favorite spy movie: the person with a locked briefcase handcuffed to his or her wrist. What is in the briefcase, anyway? It's the key that will decrypt the secret data. For a sender and recipient to communicate securely using conventional encryption, they must agree upon a key and keep it secret between themselves. If they are in different physical locations, they must trust a courier, the Bat Phone, or some other secure communication medium to prevent the disclosure of the secret key during transmission. Anyone who overhears or intercepts the key in transit can later read, modify, and forge all information encrypted or authenticated with that key.

III. PUBLIC KEY CRYPTOGRAPHY

The problems of key distribution are solved by public key cryptography, the concept of which was introduced by Whitfield Diffie and Martin Hellman in 1975. Public key cryptography is an asymmetric scheme that uses a pair of keys for encryption: a public key, which encrypts data, and a corresponding private, or secret, key for decryption. You publish your public key to the world while keeping your private key secret. Anyone with a copy of your public key can then encrypt information that only you can read, even people you have never met. It is computationally infeasible to deduce the private key from the public key. Anyone who has a public key can encrypt information but cannot decrypt it; only the person who has the corresponding private key can decrypt the information.


Key

A key is a value that works with a cryptographic algorithm to produce a specific ciphertext. Keys are basically really, really, really big numbers. Key size is measured in bits; the number representing a 1024-bit key is darn huge. In public key cryptography, the bigger the key, the more secure the ciphertext. However, public key size and conventional cryptography's secret key size are totally unrelated. A conventional 80-bit key has the equivalent strength of a 1024-bit public key, and a conventional 128-bit key is equivalent to a 3000-bit public key. Again, the bigger the key, the more secure, but the algorithms used for each type of cryptography are very different, and thus the comparison is like that of apples to oranges. While the public and private keys are related, it is very difficult to derive the private key given only the public key; however, deriving the private key is always possible given enough time and computing power. This makes it very important to pick keys of the right size: large enough to be secure, but small enough to be applied fairly quickly. Additionally, you need to consider who might be trying to read your files, how determined they are, how much time they have, and what their resources might be.

Larger keys will be cryptographically secure for a longer period of time. If what you want to encrypt needs to be hidden for many years, you might want to use a very large key. Of course, who knows how long it will take to determine your key using tomorrow's faster, more efficient computers? There was a time when a 56-bit symmetric key was considered extremely safe. Keys are stored in encrypted form. PGP stores the keys in two files on your hard disk: one for public keys and one for private keys. These files are called key rings. As you use PGP, you will typically add the public keys of your recipients to your public key ring. Your private keys are stored on your private key ring. If you lose your private key ring, you will be unable to decrypt any information encrypted to keys on that ring.

IV. DIGITAL SIGNATURES

A major benefit of public key cryptography is that it provides a method for employing digital signatures. Digital signatures enable the recipient of information to verify the authenticity of the information's origin, and also to verify that the information is intact. Thus, public key digital signatures provide authentication and data integrity. A digital signature also provides non-repudiation, which means that it prevents the sender from claiming that he or she did not actually send the information. These features are every bit as fundamental to cryptography as privacy, if not more so.

A digital signature serves the same purpose as a handwritten signature. However, a handwritten signature is easy to counterfeit. A digital signature is superior to a handwritten signature in that it is nearly impossible to counterfeit, plus it attests to the contents of the information as well as to the identity of the signer. Some people tend to use signatures more than they use encryption. For example, you may not care if anyone knows that you just deposited $1000 in your account, but you do want to be sure it was the bank teller you were dealing with. The basic manner in which digital signatures are created is illustrated here: instead of encrypting information using someone else's public key, you encrypt it with your private key. If the information can be decrypted with your public key, then it must have originated with you.

V. RSA ENCRYPTION

Public Key Cryptography

One of the biggest problems in cryptography is the distribution of keys. Suppose you live in the United States and want to pass information secretly to your friend in Europe. If you truly want to keep the information secret, you need to agree on some sort of key that you and he can use to encode and decode messages. But you don't want to keep using the same key, or you will make it easier and easier for others to crack your cipher. It is also a pain to get keys to your friend: if you mail them, they might be stolen; if you send them cryptographically and someone has broken your code, that person will also have the next key; if you have to go to Europe regularly to hand-deliver the next key, that is expensive; and if you hire a courier to deliver the new key, you have to trust the courier, etcetera.

RSA Encryption

In the previous section we described what is meant by a trap-door cipher, but how do you make one? One commonly used cipher of this form is called RSA encryption, where RSA are the initials of its three creators: Rivest, Shamir, and Adleman. It is based on the following idea: it is very simple to multiply numbers together, especially with computers, but it can be very difficult to factor numbers. For example, if I ask you to multiply together 34537 and 99991, it is a simple matter to punch those numbers into a calculator and get 3453389167. But the reverse problem is much harder: suppose I give you the number 1459160519 and tell you only that I got it by multiplying together two numbers.

The RSA scheme works as follows:

1. Person A selects two prime numbers. We will use p = 23 and q = 41 for this example, but keep in mind that the real numbers person A should use are much larger.
2. Person A multiplies p and q together to get N = pq = (23)(41) = 943. N = 943 is the "public key", which he tells to person B (and to the rest of the world, if he wishes).
3. Person A also chooses another number e, which must be relatively prime to (p - 1)(q - 1). In this case, (p - 1)(q - 1) = (22)(40) = 880, so e = 7 is fine. e is also part of the public key, so B is also told the value of e.
4. Now B knows enough to encode a message to A. Suppose, for this example, that the message is the number M = 35.
5. B calculates the value of C = M^e (mod N) = 35^7 (mod 943).
6. 35^7 = 64339296875, and 64339296875 (mod 943) = 545. The number 545 is the encoding that B sends to A.
7. Now A wants to decode 545. To do so, he needs to find a number d such that ed = 1 (mod (p - 1)(q - 1)), or in this case such that 7d = 1 (mod 880). A solution is d = 503, since 7 x 503 = 3521 = 4(880) + 1 = 1 (mod 880).
8. To find the decoding, A must calculate C^d (mod N) = 545^503 (mod 943). This looks like it will be a horrible calculation, and at first it seems like it is, but notice that 503 = 256 + 128 + 64 + 32 + 16 + 4 + 2 + 1 (this is just the binary expansion of 503). So 545^503 = 545^(256+128+64+32+16+4+2+1) = 545^256 x 545^128 x ... x 545^1. Since we only care about the result (mod 943), we can calculate all the partial results in that modulus, and by repeated squaring of 545 we can get all the exponents that are powers of 2. For example, 545^2 (mod 943) = 545 x 545 = 297025 (mod 943) = 923. Then square again: 545^4 (mod 943) = (545^2)^2 (mod 943) = 923 x 923 = 851929 (mod 943) = 400, and so on. We obtain the following table:

   545^1   (mod 943) = 545
   545^2   (mod 943) = 923
   545^4   (mod 943) = 400
   545^8   (mod 943) = 633
   545^16  (mod 943) = 857
   545^32  (mod 943) = 795
   545^64  (mod 943) = 215
   545^128 (mod 943) = 18
   545^256 (mod 943) = 324

So the result we want is:

   545^503 (mod 943) = 324 x 18 x 215 x 795 x 857 x 400 x 923 x 545 (mod 943) = 35.

Using this tedious (but simple for a computer) calculation, A can decode B's message and obtain the original message M = 35.
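As a sanity check, the worked example can be reproduced in a few lines of Python. This is a toy illustration only (real RSA uses far larger primes and proper padding), and the square-and-multiply helper mirrors the repeated-squaring table above.

    def power_mod(base, exp, mod):
        """Square-and-multiply modular exponentiation."""
        result, base = 1, base % mod
        while exp:
            if exp & 1:                   # multiply in the current power of two
                result = (result * base) % mod
            base = (base * base) % mod    # square for the next bit
            exp >>= 1
        return result

    # Parameters from the worked example above
    p, q = 23, 41
    N = p * q                   # 943, the public modulus
    phi = (p - 1) * (q - 1)     # 880
    e = 7                       # public exponent, relatively prime to phi
    d = pow(e, -1, phi)         # private exponent: 503 (Python 3.8+ modular inverse)

    M = 35                      # plaintext message
    C = power_mod(M, e, N)      # encryption gives 545
    assert C == 545
    assert power_mod(C, d, N) == M   # decryption recovers 35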

VI. PERFORMANCE ANALYSIS OF VARIOUS CRYPTO ANALYTIC SYSTEMS

Key length comparison

ECC (base point)    RSA (modulus n)
106 bits            512 bits
132 bits            768 bits
160 bits            1024 bits
224 bits            2048 bits

KEY SIZE EQUIVALENT SECURITY LEVELS (in bits)

Symmetric    ECC    DH/RSA
80           163    1024
128          283    3072
192          409    7680
256          571    15360

VII. CONCLUSIONS AND FUTURE WORK

Our experiments with RSA and other cryptographic algorithms show that SSL is a viable technology even for today's mobile devices and wireless networks. By carefully selecting and implementing a subset of the protocol's many features, it is possible to ensure acceptable performance and compatibility with a large installed base of secure Web servers while maintaining a small memory footprint. Our implementation brings mainstream security mechanisms, trusted on the wired Internet, to wireless devices for the first time.

The use of standard SSL ensures end-to-end security, an important feature missing from current wireless architectures. The latest version of J2ME MIDP incorporating KSSL can be downloaded. In our ongoing effort to further enhance cryptographic performance on small devices, we plan to explore the use of smart cards as hardware accelerators and elliptic curve cryptography in our implementations.

[Chart: key size equivalent security levels for symmetric, ECC and DH/RSA keys]

[Chart: ECC (base point) vs. RSA (modulus n) key length comparison]


REFERENCES

1) R. L. Rivest, A. Shamir and L. M. Adleman, "A method for obtaining digital signatures and public key cryptosystems," Communications of the ACM, 21 (1978), 120-126.
2) FIPS 186, "Digital Signature Standard," 1994.
3) W. Diffie and M. E. Hellman, "New directions in cryptography," IEEE Transactions on Information Theory, 22 (1976), 644-654.
4) J. Daemen and V. Rijmen, AES Proposal: Rijndael, AES Algorithm Submission, September 3, 1999.
5) J. Daemen and V. Rijmen, "The block cipher Rijndael," Smart Card Research and Applications, LNCS 1820, Springer-Verlag, pp. 288-296.
6) A. Freier, P. Karlton, and P. Kocher, "The SSL protocol version 3.0"; http://home.netscape.com/eng/ssl3
7) D. Wagner and B. Schneier, "Analysis of the SSL 3.0 protocol," 2nd USENIX Wksp. Electronic Commerce, 1996; http://www.cs.berkeley.edu/werb+-+daw/papers
8) WAP Forum, "Wireless Transport Layer Security Specification"; http://www.wapforum.org/what/technical.htm
9) A. Lee, NIST Special Publication 800-21, Guideline for Implementing Cryptography in the Federal Government, National Institute of Standards and Technology, Nov. 1999.
10) A. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied Cryptography, CRC Press, New York, 1997.
11) J. Nechvatal et al., Report on the Development of the Advanced Encryption Standard (AES), National Institute of Standards and Technology, October 2, 2000.


A Survey on Pattern Recognition Algorithms For Face Recognition N.Hema*, C.Lakshmi Deepika**

*PG Student **Senior Lecturer

Department of ECE PSG College of Technology

Coimbatore-641 004 Tamil Nadu, India.

Abstract- This paper discusses face recognition, which refers to an automated or semi-automated process of matching facial images. Since visible-light recognition has its own disadvantages, thermal face recognition is also used. The major advantage of using thermal infrared imaging is improved face recognition performance. While conventional video cameras sense reflected light, thermal infrared cameras primarily measure emitted radiation from objects such as faces [1]. Thermal infrared (IR) imagery offers a promising alternative to visible face recognition as it is relatively insensitive to variations in face appearance caused by illumination changes. The fusion of visual and thermal face recognition can increase the overall performance of face recognition systems. Visual face recognition systems perform relatively well under controlled illumination conditions, while thermal face recognition systems are advantageous for detecting disguised faces or when there is no control over illumination. Thermal images of individuals wearing eyeglasses may result in poor performance, since eyeglasses block the infrared emissions around the eyes, which are important features for recognition. By taking advantage of both visual and thermal images, fused systems can be implemented combining low-level data fusion and high-level decision fusion [4, 6]. This survey further covers neural networks and support vector machines. Neural networks have been applied successfully in many pattern recognition problems, such as optical character recognition, object recognition, and autonomous robot driving. The advantage of using neural networks for face recognition is the feasibility of training a system to capture face patterns. However, one drawback is that the network architecture has to be extensively tuned (number of layers, number of nodes, learning rates, etc.) to achieve exceptional performance. Support vector machines can also be applied to face detection [8]; they can be considered a new paradigm for training polynomial functions or neural networks.

I.INTRODUCTION

Face recognition has developed over 30 years and is still a rapidly growing research area due to increasing demands for security in commercial and law enforcement applications. Although, face recognition systems have reached a significant level of maturity with some practical success, face recognition still remains a challenging problem due to large variation in face images. Face recognition is usually achieved through three steps: acquisition, normalization and recognition. This acquisition can be accomplished by digitally scanning an existing photograph or by taking a photograph of a live subject [2].

Normalization includes the segmentation, alignment and normalization of the face images. Finally, recognition includes the representation and modeling of face images as identities, and the association of novel face images with known models. In order to realize such a system, acquisition, normalization and recognition must be performed in a coherent manner. The thermal infrared (IR) spectrum comprises mid-wave infrared (MWIR) ranging from (3-5 µm), and long-wave infrared (LWIR) ranging from (8-12 µm), all longer than the visible spectrum is from (0.4-0.7 µm). Thermal IR imagery is independent of ambient lighting since thermal IR sensors only measure the heat emitted by objects [3]. The use of thermal imagery has great advantages in poor illumination conditions, where visual face recognition systems often fail. It will be a highly challenging task if we want to solve those problems using visual images only.

II.VISUAL FACE RECOGNITION

A face is a three-dimensional object and can appear different depending on internal and external factors. Internal factors such as expression, pose and age make the face look different; external factors include brightness, size, lighting, position, and other surroundings. In face recognition, usually only a single image, or at most a few images, of each person is available, and a major concern has been scalability to large databases containing thousands of people. Face recognition addresses the problem of identifying or verifying one or more persons by comparing input faces with the face images stored in a database [6]. While humans quickly and easily recognize faces under variable conditions, or even after several years of separation, machine face recognition is still a highly challenging task in pattern recognition and computer vision. Face recognition in outdoor environments is especially challenging where illumination varies greatly. The performance of visual face recognition is sensitive to variations in illumination conditions. Since faces are essentially 3-D objects, lighting changes can cast significant shadows on a face. This is one of the primary reasons why current face recognition technology is constrained to indoor access-control applications where illumination is well controlled. Light reflected from human faces also varies significantly from person to person. This variability, coupled with dynamic lighting conditions, causes a serious problem. Face recognition methods can be classified into two broad categories: feature-based and holistic. The analytic or feature-based approaches compute a set of geometrical features from the face, such as the eyes, nose, and mouth. The holistic or appearance-based methods consider the global properties of the human face pattern.

Data reduction and feature extraction schemes make the face recognition problem computationally tractable. Some of the commonly used methods for visual face recognition are as follows.

NEURAL NETWORK BASED FACE RECOGNITION

A neural network can be used to detect frontal views of faces. Each network is trained to output the presence or absence of a face [9]. The training methods are designed to be general, with little customization for faces. Many face detection approaches use the idea that facial images can be characterized directly in terms of pixel intensities. The neural network-based face detection method uses a retinally connected neural network that examines small windows of an image and decides whether each window contains a face. It arbitrates between multiple networks to improve performance over a single network. Training a neural network for the face detection task is challenging because of the difficulty in characterizing prototypical "non-face" images. The two classes to be discriminated in face detection are "images containing faces" and "images not containing faces". It is easy to get a representative sample of images which contain faces, but much harder to get a representative sample of those which do not.

A NEURAL BASED FILTER

The system contains a set of neural network-based filters applied to an image, and then uses an arbitrator to combine the outputs. The filters examine each location in the image at several scales, looking for locations that might contain a face.

The arbitrator then merges detections from individual filters and eliminates overlapping detections. The filter receives as input a 20x20 pixel region of the image and generates an output ranging from 1 to -1, signifying the presence or absence of a face, respectively [12]. To detect faces anywhere in the input, the filter is applied at every location in the image. To detect faces larger than the window size, the input image is repeatedly reduced in size (by subsampling), and the filter is applied at each size [13].
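A schematic version of this multi-scale sliding-window procedure is sketched below; classify_window is a hypothetical stand-in for the trained neural filter, and the step size and pyramid scale factor are assumptions.

    import numpy as np

    def detect_faces(image, classify_window, window=20, step=4, scale=1.2, threshold=0.0):
        """Slide a fixed-size window over an image pyramid and collect detections.

        classify_window takes a (window x window) patch and returns a score in
        [-1, 1]; scores above threshold are treated as face detections.
        """
        detections, factor = [], 1.0
        img = image.astype(float)
        while min(img.shape) >= window:
            for r in range(0, img.shape[0] - window + 1, step):
                for c in range(0, img.shape[1] - window + 1, step):
                    score = classify_window(img[r:r + window, c:c + window])
                    if score > threshold:
                        # map the detection back to original-image coordinates
                        detections.append((int(r * factor), int(c * factor),
                                           int(window * factor), score))
            # subsample the image so larger faces fit the same 20x20 window
            factor *= scale
            rows = (np.arange(int(img.shape[0] / scale)) * scale).astype(int)
            cols = (np.arange(int(img.shape[1] / scale)) * scale).astype(int)
            img = img[np.ix_(rows, cols)]
        return detections

    # Dummy stand-in classifier that "detects" bright windows
    bright = lambda patch: 1.0 if patch.mean() > 200 else -1.0
    frame = np.zeros((120, 160)); frame[40:80, 60:100] = 255.0
    hits = detect_faces(frame, bright)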

SUPPORT VECTOR MACHINE

Among the existing face recognition techniques, subspace methods are widely used in order to reduce the high dimensionality of the face image. Much research has been done on how such methods can represent facial expressions.

The Karhunen–Loeve Transform (KLT) is used to produce the most expressive subspace for face representation and recognition. Linear discriminant analysis (LDA), or the Fisherface method, is an example of the most discriminating subspace methods; it seeks a set of features that best separates the face classes. Another important subspace method is the Bayesian algorithm using a probabilistic subspace. Unlike other subspace techniques, which classify the test face image into M classes of M individuals, the Bayesian algorithm casts face recognition as a binary pattern classification problem. The aim of training the SVMs is to find the hyperplane (if the classes are linearly separable) or the surfaces which separate the six different classes [8].

CELLULAR NEURAL NETWORK

Cellular neural networks or cellular nonlinear networks (CNNs) provide an attractive paradigm for very large-scale integrated (VLSI) circuit architectures in applications devoted to pixel-parallel image processing. The resistive-fuse network is well known as an effective model for image segmentation, and analog circuits implementing it have been reported. Gabor filtering is an effective method for extracting the features of images, and it is known that such filtering is used in the human visual system. A flexible face recognition technique using this method has also been proposed [19]. To implement Gabor-type filters using analog circuits, CNN models have been proposed. A pulse-width modulation (PWM) approach is used for achieving time-domain analog information processing, with pulse signals that have digital values in the voltage domain and analog values in the time domain. The PWM approach is suitable for the large-scale integration of analog processing circuits because it matches the scaling trend in Si CMOS technology and leads to low-voltage operation [20]. It also has high controllability and allows highly effective matching with ordinary digital systems.

III.THERMAL FACE RECOGNITION

Face recognition in the thermal infrared domain has received relatively little attention compared to visible face recognition. Identifying faces from different imaging modalities, in particular infrared imagery, has become an area of growing interest.

THERMAL CONTOUR MATCHING

Thermal face recognition extracts and matches thermal contours for identification. Such techniques include elemental shape matching and the eigenface method. Elemental shape matching techniques use the elemental shapes of thermal face images. Several different closed thermal contours can be observed in each face. The sets of shapes are unique for each individual because they result from the underlying complex network of blood vessels. Variation in defining the thermal slices from one image to another has the effect of shrinking or enlarging the resulting shapes, but the centroid location and other features of the shapes remain constant.

A NON-ITERATIVE ELLIPSE FITTING ALGORITHM

Ellipses are often used in face-recognition technology, such as face detection and other facial component analysis. An ellipse can be a powerful representation of certain features around the face in thermal images. The general equation of a conic can be represented as F(A, T) = A·T = ax² + bxy + cy² + dx + ey + f, where A = [a, b, c, d, e, f] and T = [x², xy, y², x, y, 1]^T. Commonly used conic fitting methods minimize the algebraic distance in the least-squares sense, and the minimization can be solved by a generalized eigenvalue system, which can be written as


D^T D A = S A = λ C A, where D = [X1, X2, ..., Xn]^T is called the design matrix, S = D^T D is the scatter matrix, and C is a constant (constraint) matrix. Least-squares conic fitting was commonly used for fitting ellipses, but it can lead to other conics [6]. The non-iterative ellipse-fitting algorithm that yields the best least-squares ellipse has a low eccentricity bias, is affine-invariant, and is extremely robust to noise.
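For concreteness, the following is a minimal NumPy sketch of such a direct least-squares ellipse fit (a Fitzgibbon-style formulation); it is not the authors' implementation, and the constraint matrix, function names and sample data are illustrative assumptions.

import numpy as np

def fit_ellipse(x, y):
    # Build the design matrix D: one row [x^2, xy, y^2, x, y, 1] per contour point.
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    S = D.T @ D                      # scatter matrix S = D^T D
    C = np.zeros((6, 6))             # constraint matrix enforcing an ellipse (4ac - b^2 > 0)
    C[0, 2] = C[2, 0] = 2.0
    C[1, 1] = -1.0
    # Generalized eigenvalue problem S A = lambda C A, solved here via inv(S) C.
    eigval, eigvec = np.linalg.eig(np.linalg.inv(S) @ C)
    k = int(np.argmax(eigval.real > 0))   # the ellipse corresponds to the positive eigenvalue
    return eigvec[:, k].real              # conic coefficients [a, b, c, d, e, f]

# Illustrative usage: noisy samples of an elliptical thermal contour.
t = np.linspace(0, 2 * np.pi, 200)
x = 40 * np.cos(t) + 100 + np.random.randn(200)
y = 25 * np.sin(t) + 80 + np.random.randn(200)
print(fit_ellipse(x, y))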

IV.FUSION OF VISUAL AND THERMAL IMAGES

There are several motivations for using fusion: utilizing complementary information can reduce error rates, and the use of multiple sensors can increase reliability. The fusion can be performed using pixel-based fusion in the wavelet domain or feature-based fusion in the eigenface domain.

FEATURE BASED FUSION IN EIGEN FACE DOMAIN

Fusion in the eigenspace domain involves combining the eigen features from the visible and IR images. Specifically, two eigenspaces are first computed, one using the visible face images and the other using the IR face images. Each face is then represented by two sets of eigen features, the first obtained by projecting the IR face image onto the IR eigenspace and the second by projecting the visible face image onto the visible eigenspace. Fusion is performed by selecting some eigen features from the IR eigenspace and some from the visible eigenspace.

PIXEL BASED FUSION IN WAVELET DOMAIN

Fusion in the wavelet domain involves combining the wavelet coefficients of the visible and IR images. To fuse the visible and IR images, a subset of coefficients is selected from the IR image and the rest from the visible image. The fused image is obtained by applying the inverse wavelet transform to the selected coefficients. The fusion can also be done by a pixel-wise weighted summation of the visual and thermal images,

F(x,y) = a(x,y)V(x,y) + b(x,y)T(x,y),

where F(x,y) is the fused output of a visual image V(x,y) and a thermal image T(x,y), while a(x,y) and b(x,y) are the weighting factors of each pixel. A fundamental problem is deciding which modality should receive more weight at each pixel. This can be answered if we know the illumination direction, which affects the face in the visual images, and the other variations which affect the thermal images; illumination changes in the visual images and facial variations after exercise in the thermal images are among the challenging problems in face recognition technology [14]. Instead of finding each weight individually, we use the average of both modalities, constraining the weighting factors so that a(x,y) + b(x,y) = 1.0. The average of the visual and thermal images can compensate for variations in each other, although this is not a perfect way to achieve data fusion. Figure 1 shows a fused image based on average intensity.

Figure 1: A data fusion example. (a) Visual image, (b) thermal image, and (c) fused Image.
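A minimal sketch of this pixel-wise weighted fusion, assuming two co-registered 8-bit grayscale arrays of equal size (function and parameter names are illustrative):

import numpy as np

def fuse_weighted(visual, thermal, a=0.5):
    # F(x,y) = a(x,y) V(x,y) + b(x,y) T(x,y) with a + b = 1; a = b = 0.5 gives the average.
    V = visual.astype(np.float64)
    T = thermal.astype(np.float64)
    F = a * V + (1.0 - a) * T
    return np.clip(F, 0, 255).astype(np.uint8)

# Usage (illustrative): fused = fuse_weighted(visible_img, thermal_img)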

V. CONCLUSION

In this paper, the fusion of visual and thermal images was discussed, along with methods such as neural networks and support vector machines for recognition. To date, the cellular neural network has been applied only to visual face recognition [20]. However, effective IR cameras can capture thermal images irrespective of the surrounding conditions, so we propose that the same approach can be used for thermal face recognition to obtain effective results.

REFERENCES
[1] Y. Adini, Y. Moses, and S. Ullman, "Face Recognition: The Problem of Compensating for Changes in Illumination Direction," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 721-732, 1997.
[2] P. J. Phillips, P. Grother, R. J. Micheals, D. M. Blackburn, E. Tabassi, and M. Bone, "Face Recognition Vendor Test 2002," Evaluation Report, National Institute of Standards and Technology, pp. 1-56, 2003.
[3] M. S. Bartlett, J. R. Movellan, and T. J. Sejnowski, "Face recognition by independent component analysis," IEEE Trans. Neural Networks, Vol. 13, No. 6, pp. 1450-1464, 2002.
[4] Y. Yoshitomi, T. Miyaura, S. Tomita, and S. Kimura, "Face identification using thermal image processing," Proc. IEEE Int. Workshop on Robot and Human Communication, pp. 374-379, 1997.
[5] J. Wilder, P. J. Phillips, C. Jiang, and S. Wiener, "Comparison of Visible and Infrared Imagery for Face Recognition," Proc. Int. Conf. Automatic Face and Gesture Recognition, pp. 182-187, 199.
[6] J. Heo, B. Abidi, S. Kong, and M. Abidi, "Performance Comparison of Visual and Thermal Signatures for Face Recognition," Biometric Consortium, Arlington, VA, Sep. 2003.
[7] Y. I. Tian, T. Kanade, and J. F. Cohn, "Recognizing action units for facial expression analysis," IEEE Trans. Patt. Anal. Mach. Intell. 23 (2) (2001) 97-115.


[8] J. Wan and X. Li, "PCB infrared thermal imaging diagnosis using support vector classifier," Proc. World Congr. Intell. Control Automat. 4 (2002) 2718-2722.
[9] Y. Yoshitomi, N. Miyawaki, S. Tomita, and S. Kimura, "Facial expression recognition using thermal image processing and neural network," Proc. IEEE Int. Workshop Robot Hum. Commun. (1997) 380-385.
[10] E. Hjelmas and B. K. Low, "Face detection: a survey," Comput. Vis. Image Und. 83 (3) (2001) 236-274.
[11] M. H. Yang, D. J. Kriegman, and N. Ahuja, "Detecting faces in images: a survey," IEEE Trans. Patt. Anal. Mach. Intell. 24 (1) (2002) 34-58.
[12] H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Trans. Patt. Anal. Mach. Intell. 20 (1) (1998) 23-38.
[13] R. Feraud, O. J. Bernier, J. E. Viallet, and M. Collobert, "A fast and accurate face detector based on neural networks," IEEE Trans. Patt. Anal. Mach. Intell. 23 (1) (2001) 42-53.
[14] D. Socolinsky, L. Wolff, J. Neuheisel, and C. Eveland, "Illumination invariant face recognition using thermal infrared imagery," Comput. Vision Pattern Recogn. 1 (2001) 527-534.
[15] X. Chen, P. Flynn, and K. Bowyer, "Visible-light and infrared face recognition," in: Proc. Workshop on Multimodal User Authentication, 2003, pp. 48-55.
[16] K. Chang, K. Bowyer, and P. Flynn, "Multi-modal 2D and 3D biometrics for face recognition," in: IEEE Internat. Workshop on Analysis and Modeling of Faces and Gestures, 2003, pp. 187-194.
[17] R. D. Dony and S. Haykin, "Neural network approaches to image compression," Proc. IEEE 83 (2) (1995) 288-303.
[18] J. Dowdall, I. Pavlidis, and G. Bebis, "A face detection method based on multiband feature extraction in the near-IR spectrum," in: Proceedings IEEE Workshop on Computer Vision Beyond the Visible Spectrum: Methods and Applications, Kauai, Hawaii, 2002.
[19] T. Morie, S. Sakabayashi, H. Ando, and A. Iwata, "Pulse Modulation Circuit Techniques for Nonlinear Dynamical Systems," in Proc. Int. Symp. on Nonlinear Theory and its Applications (NOLTA'98), pp. 447-450, Crans-Montana, Sept. 1998.
[20] T. Morie, M. Miyake, S. Nishijima, M. Nagata, and A. Iwata, "A Multi-Functional Cellular Neural Network Circuit Using Pulse Modulation Signals for Image Recognition," Faculty of Engineering, Hiroshima University, Higashi-Hiroshima 739-8527, Japan.


Performance Analysis of Impulse Noise Removal Algorithms for Digital Images K.Uma1, V.R.Vijaya Kumar2

1PG Student,2Senior Lecturer Department of ECE,PSG College of Technology

Abstract:- In this paper, three different impulse noise removal algorithms are implemented and their performances are analysed. The first algorithm uses an alpha-trimmed-mean-based approach to detect the impulse noise. The second algorithm follows the principle of the multi-state median filter. The third algorithm works on the principle of thresholding. Experimental results show that these algorithms are capable of removing impulse noise effectively compared to many standard filters, in terms of both quantitative and qualitative analysis.

I. INTRODUCTION

The acquisition or transmission of digital images through sensors or communication channels is often corrupted by impulse noise. It is very important to eliminate noise in images before subsequent processing such as image segmentation, object recognition, and edge detection. Two common types of impulse noise are salt-and-pepper noise and random-valued impulse noise. A large number of techniques have been proposed to remove impulse noise from corrupted images, and many existing methods use an impulse detector to determine whether a pixel should be modified. In images corrupted by salt-and-pepper noise, the noisy pixels can take only the maximum and minimum values. The median filter [6] was once the most popular nonlinear filter for removing impulse noise because of its good denoising power and computational efficiency. However, when the noise level is over 50%, some details and edges of the original image are smeared by the filter. Different remedies of the median filter have been proposed, e.g. the adaptive median filter and the multi-state median filter. A switching strategy is another approach: the noisy pixels are identified and then replaced using the median filter or its variants. These filters are good at detecting noise even at a high noise level; the main drawback of the median filter is that details and edges are not recovered satisfactorily, especially when the noise level is high. The NASM filter [4] achieves performance fairly close to that of an ideal switching median filter. The weighted median filter controls the filtering performance in order to preserve signal details; in the centre-weighted median filter, only the centre pixel of the filtering window has a weighting factor. Filtering should be applied to corrupted pixels only, while leaving the uncorrupted ones untouched; switching-based median filter methodologies [4] apply no filtering to true pixels and a standard median filter to remove the impulse noise. Mean filters, rank filters and alpha-trimmed mean filters are also used to remove impulse noise.

II. IMPULSE NOISE DETECTION ALGORITHM

An alpha-trimmed-mean-based approach [1] is used to detect the impulse noise. This algorithm consists of three steps:

impulse noise detection, refinement, and impulse noise cancellation, which replaces the values of identified noisy pixels with the median value.

A. IMPULSE NOISE DETECTION

Let I denote the corrupted, noisy image of size l1 x l2, and let X_ij be its pixel value at position (i, j). Let W_ij denote the window of size (2Ld + 1) x (2Ld + 1) centered about X_ij. The alpha-trimmed mean over this window is

M_α(I_ij) = ( 1 / (t − 2⌊αt⌋) ) Σ_{i = ⌊αt⌋+1}^{t − ⌊αt⌋} X_(i),   with t = (2Ld + 1)²,

where α is the trimming parameter, which assumes values between 0 and 0.5, and X_(i) represents the ith data item in the increasingly ordered samples of W_ij, i.e. X_(1) ≤ X_(2) ≤ ... ≤ X_(t); that is, X_(i) = ith smallest(W_ij(I)).

The alpha-trimmed mean M_α(I_ij), with appropriately chosen α, represents approximately the average of the noise-free pixel values within the window W_ij(I). The absolute difference between X_ij and M_α(I_ij) is

r_ij = | X_ij − M_α(I_ij) |,

and r_ij should be relatively large for a noisy pixel and small for noise-free pixels.

First, when the pixel X_ij is an impulse, it takes a value substantially larger than or smaller than those of its neighbors. Second, when the pixel X_ij is a noise-free pixel, which could belong to a flat region, an edge, or even a thin line, its value will be very similar to those of some of its neighbors. Therefore, we can detect image details among noisy pixels by counting the number of pixels whose values are similar to that of X_ij in its local window:

δ_{i−u, j−v} = 1 if | X_{i−u, j−v} − X_ij | < T, and 0 otherwise,

where T is a predetermined parameter and δ_{i−u, j−v} = 1 indicates that the pixel X_{i−u, j−v} is similar to the pixel X_ij. The number of similar pixels within the window is

ξ_ij = Σ_{−Ld ≤ u, v ≤ Ld} δ_{i−u, j−v}.

A binary flag is then defined as

φ_ij = 0 if ξ_ij ≥ N, and 1 otherwise,

where N is a predetermined parameter; φ_ij = 0 indicates that X_ij is a noise-free pixel. Combining the binary flag with the absolute difference gives

R_ij(1) = r_ij × φ_ij.

Fig 1. Impulse noise detection: (a) image corrupted by 20% fixed-value impulse noise, (b) absolute difference image, (c) binary flag, (d) product of the binary flag and the absolute difference image, (e) restored image.

R(1) retains the impulse noise and removes the image details. Next, a fuzzy impulse detection technique is applied to each pixel; a fuzzy flag is used to measure how much the pixel is corrupted. Noisy pixels are located near one of the two extremes of the ordered samples, and based on this observation a refinement of the fuzzy flag can be generated, according to which the impulse noise can be effectively removed. Compared with the median filter, this method shows better performance and also removes random-valued impulse noise. To demonstrate its superior performance, extensive experiments have been conducted on a variety of standard test images, comparing this method with many other well-known techniques.
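A minimal sketch of the detection stage described above (alpha-trimmed mean, absolute difference r_ij and similarity count), assuming a grayscale NumPy image; the parameter values and names are illustrative, and the fuzzy refinement step is not included.

import numpy as np

def atm_detect(img, Ld=1, alpha=0.2, T=20, N=3):
    # Returns R(1) = r * phi: large where a pixel is likely an impulse.
    img = img.astype(np.float64)
    rows, cols = img.shape
    t = (2 * Ld + 1) ** 2
    lo, hi = int(np.floor(alpha * t)), t - int(np.floor(alpha * t))
    R = np.zeros_like(img)
    for i in range(Ld, rows - Ld):
        for j in range(Ld, cols - Ld):
            w = img[i - Ld:i + Ld + 1, j - Ld:j + Ld + 1].ravel()
            m = np.sort(w)[lo:hi].mean()                 # alpha-trimmed mean M_alpha
            r = abs(img[i, j] - m)                       # detection statistic r_ij
            xi = np.sum(np.abs(w - img[i, j]) < T) - 1   # similar neighbours (excluding the pixel itself)
            phi = 0.0 if xi >= N else 1.0                # phi = 0 marks a noise-free pixel
            R[i, j] = r * phi
    return R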

B. SPACE INVARIANT MEDIAN FILTER

This section describes a median-based switching scheme called the multi-state median (MSM) filter [2]. By using simple thresholding logic, the output of the MSM filter [5] is adaptively switched among the outputs of a group of center-weighted median (CWM) filters that have different center weights. As a result, the MSM filter is equivalent to an adaptive CWM filter [6] with a space-varying center weight that depends on the local signal statistics. The efficacy of this filter has been evaluated by extensive simulations. Let S_ij and X_ij denote the intensity values of the original image and the observed noisy image, respectively, at pixel location (i, j).

C. CWM FILTER

The output of a CWM filter, in which a weight adjustment is applied to the center (origin) pixel X_ij within a sliding window, can be defined as

Y_ij^w = median( X_ij^w ),   X_ij^w = { X_{i−s, j−t}, w ◊ X_ij },

where w ◊ X_ij denotes w copies of the center sample X_ij. For a 3x3 window the median is then computed on the basis of 8 + w samples; here w denotes the center weight. The output of a CWM filter with center weight w can also be represented by

Y_ij^w = median( X_(k), X_ij, X_(N+1−k) ),   where k = (N + 2 − w) / 2

and X_(k) is the kth smallest sample in the window of N samples.

Based on the fact that CWM filters with different center weights have different capabilities of suppressing noise and preserving details, this can be realized by a simple thresholding operation as follows. For the current pixel X_ij, we first define the differences

d_w = | Y_ij^w − X_ij |,   w = 1, 3, 5, ..., N − 2.

Fig 2. Space invariant median filter: (a) noisy image (20% impulse noise), (b) restored image.

These differences provide information about the likelihood of corruption of the current pixel X_ij. For instance, consider the difference d_{N−2}: if this value is large, then the current pixel is not only the smallest or the largest one among the observation samples but is very likely contaminated by impulse noise. If d_1 is small, the current pixel may be regarded as noise-free and kept unchanged in the filtering.


Together, the differences d_1 through d_{N−2} reveal even more information about the presence of a corrupted pixel. A classifier based on the differences d_w is employed to estimate the likelihood that the current pixel is contaminated. An attractive merit of the MSM filtering technique is that it provides an adaptive mechanism to detect the likelihood of a pixel being corrupted by an impulse. As a result, it satisfactorily trades off detail preservation against noise removal by adjusting the center weight of the CWM filtering according to the local signal characteristics. Furthermore, it possesses a simple computational structure for implementation.
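A small sketch of the CWM output and the differences d_w used by this switching logic, assuming a 3x3 window flattened row by row (names and the example neighbourhood are illustrative):

import numpy as np

def cwm_output(window, w):
    # Center-weighted median: the center sample is counted w times (8 + w samples for 3x3).
    center = window[len(window) // 2]
    return np.median(np.concatenate([window, np.full(w - 1, center)]))

def msm_differences(window):
    # d_w = |Y_w - X_ij| for w = 1, 3, ..., N - 2.
    N = len(window)
    x = window[N // 2]
    return {w: abs(cwm_output(window, w) - x) for w in range(1, N - 1, 2)}

# Example: a flat neighbourhood whose center pixel is an impulse.
win = np.array([120, 118, 121, 119, 255, 120, 122, 117, 119], dtype=float)
print(msm_differences(win))   # all d_w are large here, flagging a likely impulse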

D. MULTIPLE THRESHOLD

A novel decision-based filter, called the multiple-thresholds switching (MTS) filter [3], is used to restore images corrupted by salt-and-pepper impulse noise. The filter is based on a detection-estimation strategy: the impulse detection algorithm is applied before the filtering process, and therefore only the noise-corrupted pixels are replaced with the estimated central noise-free ordered mean value in the current filter window. The new impulse detector, which uses multiple thresholds with multiple neighborhood information of the signal in the filter window, is very precise while avoiding an undue increase in computational complexity. To avoid damage to good pixels, decision-based median filters realized by thresholding operations have been introduced.

In general, the decision-based filtering procedure consists of two steps: an impulse detector that classifies the input pixels as either noise-corrupted or noise-free, and a noise reduction filter that modifies only those pixels classified as noise-corrupted. The main issue in the design of a decision-based median filter is how to extract features from the local information and establish the decision rule so as to distinguish noise-free pixels from contaminated ones as precisely as possible. In addition, to achieve high noise reduction with fine detail preservation, it is crucial to apply the optimal threshold value to the local signal statistics; usually a trade-off exists between noise reduction and detail preservation. The MTS filter adopts a new impulse detection strategy to build the decision rule and set the threshold function, considering multiple neighborhood information of the filter window to judge whether impulse noise exists. Extensive experimental results demonstrate that the new filter is capable of preserving more details while effectively suppressing impulse noise in corrupted images.

Fig 3. Multiple threshold: (a) noisy image (20% impulse noise), (b) restored image.

COMPARISONS

Fig 4. Performance comparison of the various filters: PSNR versus noise density (10% to 50%) for the ATM, SIMF, multiple-threshold, median, CWM and WM filters.
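The PSNR plotted in Fig 4 (and the MSE/MAE mentioned in the conclusion) can be computed as in the following sketch for 8-bit images (function names are illustrative):

import numpy as np

def psnr(original, restored, peak=255.0):
    # Peak signal-to-noise ratio in dB.
    err = original.astype(np.float64) - restored.astype(np.float64)
    mse = np.mean(err ** 2)                  # mean squared error
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def mae(original, restored):
    # Mean absolute error.
    return np.mean(np.abs(original.astype(np.float64) - restored.astype(np.float64)))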

III. CONCLUSION

In this paper, the removal of impulse noise was discussed, along with its detection by various methods such as the fuzzy flag, noise refinement and classifier-based approaches. Restoration performance is quantitatively measured by the peak signal-to-noise ratio (PSNR), MAE and MSE. Impulse noise detection by the alpha-trimmed mean approach provides a significant improvement over other state-of-the-art methods; among the various impulse noise removal algorithms, the alpha-trimmed-mean-based approach yields a better PSNR than the other algorithms.

REFERENCES

[1] Wenbin Luo (2006), 'An Efficient Detail-Preserving Approach for Removing Impulse Noise in Images', IEEE Signal Processing Letters, Vol. 13, No. 7, pp. 413-416.
[2] Tao Chen and Hong Ren Wu (2001), 'Space Variant Median Filters for the Restoration of Impulse Noise Corrupted Images', IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Vol. 48, No. 8, pp. 784-789.
[3] Raymond Chan (2006), 'Salt-Pepper Impulse Noise Detection and Removal Using Multiple Thresholds for Image Restoration', Journal of Information Science and Engineering 22, pp. 189-198.
[4] How-Lung Eng and Kai-Kuang Ma (2000), 'Noise Adaptive Soft-Switching Median Filter for Image Denoising', IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 6, pp. 2175-2178.
[5] Tao Chen, Kai-Kuang Ma, and Li-Hui Chen (1999), 'Tri-state median filter for image denoising', IEEE Transactions on Image Processing, Vol. 8, No. 12.
[6] J. Astola and P. Kuosmanen (1997), Fundamentals of Nonlinear Digital Filtering. Boca Raton, FL: CRC Press.


Confidentiality in Composition of Clutter Images

G. Ignisha Rajathi, M.E. II Year, and Ms. S. Jeya Shobana, M.E., Lecturer, Department of CSE, Francis Xavier Engg College, TVL
[email protected], [email protected]

Abstract - Conveying highly confidential information secretly has become an important part of modern life, and computer technology has made such communication over large distances simple. Steganography is an example of hidden communication. The older practice of hiding information (secrets) behind text has given way to hiding secrets behind clutter images, and changing the appearance of pictures is preferable to changing their features. This paper combines two processes. A simple OBJECT IMAGE is used for the steganography process, which is based on the F5 algorithm. The prepared stego images are then placed on a BACKGROUND IMAGE; this is COLLAGE STEGANOGRAPHY. Here the patchwork is done by changing the type of each object as well as its location. Increasing the number of images increases the amount of information that can be hidden. Keywords: Steganography, Collage - Patchwork, Information hiding, Package, Stego image, Steganalysis, suspect.

I.INTRODUCTION

The word steganography is a Greek word meaning 'writing in hiding'. The main purpose is to hide data in a cover medium so that others will not notice it. Steganography has found use in military, diplomatic, personal and intellectual applications. A major distinction of this method from other methods is that, in cryptography for example, individuals see the encoded data and notice that such data exists even though they cannot comprehend it, whereas in steganography individuals will not notice at all that data exists in the source. Most steganography work has been performed on images, video clips, text, music and sound.

Among the methods of steganography, the most common is the use of images. In these methods, features such as the pixels of the image are changed in order to hide the information, so that the changes are not identifiable by human observers and the modifications applied to the image are not tangible. Methods of steganography in images are usually applied to the structural features of the image, and ways to identify such stego images and extract the hidden information have usually been discovered; so far, however, no method has applied steganography to the appearance of images.

This paper provides a new method for steganography in images by applying changes to the appearance of the image: relevant pictures are placed on a background and, depending on their location and mode, data is hidden. The F5 algorithm is used for the preparation of the stego image; it uses subtraction and matrix encoding to embed data into the DCT coefficients.

II. PROPOSED SYSTEM

1. Select the image for stego image preparation.
2. Generate the stego image using the F5 algorithm.
3. Select the background image.
4. Embed the stego image in the background image (collage steganography).
5. Finally, extract it.

A COMPLETE SYSTEM DESIGN

SENDER SIDE :

RECEIVER SIDE :

STEGO IMAGE PREPARATION USUAL METHOD ( USING LSB ) :

LSB in BMP - A BMP is capable of hiding quite a large message, but the fact that more bits are altered results in a larger possibility that the altered bits can be seen with the human eye; i.e., it creates suspicion when transmitted between parties. Suggested applications: LSB in BMP is most suitable for applications where the focus is on the amount of information to be transmitted and not on the secrecy.
LSB in GIF - GIF images are especially vulnerable to statistical or visual attacks, since the palette processing that has to be done leaves a very definite signature on the image. This approach is dependent on the file format as well as the image itself. Suggested applications: LSB in GIF is an efficient algorithm to use when embedding data in a grayscale image.



PROPOSED METHOD(USING F5):

The new method, developed with more functionality to override the other techniques, is the F5 steganographic algorithm. It provides more security and is found to be more efficient in its performance, making it an ideal choice among the various techniques for preparing the stego image.

“F5” term represents 5 functions

1. Discrete Cosine Transformation 2. Quantization(Quality) 3. Permutation(Password-Driven) 4. Embedding Function (Matrix Encoding) 5. Huffman Encoding.

SYSTEM FLOW DIAGRAM – EMBEDDER ( F5 ALGORITHM ) :

SIZE VERIFICATION(STEGO IMG):

While choosing the object image, it has to be noted that the size of the text file should always be less than the size of the object image, so that the text fits in the object image for the hiding process. For example, a text file of 1 KB can easily fit into an object image of 147 x 201 pixels (JPEG).

F5 ALGORITHM:

Input: message, shared secret, cover image
Output: stego image

  initialize PRNG with the shared secret
  permutate the DCT coefficients with the PRNG
  determine k from the image capacity
  calculate the code word length n = 2^k - 1
  while data left to embed do
      get next k-bit message block
      repeat
          G <- next n non-zero AC coefficients
          s <- k-bit hash f of the LSBs in G
          s <- s XOR k-bit message block
          if s != 0 then
              decrement the absolute value of DCT coefficient G_s
              insert G_s into the stego image
          end if
      until s = 0 or G_s != 0
      insert the DCT coefficients from G into the stego image
  end while

ALGORITHM EXPLANATION :

The F5 algorithm works as follows. Instead of replacing the LSB of a DCT coefficient with message data, F5 decrements its absolute value in a process called matrix encoding, so there is no coupling of any fixed pair of DCT coefficients.

Matrix encoding computes an appropriate (1, 2^k - 1, k) Hamming code by calculating the message block size k from the message length and the number of nonzero non-DC coefficients. The Hamming code (1, 2^k - 1, k) encodes a k-bit message word m into an n-bit code word a, with n = 2^k - 1. F5 uses the decoding function f(a) = XOR_{i=1..n} (a_i * i) and the Hamming distance d. In other words, we can find a suitable code word a' for every code word a and every message word m so that m = f(a') and d(a, a') <= 1. Given a code word a and a message word m, we calculate the difference s = m XOR f(a); if s = 0 the code word is left unchanged, otherwise the new code word a' is obtained by changing the s-th bit of a.

First, the DCT coefficients are permutated by a keyed pseudo-random number generator (PRNG), then arranged into groups of n while skipping zeros and DC coefficients. The message is split into k-bit blocks. For every message block m, we get an n-bit code word a by concatenating the least significant bits of the current coefficients' absolute values. If the message block m and the decoding f(a) are the same, the message block can be embedded without any changes; otherwise, we use s = m XOR f(a) to determine which coefficient needs to change. If the coefficient becomes zero, shrinkage happens, and it is discarded from the coefficient group. The group is filled with the next nonzero coefficient and the process repeats until the message can be embedded. For smaller messages, matrix encoding lets F5 reduce the number of changes to the image; for example, for k = 3, every change embeds 3.43 message bits while the total code size more than doubles. Because F5 decrements DCT coefficients, the sum of adjacent coefficients is no longer invariant.

Steganographic interpretation:
- Positive coefficients: LSB; negative coefficients: inverted LSB.
- Skip 0; adjust coefficients to the message bit (decrement positive coefficients, increment negative coefficients); repeat if shrinkage occurs.
[Straddling] Permutation equalizes the change density and scatters the message more uniformly; key-driven distance schemes and parity block schemes can be used.


These straddling schemes are independent of the message length.

[Matrix Encoding] Embed k bits by changing one of n = 2^k - 1 places:

  k    n     change density   embedding rate   efficiency
  1    1     50 %             100 %            2
  2    3     25 %             66.7 %           2.7
  3    7     12.5 %           42.9 %           3.4
  4    15    6.25 %           26.7 %           4.3

(In general, n > k.)
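The arithmetic of the matrix-encoding step can be illustrated with the short sketch below, which flips LSBs directly to show the (1, 2^k - 1, k) code; F5 itself realizes the change by decrementing a coefficient's magnitude and re-embedding on shrinkage, so this is only an illustration of the coding, not of the full algorithm.

def f_hash(bits):
    # F5 decoding function f(a): XOR of the (1-based) positions of all 1-bits.
    h = 0
    for i, b in enumerate(bits, start=1):
        if b:
            h ^= i
    return h

def matrix_encode(bits, message, k):
    # Embed a k-bit message block into n = 2**k - 1 LSBs with at most one change.
    n = 2 ** k - 1
    assert len(bits) == n and 0 <= message < 2 ** k
    s = message ^ f_hash(bits)
    out = list(bits)
    if s != 0:                      # flip exactly one position so that f(a') = m
        out[s - 1] ^= 1
    return out

# Example: embed the 3-bit message 0b101 into n = 7 coefficient LSBs.
a = [1, 0, 1, 1, 0, 0, 1]
a2 = matrix_encode(a, 0b101, 3)
assert f_hash(a2) == 0b101          # the decoder recovers the message from the LSBs alone
print(a2)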

This stego image preparation moves on to the next step of collage preparation.

COLLAGE STEGANOGRAPHY - BRIEF INTRODUCTION:
The stego image prepared from the object image, which holds the secret text, is placed at an appropriate location in the background image; this is collage steganography. The different modes of the relevant object images and their locations are usually held in a separate file in a database and sent as a key to the sender and the receiver.

SIZE VERIFICATION (BACKGROUND IMAGE):
While choosing the background image, note that the size of the object image should always be less than the size of the background image, so that the stego image can fit into it. For example, an object image of 147 x 201 pixels (JPEG) easily fits into a background image of 800 x 600 pixels (JPEG). The starting location points are specified and must be checked against the full object size along the X and Y axes so that the object fits in the background. For example, if the background is 800 x 600 and the object image is 147 x 201, the starting points specified for the placement of the object image should always be less than (673, 399), i.e., 800 - 147 = 673 and 600 - 201 = 399.

COLLAGE PROCESS:
First a picture is selected as the background image, for example the picture of an airport runway. Then the images of a number of appropriate objects are selected to match the background, for example birds in the sky, an airplane on the ground and a guide car. For each of these objects various types are selected; for the airplane, several types such as training airplanes, passenger airplanes, military airplanes and jet airplanes.

THEORETICAL LOCATION:
Each of the selected objects can be placed only in a certain area. For instance, if a background scene is 480 x 680 pixels, the permissible area for the position of an airplane image with dimensions of 100 x 200 can range from the rectangular area with apexes [(0,0), (0,600), (600,400), (0,400)] to the rectangular area with apexes [(20,50), (620,50), (620,450), (20,450)], with displacement of up to 50 pixels to the right and 20 pixels to the bottom.

MODE SPECIFICATION:
In view of the above factors (existing object, type of object and location of object), one can create pictures in different positions. For example, for the airplane object above there are 4 types of airplane (training, passenger, military or jet) and 1,000 different positions (20 x 50 = 1,000), which gives 4,000 modes. There are two other objects (bird and car), each of which has 2,000 different modes. In this picture the number of modes is therefore 16 x 10^9 (4,000 x 2,000 x 2,000 = 16 x 10^9).

To determine the type and location of each object, we first convert the input text to an array of bits. We then calculate the number of possible modes for the first object; e.g., there were 4,000 modes for the airplane. We find the largest power of 2 that does not exceed the number of modes and read that many bits from the input array. For example, the closest power of 2 not exceeding 4,000 is 2^11 = 2048, so we read 11 bits of the input array. If the 11 obtained bits are 00001100010, the number is 98. Now, to find the location and type of the object, we divide the obtained number by the number of types of the object. For example, dividing 98 by 4 (types) gives a remainder of 2, so the airplane type is military.

HORIZONTAL & VERTICAL DISPLACEMENT:
Next, we divide the quotient of this division by the number of possible columns in which the object can be displaced. Here we divide 24 (the quotient of dividing 98 by 4) by 20: the remainder gives the displacement in the horizontal direction and the quotient gives the displacement in the vertical direction. For the airplane we have: horizontal displacement 24 % 20 = 4, vertical displacement 24 / 20 = 1. By adding these two quantities to the primary location of the picture, the image location is determined; for the airplane, the horizontal position is 600 + 4 = 604 and the vertical position is 400 + 1 = 401. Thus, the type and location of the other objects are also found. Now the image is sent along with the key file (object name, object types, object location and object displacement). This is collage steganography.
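The type/location derivation above can be sketched for a single object as follows (the numbers reproduce the airplane example; the function name and parameters are illustrative):

def decode_object(bits, n_types=4, cols=20, rows=50, base_x=600, base_y=400):
    # Turn a block of bits into (type index, x position, y position).
    value = int(bits, 2)                  # '00001100010' -> 98
    assert value < n_types * cols * rows  # must be one of the 4,000 modes
    obj_type = value % n_types            # 98 % 4 = 2  -> military airplane
    q = value // n_types                  # 98 // 4 = 24
    dx, dy = q % cols, q // cols          # 4 and 1: horizontal/vertical displacement
    return obj_type, base_x + dx, base_y + dy

print(decode_object('00001100010'))       # -> (2, 604, 401)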

EXTRACTION

STEGO-IMAGE EXTRACTION (FROM BACKGROUND IMAGE):

While extracting the information, the program, using the key file, finds the type and location of each object. For example, it searches for airplanes of different types from the rectangular area with apexes [(0,0), (600,0), (600,400), (0,400)] to the rectangular area with apexes [(20,50), (620,50), (620,450), (20,450)]. Considering the type and location of the object, we then find the number corresponding to this mode.
SECRET MESSAGE EXTRACTION (FROM STEGO IMAGE):
Finally, with the suspect image and the key file in hand, we carry out the inverse of the actions performed in stego image preparation using the F5 algorithm for all objects; by putting the corresponding bits next to each other, the information hidden in the image is obtained from the package.

EXPERIMENTAL RESULTS

EMBEDDING: The project is implemented in Java. The steganography program uses the key file to load information relating to the background image and the objects in the image. The object name and the x and y coordinate positions are provided, and the types of each object are stored under the name imageXX.png (object type). The pictures are in JPEG format. Finally, there are the displacements of the picture in the horizontal and vertical directions. Here we selected the picture of a house as the background and placed three objects in it (car, animal and human); for each of the objects we considered four different types.


The text for hiding in the image is then prepared. The appropriate type of each object and its location are calculated according to the input information, and the object is placed on the background image and saved as a JPEG.
EXTRACTING: The object image sizes should be proportionate to the background image size. The decoder loads the key file and then receives the collage stego image from the user. According to the key file and the algorithm, it finds the type and location of each object in the image, calculates the corresponding number for each object and extracts the bits hidden in the image. Then, by placing the extracted bits beside each other, the hidden text is recovered and shown to the user.

CONCLUSION

The method changes the appearance of the image by displacing various coordinated objects rather than changing its features. The F5 algorithm for stego image preparation is suitable for large amounts of data and for color images. By creating a large bank of interrelated images, one can hide a large amount of information in the image.

ADVANTAGES

- Applicable to color, grayscale and binary images.
- Hide - Print - Scan - Extract.
- No change in the features of the background image, as only the appearance is changed.
- The collage stego image as a whole cannot be detected.
- An increased number of appropriate objects gives increased storage for messages.

BIBLIOGRAPHY
1. Niels Provos and Peter Honeyman, "Hide and Seek: An Introduction to Steganography," IEEE Security & Privacy Magazine, May/June 2003, pp. 32-44.
2. Mohammad Shirali-Shahreza and Sajad Shirali-Shahreza, "Collage Steganography," Computer Science Department, Sharif University of Technology, Tehran, Iran.


VHDL Implementation of Lifting Based Discrete Wavelet Transform M.Arun Kumar1, C.Thiruvenkatesan2

1: M.Arun Kumar, II yr M.E.(Applied Electronics) E-mail:[email protected] SSN College of Engineering,

2: Mr. C. Thiruvenkatesan, Asst. Professor, SSN College of Engineering, Chennai

Abstract: The wavelet transform has been successfully applied in different fields, ranging from pure mathematics to applied science. Software implementation of the Discrete Wavelet Transform (DWT), however flexible, appears to be the performance bottleneck in real-time systems. Hardware implementation, in contrast, offers high performance but is poor in flexibility; a compromise between the two is reconfigurable hardware. For the 1-D DWT, the architectures are mainly convolution-based and lifting-based, while direct and line-based methods are the most common implementations for the 2-D DWT. The lifting scheme for constructing VLSI architectures for the DWT outperforms convolution-based architectures in many aspects, such as fewer arithmetic operations, in-place implementation and easy management of boundary extension. However, the critical path of lifting-based architectures is potentially longer than that of convolution-based ones, and this can be reduced by employing pipelining in the lifting-based architecture. The 1-D and 2-D DWT using the lifting scheme have been obtained for signals and images, respectively, through MATLAB simulation. The Liftpack algorithm for calculating the DWT has been implemented in VHDL, and the lifting algorithm for the 1-D DWT has also been implemented in VHDL.

1. INTRODUCTION

Mathematical transformations are applied to signals to obtain further information from that signal that is not readily available in the raw signal. Most of the signals in practice are time-domain signals (time-amplitude representation) in their raw format. This representation is not always the best representation of the signal for most signal processing related applications. In many cases, the most distinguished information is hidden in the frequency content (frequency spectrum) of the signal. Often times, the information that cannot be readily seen in the time-domain can be seen in the frequency domain. Fourier Transform (FT) is a reversible transform, that is, it converts time-domain signal into frequency-domain signal and vice-versa. However, only either of them is available at any given time. That is, no frequency information is available in the time-domain signal, and no time information is available in the Fourier transformed signal. Wavelet Transform (WT) addresses this issue by providing time-frequency representation of a signal or an image. The objectives proposed in the thesis are

1. To implement the 1-D and 2-D Lifting Wavelet Transform (LWT) in MATLAB to understand the concept of lifting scheme.

2. To develop the lifting algorithm for 1-D and 2-D DWT using C language.

3. To implement the 1-D LWT in VHDL using prediction and updating scheme

4. To implement the 5/3 wavelet filter using lifting scheme.

Lifting Scheme Advantages

The lifting scheme is a new method for constructing biorthogonal wavelets. In this way, lifting can be used to construct second-generation wavelets: wavelets that are not necessarily translates and dilates of one function. Compared with first-generation wavelets, the lifting scheme has the following advantages:

• Lifting leads to a speedup when compared to the classic implementation. The classical wavelet transform has a complexity of order n, where n is the number of samples; for long filters, the lifting scheme speeds up the transform by another factor of two. Hence it is also referred to as the fast lifting wavelet transform (FLWT).

• All operations within the lifting scheme can be done entirely in parallel, while the only sequential part is the order of the lifting operations.

Secondly, the lifting scheme can be used in situations where no Fourier transform is available. Typical examples include Wavelets on bounded domains, Wavelets on curves and surfaces, weighted wavelets, and Wavelets and irregular sampling.

II. Lifting Algorithm

The basic idea behind the lifting scheme is very simple: use the correlation in the data to remove redundancy. To this end, the data is first split into two sets (split phase): the odd samples and the even samples (Figure 2). If the samples are indexed beginning with 0 (the first sample is the 0th sample), the even set comprises all samples with an even index and the odd set contains all samples with an odd index. Because of the assumed smoothness of the data, it is predicted that the odd samples have values closely related to those of their neighboring even samples, so N even samples are used to predict the value of a neighboring odd sample (predict phase). With a good prediction method, the chance is high that the original odd sample is in the same range as its prediction.


The difference between the odd sample and its prediction is calculated and used to replace the odd sample. As long as the signal is highly correlated, the newly calculated odd samples will on average be smaller than the originals and can be represented with fewer bits. The odd half of the signal is now transformed. To transform the other half, we would have to apply the predict step to the even half as well; but because the even half is merely a sub-sampled version of the original signal, it has lost some properties that are to be preserved. In the case of images, for instance, the intensity (mean of the samples) should be kept constant throughout the different levels. The third step (update phase) therefore updates the even samples using the newly calculated odd samples such that the desired property is preserved. These three steps are repeated on the even samples, and each time half of the remaining even samples are transformed, until all samples are transformed. The three steps are explained in more detail in the following section.

Figure.1 Predict and update stages
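A minimal sketch of one level of split/predict/update using Haar lifting steps (predict each odd sample from its left even neighbour, then update the even samples to preserve the mean); the even-length input and the function name are illustrative assumptions.

import numpy as np

def haar_lift_forward(x):
    x = np.asarray(x, dtype=np.float64)
    even, odd = x[0::2].copy(), x[1::2].copy()   # split into even and odd samples
    d = odd - even                               # predict: detail = odd - prediction
    a = even + d / 2.0                           # update: approximation preserves the mean
    return a, d

a, d = haar_lift_forward([2, 4, 6, 8, 10, 12, 14, 16])
print(a)   # approximation (low-pass) coefficients: [3, 7, 11, 15]
print(d)   # detail (high-pass) coefficients: [2, 2, 2, 2]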

III.THE INVERSE TRANSFORM

One of the great advantages of the lifting scheme realization of a wavelet transform is that it decomposes the wavelet filters into extremely simple elementary steps, and each of these steps is easily invertible. As a result, the inverse wavelet transform can always be obtained immediately from the forward transform. The inversion rules are trivial: revert the order of the operations, invert the signs in the lifting steps, and replace the splitting step by a merging step. The block diagram for the inverse lifting scheme is shown in Figure 3.

Figure 2. Multiple levels of decomposition

Here follows a summary of the steps to be taken for both forward and inverse transform.

Figure.3 The lifting Scheme, inverse transform: Update, Predict and Merge stages
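Continuing the sketch above, the inverse simply reverses the two lifting steps with inverted signs and then merges the halves:

import numpy as np

def haar_lift_inverse(a, d):
    a = np.asarray(a, dtype=np.float64)
    d = np.asarray(d, dtype=np.float64)
    even = a - d / 2.0                   # undo the update step
    odd = d + even                       # undo the predict step
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd         # merge back into a single signal
    return x

print(haar_lift_inverse([3, 7, 11, 15], [2, 2, 2, 2]))   # perfect reconstruction: [2, 4, ..., 16]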

IV.RESULTS

1-D LIFTING SCHEME USING MATLAB

This section explains the method to calculate the lifting based 1-D DWT using MATLAB.

Figure 4. The input signal with noise signals

Figure 5. The approximation (A1) and detail (D1) signals (amplitude vs. sampling instant n)

2-D LIFTING SCHEME USING MATLAB


This section explains the method to calculate the lifting based 2-D DWT using MATLAB.

Figure.6 The cameraman input image

Here the Haar wavelet is used as the mother wavelet, and it is lifted using elementary lifting steps. This new lifted wavelet is then used to find the wavelet transform of the input signal. This results in two output signals, called the approximation signal and the detail signal. The approximation represents the low-frequency components present in the original input signal; the detail gives the high-frequency components and represents the hidden details in the signal. If we do not get sufficient information from the detail, the approximation is again decomposed into an approximation and details. This decomposition continues until sufficient information about the image is recovered. Finally, the inverse lifting scheme is performed on the approximation and detail images to reconstruct the original image. If we compare the original (Figure 6) and reconstructed (Figure 8) images, they look exactly the same and the transform is lossless.

Figure 7. Approximation and detail images (horizontal, vertical and diagonal details)

Figure 8. Reconstructed image

V.CONCLUSION

The lifting scheme to construct VLSI architectures for DWT outperforms the convolution based architectures in many aspects such as fewer arithmetic operations, in-place implementation and easy management of boundary extension. But the critical path of the lifting based architectures is potentially longer than that of the convolution based ones and this can be reduced by employing pipelining in the lifting based architecture. 1-D and 2-D DWT using lifting scheme have been obtained for signals and images respectively through MATLAB simulation.

REFERENCES:

A VLSI Architecture for Lifting-Based Forward and Inverse Wavelet Transform, by Kishore Andra et al., IEEE, 2002.

Flipping Structure: An Efficient VLSI Architecture for Lifting-Based Discrete Wavelet Transform, by Chao-Tsung Huang et al., IEEE, 2004.

Generic RAM-Based Architectures for Two-Dimensional Discrete Wavelet Transform With Line-Based Method, by Chao-Tsung Huang et al., IEEE, 2005.

Evaluation of design alternatives for the 2-D discrete wavelet transform, by N. D. Zervas et al., IEEE, 2001.

Efficient VLSI architectures of lifting-based discrete wavelet transform by systematic design method, by C.-T. Huang et al., Proc. IEEE, 2002.

Lifting factorization-based discrete wavelet transform architecture design, by W. Jiang et al., IEEE, 2001.


VLSI Design Of Impulse Based Ultra Wideband Receiver For Commercial Applications

G.Srinivasa Raja1, V.Vaithianathan2

1: G.Srinivasa Raja, II yr M.E.(Applied Electronics) E-mail:[email protected] SSN College of Engineering,

Old Mahabalipuram Road, SSN Nagar - 603 110. 2: Mr.V.Vaithianathan, Asst. Professor, SSN College of Engineering, Chennai

Abstract: An impulse-based ultra-wideband (UWB) receiver front end is presented in this paper. Gaussian modulated pulses in the 3.1-10.6 GHz frequency range, satisfying the Federal Communications Commission spectral mask, are received through an omnidirectional antenna and fed into the corresponding LNAs, filters and detectors. The low noise amplifiers, filters and detectors are integrated on a single chip and simulated using 0.18 µm CMOS technology. All the simulations are done using the Tanner EDA tool, along with the Puff software for supporting the filter and amplifier designs.

I.INTRODUCTION

Ultra-wideband (UWB) wireless communication offers a radically different approach compared to conventional narrowband systems. Compared with other wireless technologies, ultra wideband has some specific characteristics: high-data-rate communication at shorter distances, improved channel capacity and immunity to interference. All these have made UWB useful in military, imaging and vehicular applications. This paper describes the design of an impulse-based ultra-wideband receiver which can be built into systems to avoid cable links at short distances and which works at low power. The receiver has a low-complexity design because of the impulse-based signal (i.e., the absence of a local oscillator), making it easily portable. Ultra-wideband communication is not a new technology; in fact, it was first employed by Guglielmo Marconi in 1901 to transmit Morse code sequences across the Atlantic Ocean using spark-gap radio transmitters. However, the benefit of a large bandwidth and the capability of implementing multi-user systems provided by electromagnetic pulses were never considered at that time. Approximately fifty years after Marconi, modern pulse-based transmission gained momentum in military applications in the form of impulse radars.

Ultra-wide band technology based on the Wi-Media standard brings the convenience and mobility of wireless communications to high-speed interconnects in devices throughout the digital home and office. Designed for low-power, short-range, wireless personal area networks, UWB is the leading technology for freeing people from wires, enabling wireless connection of multiple devices for transmission of video, audio and other high-bandwidth data.

Fig. 1 History of UWB

UWB's combination of broader spectrum and lower power improves speed and reduces interference with other wireless spectra. It is used to relay data from a host device to other devices in the immediate area (up to 10 meters, or 30 feet). UWB radio transmissions can legally operate in the range from 3.1 GHz to 10.6 GHz at a limited transmit power of -41 dBm/MHz. Consequently, UWB provides dramatic channel capacity at short range, which limits interference.

A signal is said to be UWB if it occupies at least 500 MHz of bandwidth and its fractional bandwidth is more than 25% of the center frequency. The UWB signal is a time-modulated impulse radio signal and is seen as a carrier-less baseband transmission.


Fig. 2. UWB spectrum (emitted signal power versus frequency): the 3.1-10.6 GHz UWB band at the -41 dBm/MHz Part 15 limit, shown alongside existing narrowband systems.

Classification of signals based on their fractional bandwidth Bf:
- Narrowband: Bf < 1%
- Wideband: 1% < Bf < 20%
- Ultra-wideband: Bf > 20%

Types of receiver: 1. impulse type, 2. multicarrier type.

Impulse UWB uses pulses of very short duration (typically a few nanoseconds).
Merits and demerits:
1. High resolution in multipath reduces fading margins; low-complexity implementation.
2. Highly precise synchronization is required, and the power concentrated during the brief interval increases the possibility of interference.

Multicarrier UWB splits a single data stream into multiple data streams of reduced rate, with each stream transmitted on a separate frequency (subcarrier). Subcarriers must be properly spaced so that they do not interfere.
Merits and demerits:
1. Well suited for avoiding interference, because each carrier frequency can be precisely chosen to avoid narrowband interference.
2. Front-end design can be challenging due to variation in power.
3. A high-speed FFT is needed.

Table.1 Comparison of wireless technologies

Fig 3. Theoretical data rate over ranges

Applications:
1. Military.
2. Indoor applications, such as WPANs (Wireless Personal Area Networks).
3. Outdoor (substantial) applications, but with very low data rates.
4. High-data-rate communications, multimedia applications, and cable replacement.

Impulse: radio technology that modulates impulse-based waveforms instead of continuous carrier waves.

Pulse types:
1. Gaussian first derivative, second derivative.
2. Gaussian modulated sinusoidal pulse.

Fig .4 UWB Time-domain Behavior


Fig.5 UWB Frequency-domain Behavior

Fig.6 Impulse Based UWB Receiver

II. RECEIVER ARCHITECTURE

This UWB impulse-based receiver consists of impedance matching circuits, an LNA, filters and detectors. Antenna: The purpose of the antenna is to capture the propagating signal of interest. The ideal antenna for the wideband receiver would itself be wideband; that is, the antenna would nominally provide constant impedance over the bandwidth, which would facilitate efficient power transfer between the antenna and the preamplifier. However, due to cost and size limitations, wideband antennas are often not practical. Thus, most receivers are limited to simple antennas such as the dipole, the monopole, or variants. These antennas are inherently narrowband, exhibiting a wide range of impedances over the bandwidth. For the purpose of antenna-receiver integration, it is useful to model the antenna using the 'Pi' equivalent.

Fig.7 Impedance matching circuit

An attenuator circuit allows a known source of power to be reduced by a predetermined factor, usually expressed in decibels. A powerful advantage of an attenuator is that, since it is made from non-inductive resistors, it is able to change a source or load, which might be reactive, into one which is precisely known and resistive, and it achieves this power reduction without introducing distortion. The factor K is the ratio of current, voltage, or power corresponding to a given value of attenuation A expressed in decibels.

Low Noise Amplifier: The amplifier has two primary purposes. The first is to interface the antenna impedance over the band of interest to a standard input impedance, such as 50 or 75 Ω. The second purpose of the preamplifier is to provide adequate gain at a sufficiently low noise figure to meet the system sensitivity requirements. In the VHF low band, a preamplifier may not necessarily need to be a low noise amplifier; an amplifier noise temperature of several hundred kelvin will usually be acceptable. Without a matching circuit, the input impedance of a wideband amplifier is usually designed to be approximately constant over a wide range of frequencies. As shown in the previous section, the antenna impedance can vary significantly over a wide bandwidth, and the resulting impedance mismatch may result in an unacceptable loss of power efficiency between the antenna and the preamplifier. Matching networks are used to overcome this impedance mismatch.

Fig.8 Low Noise Amplifier
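The K-factor relationship mentioned above can be turned into a matched pi-attenuator design using the standard formulas sketched below; the resistor values produced here are illustrative and are not claimed to be those used in this design.

import math

def pi_attenuator(atten_db, z0=50.0):
    # Resistor values for a matched pi attenuator with attenuation A (dB) in a Z0 system.
    k = 10 ** (atten_db / 20.0)                 # K: voltage ratio for attenuation A
    r_shunt = z0 * (k + 1.0) / (k - 1.0)        # the two shunt arms
    r_series = z0 * (k * k - 1.0) / (2.0 * k)   # the series arm
    return r_shunt, r_series

print(pi_attenuator(10.0))   # about 96.2 ohm shunt arms and 71.2 ohm series arm at 50 ohm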

Impedance matching is important in LNA design because oftentimes the system performance can be strongly affected by the quality of the termination.


For instance, the frequency response of the antenna filter that precedes the LNA will deviate from its normal operation if there are reflections from the LNA back to the filter. Furthermore, undesirable reflections from the LNA back to the antenna must also be avoided. An impedance match is when the reflection coefficient is equal to zero, which occurs when ZS = ZL. There is a subtle difference between impedance matching and power matching: the condition for impedance matching occurs when the load impedance is equal to the characteristic impedance, whereas the condition for power matching occurs when the load impedance is the complex conjugate of the characteristic impedance. When the impedances are real, the conditions for power matching and impedance matching coincide. For the analysis of LNA design for low noise, the origins of the noise must be identified and understood; the important noise sources in CMOS transistors are considered below. Thermal noise is due to the random thermal motion of the carriers in the channel. It is commonly referred to as a white noise source because its power spectral density holds a constant value up to very high frequencies (over 1 THz). Thermal noise is given by

i_d² = 4 k T (µ / L²) |Q_inv| ∆f

Induced gate noise is a high-frequency noise source caused by the non-quasi-static effects influencing the power spectral density of the drain current. Induced gate noise has a power spectral density given by

i_g² = 4 k T δ (ω² C_gs²) / (5 g_d0) ∆f

Noise Figure: Noise figure (NF) is a measure of the signal-to-noise ratio (SNR) degradation as the signal traverses the receiver front end. Mathematically, NF is defined as the ratio of the input SNR to the output SNR of the system, or equivalently

NF = Total Output Noise Power / Output Noise Power due to the Source
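Both forms of the definition can be evaluated directly, as in this small sketch (names are illustrative; a full receiver analysis would cascade the stage noise figures):

import math

def nf_from_snr(snr_in_db, snr_out_db):
    # NF in dB as the SNR degradation through the front end.
    return snr_in_db - snr_out_db

def noise_factor(total_output_noise, output_noise_due_to_source):
    # Noise factor F = total output noise power / output noise power due to the source.
    return total_output_noise / output_noise_due_to_source

print(nf_from_snr(30.0, 27.5))                   # 2.5 dB of SNR degradation
print(10 * math.log10(noise_factor(2.0, 1.2)))   # the equivalent definition expressed via noise powers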

NF may be defined for each block as well as for the entire receiver. NF_LNA, for instance, quantifies the inherent noise of the LNA, which is added to the signal through the amplification process. The amplified signal from the low noise amplifier is then fed into the consecutive stages to obtain the required RF signal.

Filter Design: One major technique to combat interference is to filter it out with band-pass filters. For most band-pass filters, the relevant design parameters are the center frequency, the bandwidth (which together with the center frequency defines the quality factor Q) and the out-of-band suppression. The bandwidth of the band-selection filter is typically that of the band of interest and the center frequency is the center of the band. The required Q is typically high and the center frequency is high as well; on the other hand, the suppression is typically not prohibitive.

It only needs to be large enough to ensure that interference is suppressed to a point where it does not cause undesirable effects. To satisfy these specifications, the BPF can be implemented as a passive LC filter, which can be combined with the input-matching network of the LNA. A low-pass filter is a filter that passes low-frequency signals but attenuates (reduces the amplitude of) signals with frequencies higher than the cutoff frequency.
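As a rough illustration of such a band-pass/low-pass pair, the sketch below builds 3rd-order Butterworth prototypes with SciPy and inspects their magnitude response. The sampling rate and band edges are assumptions chosen only for illustration; they are not the paper's exact filter specifications, and the discrete prototypes merely stand in for the passive LC networks simulated in Puff.

import numpy as np
from scipy import signal

fs = 20e9                      # sampling rate of the discrete prototype (assumed)
f_lo, f_hi = 3.1e9, 7.0e9      # illustrative band-pass edges (assumed)
f_cut = 7.0e9                  # illustrative low-pass cutoff (assumed)

# 3rd-order Butterworth band-pass and low-pass prototypes
bp_b, bp_a = signal.butter(3, [f_lo, f_hi], btype="bandpass", fs=fs)
lp_b, lp_a = signal.butter(3, f_cut, btype="lowpass", fs=fs)

# Check the band-pass magnitude response at a few spot frequencies (in dB)
w, h = signal.freqz(bp_b, bp_a, worN=2048, fs=fs)
for f_test in (1e9, 5e9, 9e9):
    idx = int(np.argmin(np.abs(w - f_test)))
    mag_db = 20.0 * np.log10(abs(h[idx]) + 1e-12)
    print(f"|H_bp({f_test / 1e9:.0f} GHz)| = {mag_db:6.1f} dB")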

Fig. 9 Band pass and Low pass filter design

Butterworth filter (3rd order), normalized values: C22 = 0.6180 F, C4 = 2.0000 F, C19 = 0.6180 F, L13 = 1.6180 H, L15 = 1.6180 H. Square law detector: A square law means that the DC component of the diode output is proportional to the square of the AC input voltage. So if the RF input voltage is reduced by half, the DC output drops to one quarter; if ten times as much RF input is applied, the DC output is 100 times larger than before. Op-Amp: An operational amplifier, usually referred to as an op-amp for brevity, is a DC-coupled high-gain electronic voltage amplifier with differential inputs and, usually, a single output. In typical usage, the output of the op-amp is controlled by negative feedback, which largely determines the magnitude of its output voltage gain, the input impedance at one of its input terminals and the output impedance. The output of the op-amp is then fed into an A/D converter for specific applications. Simulated results:

Tanner EDA simulation for LNA


Band Pass Filter Simulation Using Puff

Low Pass Filter Simulation using Puff

III.CONCLUSION

This impulse-based ultra-wideband receiver consumes very low power at a minimum supply voltage and is easily portable. While various wireless technologies come and go, UWB offers attractive properties. Exploiting them, the receiver was designed and laid out using the Tanner EDA tool with an allowable bandwidth of 7 GHz and a transmitted power spectral density of -41 dBm/MHz.

REFERENCES: [1].Journal of VLSI Signal Processing 43, 43–58, 2006 2006 Springer Science + Business Media, LLC. Manufactured in The Netherlands.DOI: 10.1007/s11265-006-7279-x “A VLSI Implementation of Low Power, Low Data Rate UWB Transceiver for Location and Tracking Applications” SAKARI TIURANIEMI, LUCIAN STOICA, ALBERTO RABBACHIN AND IAN OPPERMANN Centre for Wireless Communications, P.O. Box 4500, FIN-90014 University of OULU e-mail: [email protected] [2] Ian D. O’Donnell, Member, IEEE, and Robert W. Brodersen, Fellow, IEEE

“An Ultra-Wide band Transceiver Architecture for Low Power, Low Rate, Wireless Systems” IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 54, NO. 5, SEPTEMBER 2005(Pg: 1623-1631). [3]Jeff Foerster, Intel Architecture Labs, Intel Corp. Evan Green, Intel Architecture Labs, Intel Corp. Srinivasa Somayazulu, Intel Architecture Labs, Intel Corp. David Leeper, Intel Connected Products Division, Intel Corp.” Ultra-Wideband Technology for Short- or Medium-Range Wireless Communications” Intel Technology Journal Q2, 2001. [4]Bonghyuk Park°, Seungsik Lee, Sangsung Choi Electronics and Telecommunication Research Institute [5]Gajeong-dong, Yuseong-gu, Daejeon 305-350, Korea [email protected] “Receiver Block Design for Ultra wideband Applications” 0-7803-9152-7/05/$20.00 © 2005 IEEE (Pg: 1372-1375) [6].Ultra-Wideband Wireless Communications Theory and Applications-Guest EditorialIEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 24, NO. 4, APRIL 2006.


Distributed Algorithms for energy efficient Routing in Wireless Sensor Networks

T. Jingo, M. S. Godwin Premi, S. Shaji

Department of Electronics & Telecommunications Engineering Sathyabama University

Jeppiaar Nagar, Old Mamallapuram Road, Chennai -600119 [email protected] [email protected] [email protected]

Abstract:- Sensor networks have emerged as a promising technology with various applications, where power efficiency is one of the critical requirements. Each node has a limited battery energy supply and can generate information that needs to be communicated to a sink node. We assume that each node in the wireless network has the capacity to relay information in the form of packets, and that each node can dynamically adjust its transmission power depending on the distance over which it transmits a packet. To improve power efficiency without affecting the network delay, we propose and study a number of schemes for deletion of obsolete information from the network nodes, and we propose distributed algorithms to compute an optimal routing scheme that maximizes the time at which the first node in the network runs out of energy. For computing such a flow we analyze a partially distributed algorithm and a completely distributed algorithm. The resulting algorithms have low computational complexity and are guaranteed to converge to an optimal routing scheme that maximizes the lifetime of the network. To reduce power consumption, the source node is allowed to move dynamically from one location to another, while the sensor nodes are static and remain at the locations where they are deployed. The results of our study will allow a network designer to implement such a system and to tune its performance in a delay-tolerant environment with intermittent connectivity, so as to ensure with some chosen level of confidence that the information is successfully carried through the mobile network and delivered within some time period.

I. INTRODUCTION

A network of wireless sensor nodes is distributed in a region. Each node has a limited battery energy supply and can generate information that needs to be communicated to a sink node. It is assumed that each wireless node has the capability to relay packets, and that each node can adjust its transmission power depending on the distance over which it transmits a packet. We focus on the problem of computing a flow that maximizes the lifetime of the network - the lifetime is taken to be the time at which the first node runs out of energy. Since sensor networks need to self configure in many

situations, the goal of this paper is to find algorithms that do this computation in a distributed manner. We analyze a partially distributed algorithm and a completely distributed algorithm to compute such a flow. The algorithms described can be used in static networks, or in networks in which the topology changes slowly enough that there is enough time between topology changes to optimally balance the traffic. Energy efficient algorithms for routing in wireless networks have received considerable attention over the past few years. Distributed algorithms to form sparse topologies containing minimum-energy routes were proposed in "Minimum energy mobile wireless networks [1]," and "Minimum energy mobile wireless networks revisited [2]." An approximate approach based on discretization of the coverage region of a node into cones was described in "Distributed topology control for power efficient operation in multi-hop wireless ad hoc networks [3]," and "Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks" [4]. All the above mentioned works focused on minimizing the total energy consumption of the network. However, as has been pointed out, this can lead to some nodes in the network being drained of energy very quickly. Hence, instead of trying to minimize the total energy consumption, routing to maximize the network lifetime was considered in "Energy conserving routing in wireless ad-hoc networks [5]," and "Routing for maximum system lifetime in wireless ad-hoc networks [6]." The problem was formulated as a linear program, and heuristics were proposed to select routes in a distributed manner to maximize the network lifetime. However, as illustrated in these papers, these heuristics do not always lead to selection of routes that are globally optimum. A similar problem formulation for selection of relay nodes was given in "Topology control for wireless sensor networks [7]." We note that distributed iterative algorithms for the computation of the maximum lifetime routing flow were described in "Energy efficient routing in ad hoc disaster recovery networks" [8]. Each iteration involved a bisection search on the network lifetime, and the solution of a max-flow problem to check the feasibility of the network lifetime. The complexity of the algorithm was shown to be polynomial in the number of nodes in the special case of one source node. We use a different approach based on the sub gradient algorithm for the solution of the dual problem. We exploit the separable nature of the problem using dual decomposition to obtain partially and fully distributed algorithms. This is similar to


the dual decomposition approaches applied to other problems in communication networks.

When power efficiency is considered, ad hoc networks will require a power-aware metric for their routing algorithms. Typically, there are two main optimization metrics for energy-efficient broadcast/multicast routing in wireless ad hoc networks:

(1) Maximizing the network lifetime; and (2) Minimizing the total transmission power

assigned to all nodes. Maximum lifetime broadcast/multicast routing algorithms can distribute packet relaying loads for each node in a manner that prevents nodes from being overused or abused. By maximizing the lifetime of all nodes, the time before the network is partitioned is prolonged.

II. OBJECTIVE

• We reduce the power consumption for packet transmission.

• We achieve maximum lifetime using the partially and fully distributed processing techniques.

III.GENERAL BLOCK DIAGRAM

IV.EXISTING SYSTEM

Power consumption is one of the major drawbacks of the existing system. When a node traverses from one network to another network located within the topology, the average end-to-end delay increases because of the larger number of coordinator nodes present in the topology. By traversing more coordinators from the centralized node, battery life is decreased, so network connectivity is not maintained while the sensor node is traversing. The sensors collect all the information they are deployed for. The information collected by the sensors will be sent to the nearest sensor.

• Existing works focused on minimizing the total energy consumption of the network.

• Nodes in the network being drained out of energy very quickly.

• Energy consumption is high.
• It is not robust.
• The sensors have limited power, so they are not capable of relaying the information to all the other sensors.

• Because of this power consumption network lifetime is low.

V.PROPOSED SYSTEM

In the proposed system the base station can dynamically move from one location to the other for

reducing the power consumption. The problems faced in the existing systems are overcome through the proposed system. Each node estimates its life-time based on the traffic volume and battery state. The extension fields in route-request (RREQ) and route-reply (RREP) packets are utilized to carry the life-time (LT) information, and the LT field is also included in the routing tables. When a RREQ packet is sent, LT is set to the maximum value (all ones). When an intermediate node receives the RREQ, it compares the LT field of the packet to its own LT; the smaller of the two is set in the forwarded RREQ packet. When a node having a path to the destination hears the RREQ packet, it compares the LT field of the RREQ with the LT field in its routing table and puts the smaller of the two into the RREP. In case the destination hears the RREQ, it simply sends a RREP with the lifetime field equal to the LT in the RREQ. All intermediate nodes that hear the RREP store the path along with the life-time information. In case the source receives several RREPs, it selects the path having the largest LT.

• Unattended operation • Robustness under dynamic operating conditions • Scalability to thousands of sensors • Energy consumption is low • Efficiency is high

VI.OVERVIEW

We describe the system model and formulate the

problem of maximizing the network lifetime as an optimization problem. We introduce the sub-gradient algorithm to solve the convex optimization problem via its dual. Since the objective function is not strictly convex in the primal variables, the dual function is non-differentiable; hence, the primal solution is not immediately available, but it can be recovered. We derive the partially and fully distributed algorithms. We describe a way to completely decentralize the problem by introducing additional variables corresponding to an upper bound on the inverse lifetime of each node. The problem of maximizing the network lifetime can be reformulated as a convex quadratic optimization problem. The flow conservation violation is normalized with respect to the total flow in the network and the minimum node lifetime is normalized with respect to the optimal value of the network lifetime given by a centralized solution to the problem. We considered the network lifetime to be the time at which the first node runs out of energy. Thus we assumed that all nodes are of equal importance and critical to the operation of the sensor network. However, for a heterogeneous wireless sensor network, some nodes may be more important than others. Also, if there are two nodes collecting highly correlated data, the network can remain functional even if one node runs out of energy. Moreover, for the case of nodes with highly correlated data, we may want only one node to forward the data at a given time. Thus we can activate the two nodes in succession, and still be able to send the necessary data to the sink. We will model the lifetime of a network to be a function of the times for which the nodes in the network can forward their data to the sink node. In order to state this precisely, we redefine the node lifetime and the network lifetime for the analysis in this section. We will


relax the constraint on the maximum flow over a link at a given time. We also describe various extensions to the problem for which we can obtain distributed algorithms using the approach described in this paper. We extend the simplistic definition of network lifetime to more general definitions which model more realistic scenarios in sensor networks.
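To give a flavour of the kind of iteration such a dual/sub-gradient approach relies on, the following is a generic projected sub-gradient sketch for a toy nondifferentiable convex problem. It is not the paper's formulation of the lifetime problem; the toy objective, the box constraint and the 1/k step-size rule are all assumptions made only for illustration.

import numpy as np

# Toy nondifferentiable convex objective: f(x) = max_i (a_i . x + b_i)
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))          # assumed toy data
b = rng.normal(size=6)

def f_and_subgradient(x):
    vals = A @ x + b
    i = int(np.argmax(vals))         # index of the active piece
    return vals[i], A[i]             # objective value and one sub-gradient

x = np.zeros(3)
best = np.inf
for k in range(1, 501):
    val, g = f_and_subgradient(x)
    best = min(best, val)
    x = x - (1.0 / k) * g            # diminishing step size 1/k
    x = np.clip(x, -1.0, 1.0)        # projection onto the box constraint
print("best objective value found:", round(best, 4))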

VII. MODULES

1. Node Creation and Plotting process
2. Lifetime Estimation and path tracing
3. Partially Distributed Processing
4. Fully Distributed Processing
5. Data Passing

VIII.ALGORITHM USED

A. Partially Distributed Processing:

• Each node estimates its life-time based on the traffic volume and battery state.

• The extension field in route-request RREQ and

route reply RREP packets are utilized to carry the life-time (LT) information.

• When a RREQ packet is sent, LT is set to the maximum value.

• When an intermediate node receives the RREQ, it compares the LT field of the packet to its own LT. The smaller of the two is set in the forwarded RREQ packet.

• When a node having a path to the destination hears the RREQ packet, it compares the LT field of the RREQ with the LT field in its routing table and puts the smaller of the two into the RREP. In case the destination hears the RREQ, it simply sends a RREP with the lifetime field equal to the LT in the RREQ.

• All intermediate nodes that hear RREP store the path along with the life time information.

• In case the source receives several RREPs, it selects the path having the largest LT (a minimal sketch of this rule follows).
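The sketch below illustrates this max-min rule: the LT carried back by a RREP is the smallest lifetime of any node on the path, and the source picks the path whose LT is largest. The node identifiers, lifetime values and function names are illustrative assumptions, not part of the protocol description above.

def path_lifetime(path, node_lifetime):
    """LT carried by a RREP: the smallest lifetime of any node on the path."""
    return min(node_lifetime[n] for n in path)

def select_route(candidate_paths, node_lifetime):
    """Source behaviour: among the received RREPs, pick the path with the largest LT."""
    return max(candidate_paths, key=lambda p: path_lifetime(p, node_lifetime))

if __name__ == "__main__":
    # Assumed example: lifetimes summarise traffic volume and battery state.
    node_lifetime = {"S": 90, "A": 40, "B": 75, "C": 60, "D": 85, "T": 95}
    paths = [["S", "A", "T"], ["S", "B", "C", "T"], ["S", "D", "T"]]
    best = select_route(paths, node_lifetime)
    print("selected path:", best, "with LT =", path_lifetime(best, node_lifetime))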

B. Fully Distributed Algorithm
The distributed nature of ad hoc networks makes resource allocation strategies very challenging, since there is no central node to monitor and coordinate the activities of all the nodes in the network. Since a single node cannot be delegated to act as a centralized authority because of limitations in the transmission range, several delegated nodes may coordinate the activities in certain zones. This methodology is generally referred to as clustering and the nodes are called clusterheads. The clusterheads employ centralized algorithms in their clusters; however, the clusterheads themselves are distributed in nature. A first consideration is that the requirement for sensor networks to be self-organizing implies that there is no fine control over the placement of the sensor nodes when the network is installed (e.g., when nodes are dropped from an airplane). Consequently, we assume that nodes are randomly distributed across the environment.
• We first put all the nodes in the vulnerable state.
• If there is a face which is not covered by any other active or vulnerable sensor, the node goes to the active state and informs its neighbors.
• If all its faces are covered by one of two types of sensors (active sensors, or vulnerable sensors with a larger energy supply), i.e., the sensor is not a champion for any of its faces, it goes to the idle state and informs its neighbors.
• After a sensor node goes to the Active state, it will stay in the Active state for a pre-defined time called the reshuffle-triggering threshold value.


• Upon reaching the threshold value, a node in the Active state will go to the Vulnerable state and inform its neighbors.

• If a sensor node is in the Idle or Active state, it will go to the Vulnerable state if one of its neighbors goes into the Vulnerable state.

• This causes a global reshuffle, and a new minimal sensor cover is found (a schematic sketch of these state transitions follows).
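A schematic sketch of these state transitions is given below. The coverage tests on a node's "faces" are abstracted into boolean inputs computed elsewhere, and the class and attribute names are assumptions for illustration only.

from enum import Enum, auto

class State(Enum):
    VULNERABLE = auto()
    ACTIVE = auto()
    IDLE = auto()

class SensorNode:
    """Schematic node following the vulnerable/active/idle rules listed above."""

    def __init__(self, node_id, reshuffle_threshold):
        self.node_id = node_id
        self.state = State.VULNERABLE      # all nodes start in the vulnerable state
        self.active_time = 0
        self.threshold = reshuffle_threshold

    def evaluate_coverage(self, uncovered_face_exists, all_faces_covered_by_stronger):
        # Coverage of the node's faces is assumed to be computed by the caller.
        if self.state is State.VULNERABLE:
            if uncovered_face_exists:
                self.state = State.ACTIVE      # node is a champion for some face
            elif all_faces_covered_by_stronger:
                self.state = State.IDLE

    def tick(self):
        # Active nodes fall back to vulnerable after the reshuffle-triggering threshold.
        if self.state is State.ACTIVE:
            self.active_time += 1
            if self.active_time >= self.threshold:
                self.state = State.VULNERABLE
                self.active_time = 0

    def neighbor_became_vulnerable(self):
        # An idle or active node follows its neighbour into the vulnerable state,
        # which triggers the reshuffle that finds a new minimal sensor cover.
        if self.state in (State.IDLE, State.ACTIVE):
            self.state = State.VULNERABLE
            self.active_time = 0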

IX. LITERATURE SURVEY

There are two major techniques for maximizing the routing lifetime: the use of energy efficient routing and the introduction of sleep/active modes for sensors. Extensive research has been done on energy efficient data gathering and information dissemination in sensor networks. Some well-known energy efficient protocols were developed, such as Directed Diffusion [9], LEACH [10], PEGASIS [11], and ACQUIRE [12]. Directed Diffusion is regarded as an improvement over the SPIN [13] protocol that used a proactive approach for information dissemination. LEACH organizes sensor nodes into clusters to fuse data before transmitting to the BS. PEGASIS improved the LEACH by considering both metrics of energy consumption and data-gathering delay. In [14], an analytical model was proposed to find the upper bound of the lifetime of a sensor network, given the surveillance region and a BS, the number of sensor nodes deployed and initial energy of each node. Some routing schemes for maximizing network lifetime were presented in [15]. In [16], an analytic model was proposed to analyze the tradeoff between the energy cost for each node to probe its neighbors and the routing accuracy in geographic routing, and a localized method was proposed. In [17] and [8], linear programming (LP) formulation was used to find energy-efficient routes from sensor nodes to the BS,and approximation algorithms were proposed to solve the LP formulation. Another important technique used to prolong the lifetime of sensor networks is the introduction of switch on/off modes for sensor nodes. J. Carle et al. did a good survey in [18] on energy efficient area monitoring for sensor networks. They pointed out that the best method for conserving energy is to turn off as many sensors as possible, while still keeping the system functioning. An analytical model was proposed in [19] to analyze the system performance, such as network capacity and data delivery delay, against the sensor dynamics in on/off modes. A node scheduling scheme was developed in [20]. This scheme schedules the nodes to turn on or off without affecting the overall service provided. A node decides to turn off when it discovers that its neighbors can help it to monitor its monitoring area. The scheduling scheme works in a localized fashion where nodes make decisions based on its local information. Similar to [21], the work in [22] defined a criterion for sensor nodes to turn themselves off in surveillance systems. A node can turn itself off if its monitoring area is the smallest among all its neighbors and its neighbors will become responsible for that area. This process continues until the surveillance area of a node is smaller than a given threshold. A deployment of a wireless sensor network in the real world for habitat monitoring was discussed in [23].

A network consisting of 32 nodes was deployed on a small island to monitor the habitat environment. Several energy conservation methods were adopted, including the use of sleep mode, energy-efficient communication protocols, and heterogeneous transmission power for different types of nodes. We use both of the above-mentioned techniques to maximize the network lifetime in our solution. We find the optimal schedule to switch on/off sensors to watch targets in turn, and we find the optimal routes to forward data from sensor nodes to the BS. The algorithms were derived to solve the dual problems of programs (24) (4) and (8) in a partially and a fully decentralized manner, respectively. The computation results show that the rate of convergence of the fully distributed algorithm was slower than that for the partially distributed algorithm. However, each-iteration of the partially distributed algorithm involves communication between all the nodes and a central node (e.g. sink node). Hence, it is not obvious which algorithm will have a lower total energy consumption cost. If the radius of the network graph is small, then it would be more energy efficient to use the partially distributed algorithm even though each-iteration involves the update of a central variable. Conversely, for large network radius, the fully distributed algorithm would be a better choice. Also, we note that the computation at each node for the fully distributed algorithm involves the solution of a convex quadratic optimization problem. This is in contrast to the partially distributed algorithm, where each-iteration consists of minimization of a quadratic function of a single variable, which can be done analytically. We considered many different extensions to the original problem and showed how the sub gradient approach can be used to obtain distributed algorithms. In addition, we considered a generalization of the definition of network lifetime to model realistic sensor network scenarios, and reformulated the problem as a convex optimization problem with separable structure.

X. SIMULATION RESULTS

XI. CONCLUSION

In this project, we proposed two distributed algorithms to calculate an optimal routing flow to maximize the network lifetime. The algorithms were derived to solve the dual problems of programs "Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks


[4],” and “Energy efficient routing in ad hoc disaster recovery networks [8],” in a partially and a fully decentralized manner, respectively. The computation results show that the rate of convergence of the fully distributed algorithm was slower than that for the partially distributed algorithm. However, each-iteration of the partially distributed algorithm involves communication between all the nodes and a central node (e.g. sink node). Hence, it is not obvious which algorithm will have a lower total energy consumption cost. If the radius of the network graph is small, then it would be more energy efficient to use the partially distributed algorithm even though each-iteration involves the update of a central variable. Conversely, for large network radius, the fully distributed algorithm would be a better choice. Also, we note that the computation at each node for the fully distributed algorithm involves the solution of a convex quadratic optimization problem. This is in contrast to the partially distributed algorithm, where each-iteration consists of minimization of a quadratic function of a single variable, which can be done analytically. This communication paradigm has a broad range of applications, such as in the area of telemetry collection and sensor networks. It could be used for animal tracking systems, for medical applications with small sensors to propagate information from one part of the body to another or to an external machine, and to relay traffic or accident information to the public through the vehicles themselves as well as many other applications.

REFERENCES [1] Rodoplu V. and Meng T. H., “Minimum energy mobile wireless networks,” IEEE J. Select. Areas Communi., vol. 17, no. 8, pp. 1333–1344, 1999. [2] L. Li and J. Y. Halpern, “Minimum energy mobile wireless networks revisited,” IEEE International Conference on Communications (ICC), 2001. [3] R. Wattenhofer et al., “Distributed topology control for power efficient operation in multihop wireless ad hoc networks,” IEEE INFOCOM, 2001. [4] L. Li et al., “Analysis of a cone-based distributed topology control algorithm for wireless multi-hop networks,” ACM Symposium on Principle of Distributed Computing (PODC), 2001. [5]J. H. Chang and L. Tassiulas, “Energy conserving routing in wireless ad-hoc networks,” in Proc. IEEE INFOCOM, pp. 22–31, 2000. [6] “Routing for maximum system lifetime in wireless ad-hoc networks,” in Proc. of 37-th Annual Allerton Conference on Communication, Control and Computing, 1999. [7] J. Pan et al., “Topology control for wireless sensor networks,” MobiCom, 2003. [8] G. Zussman and A. Segall, “Energy efficient routing in ad hoc disaster recovery networks,” INFOCOM, 2003. [9] C. Intanagonwiwat, R. Govindan, and D. Estrin, “Directed diffusion: A scalable and robust communication paradigm for sensor networks,” presented at the 6th Annu. ACM/IEEE Int. Conf. Mobile Computing and Networking (MOBICOM), Boston, MA, Aug. 2000. [10] W. Heinzelman, A. Chandrakasan, and H. Balakrishna, “Energy-efficient communication protocol for wireless microsensor networks,” presented at the 33rd Annu. Hawaii

Int. Conf. System Sciences (HICSS-33), Maui, HI, Jan. 2000. [11] Lindsey S., C. Raghavendra, and K. M. Sivalingam, “Data gathering algorithms in sensor networks using energy metrics,” IEEE Trans. Parallel Distrib. Syst., vol. 13, no. 9, pp. 924–935, Sep. 2002. [12] N. Sadagopan and B. Krishnamachari, “ACQUIRE: The acquire mechanism for efficient querying in sensor networks,” in Proc. 1st IEEE Int. Workshop on Sensor Network Protocols and Application (SNPA), 2003, pp. 149–155. [13] W. R. Heinzelman, J. Kulit, and H. Balakrishnan, “Adaptive protocols for information dissemination in wireless sensor networks,” presented at the 5th ACM/IEEE Annu. Int. Conf. Mobile Computing and Networking (MOBICOM), Seattle, WA, Aug. 1999. [14] M. Bhardwaj, T. Garnett, and A. Chandrakasan, “Upper bounds on the lifetime of sensor networks,” in IEEE Int. Conf. Communications, 2001, pp. 785–790. [15] J. Chang and L. Tassiulas, “Maximum lifetime routing in wireless sensor networks,” presented at the Advanced Telecommunications and Information Distribution Research Program (ATIRP’2000), College Park, MD, Mar. 2000. [16] T. Melodia, D. Pompili, and I. F. Akyildiz, “Optimal local topology knowledge for energy efficient geographical routing in sensor networks,” in Proc. IEEE INFOCOM, 2004, pp. 1705–1716. [17] N. Sadagopan and B. Krishnamachari, “Maximizing data extraction in energy-limited sensor networks,” in Proc. IEEE INFOCOM, 2004, pp. 1717–1727. [18] Carle J. and D. Simplot-Ryl, “Energy-efficient area monitoring for sensor networks,” IEEE Computer, vol. 37, no. 2, pp. 40–46, Feb. 2004. [19] Chiasserini C. F. and Garetto M., “Modeling the performance of wireless sensor networks,” in Proc. IEEE INFOCOM, 2004, pp. 220–231. [20] D. Tian and N. D. Georganas, “A coverage-preserving node scheduling scheme for large wireless sensor networks,” in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications, 2002, pp. 32–41. [21] L. B. Ruiz et al., “Scheduling nodes in wireless sensor networks: A Voronoi approach,” in Proc. 28th IEEE Conf. Local Computer Networks (LCNS2003), Bonn/Konigswinter, Germany, Oct. 2003, pp. 423–429. [22] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson, “Wireless sensor networks for habitat monitoring,” in Proc. 1st ACM Int. Workshop on Wireless Sensor Networks and Applications, Atlanta, Ga, Sep. 2002, pp. 88–97. [23] H. J. Ryser, Combinational Mathematics. Washington, DC: The Mathematical Association of America, 1963, pp. 58–59. [24] R. A. Brualdi and H. J. Ryser, Combinatorial Matrix Theory. Cambridge,


Decomposition of EEG Signal Using Source Separation Algorithms

Kiran Samuel, PG Student, Karunya University, Coimbatore, and Shanty Chacko, Lecturer, Department of Electronics & Communication Engineering, Karunya University, [email protected],

[email protected]. Abstract: The objective of this project is to reconstruct brain maps from the EEG signal, and from the brain maps to try to diagnose anatomical, functional and pathological problems. These brain maps are projections of the energy of the signals. First we deconvolve the EEG signal into its components, and using visualization tools we plot brain maps, which show the internal brain activity. The EEG sample is divided into four sub-bands: alpha, beta, theta and delta. Each EEG sub-band sample will have some specified number of components. To extract these components we perform the deconvolution of the EEG, which can be done using source separation algorithms. There are many algorithms nowadays that can be used for source separation. For all of this we use a separate toolbox called EEGLAB; this toolbox is made exclusively for EEG signal processing and analysis. Keywords – EEG signal decomposition, Brain map

I. INTRODUCTION

The EEG data will first be divided into its four frequency sub-bands, based on frequency separation. Electroencephalography is the measurement of the electrical activity of the brain by recording from electrodes placed on the scalp or, in special cases, subdurally or in the cerebral cortex. The resulting traces are known as an electroencephalogram (EEG) and represent a summation of post-synaptic potentials from a large number of neurons. These are sometimes called brainwaves, though this use is discouraged, because the brain does not broadcast electrical waves [1]. Electrical currents are not measured, but rather voltage differences between different parts of the brain. The measured EEG signal therefore contains many components, so it is useful to decompose the EEG signal into its components first and then carry out the analysis. When modelling the sources of the EEG [1], the more sources we include, the more accurate the representation becomes. So our first aim is to decompose the EEG signal into its components. As a beginning we start with the reading, measuring and displaying of the EEG signal.

II. MEASURING EEG In conventional scalp EEG, the recording is obtained by placing electrodes on the scalp with a conductive gel or paste, usually after preparing the scalp area by light abrasion to reduce impedance due to dead skin cells. The technique has been advanced by the use of carbon nanotubes to penetrate the outer layers of the skin for improved electrical contact; the sensor is known as ENOBIO. Many systems typically use electrodes, each of which is attached to an individual wire. Some systems use caps or nets into which electrodes are embedded; this is particularly common when high-density arrays of electrodes are needed. Electrode locations and names are specified by the International 10–20 system for most clinical and research applications (except when high-density arrays are used). This system ensures that the naming of electrodes is consistent across laboratories. In most clinical applications, 19 recording electrodes (plus ground and system reference) are used. A smaller number of electrodes are typically used when recording EEG from neonates. Additional electrodes can be added to the standard set-up when a clinical or research application demands increased spatial resolution for a particular area of the brain. High-density arrays (typically via cap or net) can contain up to 256 electrodes more-or-less evenly spaced around the scalp. Even though there are many ways of recording EEG, in most cases the 10–20 system is used. As an example we take one EEG sample measured with the 10–20 system, and that sample is decomposed. Thus, the measurement of the EEG signal is done in different ways in different places.


Fig: 1 Normal EEG wave in time domain

III. EEG SUB BAND:

1. Delta activity: up to 4 Hz.

2. Theta activity: between 4 and 8 Hz.

3. Alpha activity: between 8 and 14 Hz.

4. Beta activity: 14 Hz and above.

The EEG is typically described in terms of (1) rhythmic activity and (2) transients. The rhythmic activity is divided into bands by frequency. To some degree, these frequency bands are a matter of nomenclature, but these designations arose because rhythmic activity within a certain frequency range was noted to have a certain distribution over the scalp or a certain biological significance. Most of the cerebral signal observed in the scalp EEG falls in the range of 1-40 Hz. The normal EEG signal is passed through a band pass filter so that these sub-bands can be extracted. The frequency spectrum of the whole EEG signal is also plotted; this was done by taking the FFT of the signal. Figure 1 shows all the 32 channels of an EEG sample at a particular time period. The number of channels in an EEG sample may vary. Now we will look at the four EEG sub-bands.

IV. EEGLAB INTRODUCTION

EEGLAB provides an interactive graphic user interface (gui) allowing users to flexibly and interactively process their high-density EEG data.

EEGLAB offers a structured programming environment for storing, accessing, measuring, manipulating and visualizing event-related EEG. The EEGLAB gui is designed to allow non-experienced Matlab users to apply advanced signal processing techniques to their data [4]. We use two basic filters in this toolbox: one filter removes all content above 50 Hz, since everything above that frequency is treated as noise; the other is a band pass filter whose pass band and stop band are selected according to the sub-band frequencies. The power can also be estimated using the formula

$P = \frac{1}{N}\sum_{m} |X(m)|^2$

where X(m) is the spectrum of the desired signal and N is the number of samples. After finding the power for each sub-band we proceed to the plotting of the brain map. The power spectrum of the whole sample is also shown in the figure below.
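A minimal NumPy sketch of this sub-band power estimate is shown below. It is not EEGLAB code: the sampling rate, the exact band edges and the test signal are assumptions, and the bands simply follow the ranges listed in Section III.

import numpy as np

fs = 256.0                       # assumed sampling rate in Hz
bands = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 14.0), "beta": (14.0, 40.0)}

def band_powers(x, fs, bands):
    """Estimate the power of each sub-band from the FFT of one channel."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(X) ** 2 / len(x)            # |X(m)|^2 / N per bin
    return {name: float(power[(freqs >= lo) & (freqs < hi)].sum())
            for name, (lo, hi) in bands.items()}

if __name__ == "__main__":
    t = np.arange(0, 4, 1.0 / fs)
    x = np.sin(2 * np.pi * 10 * t) + 0.3 * np.random.randn(t.size)  # alpha-like test signal
    print(band_powers(x, fs, bands))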

Fig: 6 Power spectrum of theta wave


Fig: 4 Power spectrum for a single EEG channel

A. Delta waves

Delta is the frequency range up to 4 Hz. It is seen normally in adults in slow wave sleep. It is also seen normally in babies. It may be seen over focal lesions or diffusely in encephalopathy [6]. Delta waves are also naturally present in stages three and four of sleep (deep sleep) but not in stages 1, 2, and rapid eye movement (REM) sleep. Finally, delta rhythm can be observed in cases of brain injury and in comatose patients.

Fig: 5 Power spectrum of delta wave

B. Theta waves

Theta rhythms are one of several characteristic electroencephalogram waveforms associated with various sleep and wakefulness states. Theta is the frequency range from 4 Hz to 8 Hz. Theta is seen normally in young children. It may be seen in drowsiness or arousal in older children and adults; it can also be seen in meditation. Excess theta for age represents abnormal activity. These rhythms are associated with spatial navigation and some forms of

memory and learning, especially in the temporal lobes. Theta rhythms are very strong in rodent hippocampi and entorhinal cortex during learning and memory retrieval; they can equally be seen in cases of focal or generalized subcortical brain damage and epilepsy.

C. Alpha waves

Alpha is the frequency range from 8 Hz to 14 Hz. Hans Berger named the first rhythmic EEG activity he saw, the "alpha wave." [6] This is activity in the 8-12 Hz range seen in the posterior head regions when an adult patient is awake but relaxed. It was noted to attenuate with eye opening or mental exertion. This activity is now referred to as "posterior basic rhythm," the "posterior dominant rhythm" or the "posterior alpha rhythm." The posterior basic rhythm is actually slower than 8 Hz in young children (therefore technically in the theta range). In addition to the posterior basic rhythm, there are two other normal alpha rhythms that are typically discussed: the mu rhythm and a temporal "third rhythm." Alpha can be abnormal; for example, an EEG that has diffuse alpha occurring in coma and is not responsive to external stimuli is referred to as "alpha coma."

Fig: 7-power spectrum of alpha wave

D. Beta waves
Beta is the frequency range from 14 Hz to about 40 Hz. Low amplitude beta with multiple and varying frequencies is often associated with active, busy or anxious thinking and active concentration. Rhythmic beta with a dominant set of frequencies is associated with various pathologies and drug effects, especially benzodiazepines. Activity over about 25 Hz seen in the scalp EEG is rarely cerebral [6]. This is mostly seen in old people, and whenever they are trying to relax this activity is seen. This activity will be low in amplitude but will follow a rhythmic pattern.

Fig: 8 Power spectrum of beta wave


V. CHANNEL LOCATION

Fig: 9 Channel locations of brain

Channel location shows where the electrodes are placed on the scalp. The above figure shows a two-dimensional plot of the head and the channel locations.

Fig: 10 Channel locations of brain on a 3-d plot

The major algorithms used for deconvolution of the EEG signal are ICA and JADE; we first work with ICA.

VI. ICA

ICA can deal with an arbitrarily high number of dimensions. Let's consider 32 EEG electrodes, for instance. The signal recorded at all electrodes at each time point then constitutes a data point in a 32-dimensional space. After whitening the data, ICA will "rotate the 32 axes" in order to minimize the Gaussianity of the projection onto each axis. An ICA component is given by the row of the weight matrix that projects the data in the initial space onto one of the axes found by ICA [5]. The weight matrix is the full transformation from the original space. When we write S = W X, where

• X - Original EEG channels

• S - EEG components.

• W - The weight matrix to go from the S space to the X space.

(In EEG, each component is typically an artifact time course or the time course of one compact source domain in the brain.)

                     elec1   elec2   elec3
      Component 1 [  0.824   0.534   0.314  ... ]
S =   Component 2 [  0.314   0.154   0.732  ... ]
      Component 3 [  0.153   0.734   0.13   ... ]

Now we will see how to reproject one component to the electrode space. W-1 is the inverse matrix to go from the source space S to the data space X[2].

X = W-1S

As a conclusion, when we talk about independent components, we usually refer to two concepts:

• Rows of the S matrix, which are the time courses of the component activities.

• Columns of the W-1 matrix, which are the scalp projections of the components (a minimal FastICA sketch follows).
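The sketch below illustrates S = W X and the back-projection X = W-1 S on a small synthetic mixture, using scikit-learn's FastICA as a stand-in for the EEGLAB decomposition. The channel count, the synthetic sources and the library choice are assumptions for illustration, not the toolbox workflow used in the paper.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_samples = 2000
t = np.linspace(0.0, 8.0, n_samples)
sources = np.vstack([np.sin(2 * np.pi * 2 * t),            # assumed synthetic sources
                     np.sign(np.sin(2 * np.pi * 3 * t)),
                     rng.laplace(size=n_samples),
                     np.cos(2 * np.pi * 5 * t)])
A = rng.normal(size=(4, 4))          # unknown mixing matrix
X = A @ sources                      # "electrode" data: channels x samples

ica = FastICA(n_components=4, whiten="unit-variance", random_state=0)
S = ica.fit_transform(X.T).T         # rows of S: component time courses
W_inv = ica.mixing_                  # columns of W_inv: projections of the components

# Back-project only component 0 into the electrode space: X0 = W_inv[:, 0] * S[0]
X0 = np.outer(W_inv[:, 0], S[0]) + ica.mean_[:, None]
print("data:", X.shape, "components:", S.shape, "back-projection:", X0.shape)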

VII. BRAIN ACTIVITY

Brain Mapping is a procedure that records electrical activity within the brain. This gives us the ability to view the dynamic changes taking place throughout the brain during processing tasks and assist in determining which areas of the brain are fully engaged and processing efficiently. The electrical activity of the brain behaves like any other electrical system. Changes in membrane polarization, inhibitory and excitatory postsynaptic potentials, action potentials etc. create voltages that are conducted through the brain tissues. These electrical voltages enter the membranes surrounding the brain and continue up through the skull and appear at the scalp, which can be measured as micro Volts.

Fig: 11 Brain Map for all 32 EEG components

These potentials are recorded by electrodes attached to the scalp with non-toxic conductive gel. The electrodes are fed into a sensitive amplifier. In practice, the EEG is recorded from many electrodes arranged in a particular pattern. Brain Mapping techniques are constantly evolving, and rely on the development and refinement of image acquisition, representation, analysis, visualization and interpretation.


VIII. CONCLUSION AND FUTURE PLANS

We took 32 channel raw EEG data and found out the power spectrum for the signal. We deconvoluted the EEG sample using ICA algorithm. From the power spectrum with the help of EEGLAB toolbox in MATLAB we plotted brain maps for the sample EEG data. As future works we are planning to split the EEG signal into its components using source separation algorithms other than ICA. And will try to plot the brain maps for each component and thus will compare all the available algorithms.

REFERENCES

[1] S.Saneil, A.R.Leyman”EEG brain map reconstruction using blind source separation”. IEEE signal processing workshop paper. Pages 233-236, august 2001.

[2] Ning T. and Bronzino D., “Autoregressive and bispectral analysis Techniques: EEG applications”, IEEE Engineering in Medicine and Biology Magazine, pages 18-23, March 1990.

[3] Downloaded EEG sample database. http://www.sccn.ucsd.edu/data_base.html,

[4] Downloaded EEGLAB toolbox for MATLAB. http://www.sccn.ucsd.edu/eeglab/install.html,

[5] For acquiring ICA toolbox for MATLAB. http://www.cis.hut.fi/projects/ica/fastica/

[6] Notes on alpha, beta, delta and theta sub bands http://www.wikipedia.org,


Segmentation of Multispectral Brain MRI using Source Separation Algorithm

Krishnendu K, PG student and Shanty Chacko, Lecturer, Department of Electronics & communication Engineering, Karunya university, Karunya Nagar, Coimbatore – 641 114, Tamil Nadu, India.

Email addresses: [email protected], [email protected]

Abstract-- The aim of our paper is to implement an algorithm for segmenting multispectral MRI brain images and to check whether there is any performance improvement. One set of multispectral MRI brain images consists of a spin-lattice relaxation time, a spin-spin relaxation time, and a proton density weighted image (T1, T2, and PD). The algorithm to be used is the 'source separation algorithm'; source separation is a more general term, since algorithms like ICA, BINICA, JADE etc. can be used. For implementing the algorithm the first thing needed is a database of multispectral MRI brain images, sometimes called the 'test database'. After the image database is acquired, the algorithm is implemented, the performance parameters are calculated, and performance improvement is checked with respect to an already implemented technique. Keywords – Multispectral MRI, Test Database, Source Separation Algorithm, Segmentation.

I. INTRODUCTION

In the image processing field, segmentation [1]

refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. The result of image segmentation is a set of regions that collectively cover the entire image. Several general-purpose algorithms and techniques have been developed for image segmentation. Since there is no general solution to the image segmentation problem, these techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a problem domain. The methods most commonly used are Clustering Methods, Histogram-Based Methods, Region-Growing Methods, Graph Partitioning Methods, Model based Segmentation, Multi-scale Segmentation, Semi-automatic Segmentation and Neural Networks Segmentation.

Some of the practical Medical Imaging applications of image segmentation are:

o Locate tumors and other pathologies o Measure tissue volumes o Diagnosis o Treatment planning o Study of anatomical structure

A. Need for Segmentation

The purposes of segmenting magnetic resonance (MR) images are: 1) to quantify the volume sizes of different tissue types within the body, and 2) to visualize the tissue structures in three dimensions using image fusion.

B. Magnetic Resonance Imaging (MRI)

Magnetic Resonance Imaging (MRI) is a technique primarily used in medical imaging to demonstrate pathological or other physiological alterations of living tissues. Medical MRI most frequently relies on the relaxation properties of excited hydrogen nuclei in water and lipids. When the object to be imaged is placed in a powerful, uniform magnetic field, the spins of atomic nuclei with a resulting non-zero spin have to align in a particular manner with the applied magnetic field according to quantum mechanics. Nuclei of hydrogen atoms (protons) have a simple spin 1/2 and therefore align either parallel or antiparallel to the magnetic field. The MRI scanners used in medicine employ a strong static magnetic field. The spin polarization determines the basic MRI signal strength. For protons, it refers to the population difference of the two energy states that are associated with the parallel and antiparallel alignment of the proton spins in the magnetic field. The tissue is then exposed to pulses of electromagnetic energy (RF pulses) in a plane perpendicular to the magnetic field, causing some of the magnetically aligned hydrogen nuclei to assume a temporary non-aligned high-energy state. In other words, the steady-state equilibrium established in the static magnetic field becomes perturbed and the population difference of the two energy levels is altered. In order to selectively image different voxels (volume picture elements) of the subject, orthogonal magnetic gradients are applied.

The RF transmission system consists of a RF synthesizer, power amplifier and transmitting coil. This is usually built into the body of the scanner. The power of the transmitter is variable. Magnetic gradients are generated by three orthogonal coils, oriented in the x, y and z directions of the scanner.


These are usually resistive electromagnets powered by sophisticated amplifiers which permit rapid and precise adjustments to their field strength and direction. Certain time constants are involved in the relaxation processes that establish equilibrium following the RF excitation; these time constants are T1 and T2, and together with the proton density PD they determine the image contrast. In the brain, T1-weighting causes the nerve connections of white matter to appear white, and the congregations of neurons of gray matter to appear gray, while cerebrospinal fluid appears dark. The contrast of white matter, gray matter and cerebrospinal fluid is reversed using T2 or PD imaging.

In clinical practice, MRI is used to distinguish pathologic tissue (such as a brain tumor) from normal tissue. One advantage of an MRI scan is that it is thought to be harmless to the patient. It uses strong magnetic fields and non-ionizing radiation in the radio frequency range. C. Multispectral MR Brain Images

Magnetic resonance imaging (MRI) is an advanced medical imaging technique providing rich information about the human soft tissue anatomy. It has several advantages over other imaging techniques enabling it to provide three-dimensional data with high contrast between soft tissues. A multi-spectral image (fig.1) is a collection of several monochrome images of the same scene, each of them taken with a different sensor. The advantage of using MR images is the multispectral characteristics of MR images with relaxation times (i.e.,T1 and T2) and proton density (i.e., PD) information.

Figure. 1.MR multispectral images T1w (left), T2w (center), and

PDw (right) for one brain axial slice

T1, T2 and PD weighted images depend on two sequence parameters, the Echo Time (TE) and the Repetition Time (TR). The Spin Echo sequence is based on the repetition of 90° and 180° RF pulses. The Echo Time (TE) is the time between the 90° RF pulse and the MR signal sampling, corresponding to the maximum of the echo; the 180° RF pulse is applied at time TE/2. The Repetition Time (TR) is the time between two excitation pulses (the time between two 90° RF pulses). Nearly all MR images display tissue contrasts that depend on proton density, T1 and T2 simultaneously. PD, T1 and T2 weighting will vary with the sequence parameters, and may differ between different tissues in the same image. A tissue with a long T1 and T2 (like water) is

dark in the T1-weighted image and bright in the T2-weighted image. A tissue with a short T1 and a long T2 (like fat) is bright in the T1-weighted image and gray in the T2-weighted image. Gadolinium contrast agents reduce T1 and T2 times, resulting in an enhanced signal in the T1-weighted image and a reduced signal in the T2-weighted image.

T1 (Spin-lattice Relaxation Time)

Spin-lattice relaxation time, known as T1, is a time constant in Nuclear Magnetic Resonance and Magnetic Resonance Imaging. T1 characterizes the rate at which the longitudinal Mz component of the magnetization vector recovers. The name spin-lattice relaxation refers to the time it takes for the spins to give the energy they obtained from the RF pulse back to the surrounding lattice in order to restore their equilibrium state. Different tissues have different T1 values. For example, fluids have long T1s (1500-2000 mSec), and water based tissues are in the 400-1200 mSec range, while fat based tissues are in the shorter 100-150 mSec range. T1 weighted images can be obtained by setting short TR (< 750mSec) and TE (< 40mSec) values in conventional Spin Echo sequences.
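The recovery and decay governed by these time constants follow the standard exponential relaxation laws; assuming the longitudinal magnetization starts from zero after a 90° excitation,

$M_z(t) = M_0\left(1 - e^{-t/T_1}\right), \qquad M_{xy}(t) = M_{xy}(0)\, e^{-t/T_2}$

where $M_0$ is the equilibrium magnetization. Acquiring with a short TR (so that tissues with different T1 recover by different amounts) and a short TE (so that T2 decay has little effect) therefore yields the T1-weighted contrast described here.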

Fig 2. T1 weighted image

T2 (Spin-spin Relaxation Time)

Spin-spin relaxation time, known as T2, is a

time constant in Nuclear Magnetic Resonance and Magnetic Resonance Imaging. T2 characterizes the rate at which the Mxy component of the magnetization vector decays in the transverse magnetic plane. T2 decay occurs 5 to 10 times more rapidly than T1 recovery, and different tissues have different T2s. For example, fluids have the longest T2s (700-1200 mSec), and water based tissues are in the 40-200 mSec range, while fat based tissues are in the 10-100 mSec range. T2 images in MRI are often thought of as "pathology scans" because collections of abnormal fluid are bright against the darker normal tissue. T2 weighted images can be obtained by setting long TR (>1500 mSec) and TE (> 75mSec) values in conventional Spin Echo sequences. T2 is called the "pathology weighted" sequence because most pathology contains


more water than the normal tissue around it and is therefore usually brighter on T2.

PD (Proton Density)

Proton density denotes the concentration of mobile Hydrogen atoms within a sample of tissue. An image produced by controlling the selection of scan parameters to minimize the effects of T1 and T2, resulting in an image dependent primarily on the density of protons in the imaging volume.

Fig 3. T2 weighted image

Proton density contrast is a quantitative summary of the number of protons per unit tissue. The higher the number of protons in a given unit of tissue, the greater the transverse component of magnetization, and the brighter the signal on the proton density contrast image.

Fig 4. PD weighted image

A T1 weighted image is the image which is usually acquired using short TR (or repetition time of a pulse sequence) and TE (or spin-echo delay time). Similarly, a T2 weighted image is acquired using relatively long TR and TE and a PD weighted image with long TR and short TE. Since the three images are strongly correlated (or spatially registered) over the patient space, the information extracted by means of image processing from the images together is obviously more valuable than that extracted from each image individually. Therefore, tissue segmentation from the three MR images is expected to produce more accurate 3D reconstruction and visualization

than the segmentation obtained from each image individually or from the addition of the three images’ segmentations. Some examples are:
1) Dark on T1, bright on T2: this is a typical pathology; most cancers have these characteristics.
2) Bright on T1, bright on T2: blood in the brain has these characteristics.
3) Bright on T1, less bright on T2: this usually means the lesion is fatty or contains fat.
4) Dark on T1, dark on T2: chronic blood in the brain has these characteristics.

Following is a table of approximate values of the two relaxation time constants for nonpathological human tissues.

Tissue Type                                    T1 (ms)    T2 (ms)
Cerebrospinal Fluid (similar to pure water)     2300       2000
Gray matter of cerebrum                          920        100
White matter of cerebrum                         780         90
Blood                                           1350        200
Fat                                              240         85
Gadolinium                              Reduces T1 and T2 times

Table 1. Approximate values of the two relaxation time constants

D. Applications of Segmentation

The classic method of medical image analysis, the inspection of two-dimensional grayscale images, is not sufficient for many applications. When detailed or quantitative information about the appearance, size, or shape of patient anatomy is desired, image segmentation is often the crucial first step. Applications of interest that depend on image segmentation include three-dimensional visualization, volumetric measurement, research into shape representation of anatomy, image-guided surgery, and detection of anatomical changes over time.

II. METHODOLOGY

A. Algorithm

1) Loading T1, T2 and PD images.
2) Converting to double precision format.
3) Converting each image matrix to a row matrix.
4) Combining the three row matrices to form a matrix.
5) Computing the independent components of the matrix using the FastICA algorithm.
6) Separating the rows of the resultant matrix into three row matrices.
7) Reshaping each row matrix to 256x256.
8) Executing dynamic pixel range correction.
9) Converting to unsigned integer format.
10) Plotting the input images and segmented output images.
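A compact sketch of steps 1-10 is given below, using NumPy and scikit-learn's FastICA. The DICOM loading of step 1 is replaced by synthetic stand-in arrays so the example is self-contained, and the 8-bit rescaling, the function names and the scikit-learn implementation are assumptions rather than the paper's MATLAB code.

import numpy as np
from sklearn.decomposition import FastICA

def segment_multispectral(t1, t2, pd_img):
    """Sketch of steps 1-10: stack T1/T2/PD slices, run FastICA, rescale the outputs."""
    shape = t1.shape
    # Steps 2-4: double precision, flatten each image to a row, stack into a 3 x N matrix.
    X = np.vstack([im.astype(np.float64).ravel() for im in (t1, t2, pd_img)])
    # Step 5: independent components (FastICA expects a samples x features array).
    S = FastICA(n_components=3, whiten="unit-variance",
                random_state=0).fit_transform(X.T).T
    out = []
    for row in S:                                   # steps 6-7: one image per component
        img = row.reshape(shape)
        # Steps 8-9: dynamic pixel-range correction to 0-255 and 8-bit conversion.
        img = (img - img.min()) / (img.max() - img.min() + 1e-12) * 255.0
        out.append(img.astype(np.uint8))
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t1, t2, pd_img = (rng.random((256, 256)) for _ in range(3))   # stand-ins for DICOM slices
    segmented = segment_multispectral(t1, t2, pd_img)             # step 10 would plot these
    print([s.shape for s in segmented], [str(s.dtype) for s in segmented])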

B. Independent Component Analysis

• Introduction to ICA
• Whitening the data
• The ICA algorithm
• ICA in N dimensions
• ICA properties

Introduction to ICA

ICA is a quite powerful technique and is able to separate independent sources linearly mixed in several sensors. For instance, when recording electroencephalograms (EEG) on the scalp, ICA can separate out artifacts embedded in the data (since they are usually independent of each other). ICA is a technique to separate linearly mixed sources. We used the FastICA algorithm for segmenting the images, as the code is freely available on the World Wide Web.

Some preprocessing steps are performed by most ICA algorithms before actually applying ICA. A first step in many ICA algorithms is to whiten (or sphere) the data. This means that we remove any correlations in the data, i.e. the different channels of, say, a matrix Q are forced to be uncorrelated. The reason for whitening is that it restores the initial "shape" of the data, so that ICA must then only rotate the resulting matrix. After whitening, the variance along each axis is equal and the correlation of the projections of the data onto the axes is 0 (meaning that the covariance matrix is diagonal and all the diagonal elements are equal). Applying ICA then only means "rotating" this representation back to the original axis space. The whitening process is simply a linear change of coordinates of the mixed data. Once the ICA solution is found in this "whitened" coordinate frame, we can easily reproject the ICA solution back into the original coordinate frame.

Putting it in mathematical terms, we seek a linear transformation V of the data D such that when P = V*D we now have Cov(P) = I (I being the identity matrix, zeros everywhere and 1s in the Diagonal; Cov being the covariance). It thus means that all the rows of the transformed matrix are uncorrelated.
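A NumPy sketch of one common choice of such a whitening transform V (built from the eigen-decomposition of the covariance of the centred data) is shown below. The eigenvalue-based construction is an assumption made for illustration, not necessarily the preprocessing used by the FastICA implementation in the paper.

import numpy as np

def whiten(D, eps=1e-10):
    """Return P = V @ Dc with Cov(P) close to the identity, Dc being the row-centred data."""
    Dc = D - D.mean(axis=1, keepdims=True)               # centre each channel (row)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Dc))        # channels x channels covariance
    V = np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return V @ Dc, V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.normal(size=(3, 5)) @ rng.normal(size=(5, 1000))   # three correlated channels
    P, V = whiten(D)
    print(np.round(np.cov(P), 2))   # approximately the 3 x 3 identity matrix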

The ICA algorithm

ICA rotates the whitened matrix back to the original space. It performs the rotation by minimizing the Gaussianity of the data projected on each axis (fixed-point ICA). By rotating the axes and minimizing the Gaussianity of the projection, ICA is able to recover the original sources, which are statistically independent (this property comes from the central limit theorem, which states that any linear mixture of two independent random variables is more Gaussian than the original variables).

ICA in N dimensions

ICA can deal with an arbitrarily high number of dimensions. Each ICA component corresponds to a row of the matrix that projects the data from the initial space onto one of the axes found by ICA; the weight matrix is the full transformation from the original space. When we write

S = W X, X is the data in the original space, S is the source activity, and W is the weight matrix that maps the X space to the S space.

Each row of W is the vector with which we can compute the activity of one independent component. After transforming the data into the S space, each component must be reprojected into the data space X for interpretation; W-1 is the inverse matrix that goes from the source space S back to the data space X:

X = W-1 S

If we take one row of S (the activity of one component) and multiply it by the corresponding column of the inverse matrix above, we obtain the projected activity of that component alone. All the components together form a matrix; each row of the S matrix is the activity of one component.
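The back-projection of a single component can be sketched as follows; the helper name project_component and the use of a plain matrix inverse for W are illustrative assumptions.

```python
import numpy as np

def project_component(W, X, k):
    """W: unmixing (weight) matrix, X: data in the original space, k: component index."""
    S = W @ X                            # S = W X: one source per row
    W_inv = np.linalg.inv(W)             # X = W^-1 S
    X_k = np.outer(W_inv[:, k], S[k])    # data explained by component k alone
    return S, X_k
```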

ICA properties

• ICA can only separate linearly mixed sources.

• Since ICA deals with clouds of points, changing the order in which the points are plotted has virtually no effect on the outcome of the algorithm.

• Changing the channel order also has no effect on the outcome of the algorithm.

• Since ICA separates sources by maximizing their non-Gaussianity, perfectly Gaussian sources cannot be separated.

• Even when the sources are not independent, ICA finds a space in which they are maximally independent.


III. RESULT

The acquired MR images are in the DICOM (Digital Imaging and Communications in Medicine, .dcm) single-file format. In order to load them in MATLAB, a dedicated command is used. The input T1, T2 and PD images and the corresponding segmented output images are given below (Fig. 5).
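The authors load the .dcm files with a dedicated MATLAB command; purely for illustration, the equivalent step in Python could look like the sketch below, where the pydicom package and the load_dicom helper are assumptions and not part of the paper.

```python
import numpy as np
import pydicom  # assumed third-party package for reading DICOM files

def load_dicom(path):
    ds = pydicom.dcmread(path)                  # read one single-file DICOM image
    return ds.pixel_array.astype(np.float64)    # pixel data as a double-precision matrix
```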

Fig. 5. MR multispectral images T1w (left), T2w (center), and PDw (right) for one brain axial slice, and the corresponding segmented T1w (left), T2w (center), and PDw (right) images.

IV. CONCLUSION AND FUTURE PLAN

Segmented multispectral MR (T1, T2 and PD) images are obtained using the ICA algorithm. The tissues can be analyzed using the segmented images. For analyzing the tissues, parameters like the T1 time and T2 time of each tissue type must be known. As future work, we plan to extract only the brain region from the image using the snake algorithm and then to segment it with the ICA algorithm and with our own algorithm.

REFERENCES

[1] Umberto Amato, Michele Larobina, Anestis Antoniadis, Bruno Alfano, "Segmentation of magnetic resonance brain images through discriminant analysis," Journal of Neuroscience Methods 131 (2003) 65-74.
[2] Lauren O'Donnell, "Semi-Automatic Medical Image Segmentation," Massachusetts Institute of Technology, October 2001.
[3] Notes about T1, T2, PD and multispectral MR brain images. http://www.wikipedia.org/
[4] Source of the multispectral MR brain images. http://lab.ibb.cnr.it/
[5] ICA (Independent Component Analysis). http://www.sccn.ucsd.edu/~arno/indexica.html
[6] FastICA toolbox for MATLAB. http://www.cis.hut.fi/projects/ica/fastica/


MR Brain Tumor Image Segmentation Using Clustering Algorithm

Lincy Annet Abraham1, D.Jude Hemanth2 PG Student of Applied Electronics1, Lecturer2

Department of Electronics & Communication Engineering Karunya University, Coimbatore.

[email protected],[email protected]

Abstract- In this study, unsupervised clustering methods are examined to develop a medical diagnostic system and fuzzy clustering is used to assign patients to the different clusters of brain tumor. We present a novel algorithm for obtaining fuzzy segmentations of images that are subject to multiplicative intensity inhomogeneities, such as magnetic resonance images. The algorithm is formulated by modifying the objective function in the fuzzy algorithm to include a multiplier field, which allows the centroids of each class to vary across the image. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representation of the original data. The results are compared with the results of clustering according to classification performance. This application shows that fuzzy clustering methods can be important supportive tool for the medical experts in diagnostic. Index Terms- Image segmentation, intensity inhomogeneities, fuzzy clustering, magnetic resonance imaging.

I. INTRODUCTION

Owing to the rapid development of medical devices, traditional manual data analysis has become inefficient and computer-based analysis is indispensable. Statistical methods, fuzzy logic, neural networks and machine learning algorithms are being tested on many medical prediction problems to provide decision support systems.

Image segmentation plays an important role in variety of applications such as robot vision, object recognition, and medical imaging. There has been considerable interest recently in the use of fuzzy segmentation methods which retain more information from the original image than hard segmentation methods. The fuzzy c means algorithm (FCM), in particular, can be used to obtain segmentation via fuzzy pixel classification. Unlike hard classification methods which force pixels to belong exclusively to

one class, FCM allows pixels to belong to multiple classes with varying degrees of membership. This approach allows additional flexibility in many applications and has recently been used in the processing of magnetic resonance (MR) images.

In this work, unsupervised clustering methods are applied to cluster the patients' brain tumors. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data. In this study, the fuzzy c-means algorithm is used to separate the tumor from the brain so that it can be identified in a particular color. Supervised and unsupervised segmentation techniques provide broadly similar results.

II. PROPOSED METHODOLOGY

Figure 1. Block Diagram

Figure 1 shows the proposed methodology for segmentation of the images. Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give a visual representation of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms and a supervised computational neural network, a dynamic multilayer perceptron trained with the cascade correlation learning algorithm. Supervised and unsupervised segmentation techniques provide broadly similar results. The unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to


analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.

The result of image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region are similar with respect to some characteristic or computed property, such as color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). Some of the practical applications of image segmentation are:

• Medical imaging
  o Locate tumors and other pathologies
  o Measure tissue volumes
  o Computer-guided surgery
  o Diagnosis
  o Treatment planning
  o Study of anatomical structure
• Locate objects in satellite images (roads, forests, etc.)
• Face recognition
• Fingerprint recognition
• Automatic traffic control systems
• Machine vision

III. FUZZY C-MEANS CLUSTERING

Fuzzy c-means clustering (FCM), also known as fuzzy ISODATA, is a clustering technique that differs from hard k-means, which employs hard partitioning. FCM employs fuzzy partitioning such that a data point can belong to all groups with different membership grades between 0 and 1.

FCM is an iterative algorithm. The aim of FCM is to find cluster centers (centroids) that minimize a dissimilarity function.

To accommodate the introduction of fuzzy partitioning, the membership matrix U is randomly initialized subject to the constraint in Equation (1):

\sum_{i=1}^{c} u_{ij} = 1, \quad j = 1, \ldots, n \qquad (1)

The dissimilarity (objective) function used in FCM is given in Equation (2):

J(U, c_1, c_2, \ldots, c_c) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \qquad (2)

Here u_{ij} lies between 0 and 1; c_i is the centroid of cluster i; d_{ij} is the Euclidean distance between the i-th centroid c_i and the j-th data point; and m \in [1, \infty) is a weighting exponent.

To reach a minimum of the dissimilarity function, two conditions must hold. These are given in Equation (3) and Equation (4):

c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \qquad (3)

u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( d_{ij} / d_{kj} \right)^{2/(m-1)}} \qquad (4)

3.1 ALGORITHM

The algorithm proceeds in the following steps:

Step 1. Randomly initialize the membership matrix U subject to the constraint in Equation (1).
Step 2. Calculate the centroids c_i using Equation (3).
Step 3. Compute the dissimilarity between the centroids and the data points using Equation (2). Stop if its improvement over the previous iteration is below a threshold.
Step 4. Compute a new U using Equation (4) and go to Step 2.

FCM does not guarantee convergence to an optimal solution, because the cluster centers (centroids) are derived from a membership matrix U that is itself initialized randomly (Equation (3)).
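A compact sketch of Steps 1 to 4 for scalar intensity data is given below; the function name fcm, the convergence test on the change of J, and the fixed random seed are illustrative assumptions.

```python
import numpy as np

def fcm(x, c=4, m=2.0, tol=0.01, max_iter=100):
    """x: 1-D array of data points (e.g. flattened pixel intensities)."""
    n = x.size
    rng = np.random.default_rng(0)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                                    # Equation (1): columns sum to 1
    J_prev = np.inf
    for _ in range(max_iter):
        Um = U ** m
        centroids = (Um @ x) / Um.sum(axis=1)             # Equation (3)
        d = np.abs(x[None, :] - centroids[:, None]) + 1e-12   # distances d_ij
        J = np.sum(Um * d ** 2)                           # Equation (2)
        if abs(J_prev - J) < tol:                         # Step 3: stop on small improvement
            break
        J_prev = J
        U = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1)), axis=1)  # Eq. (4)
    return centroids, U
```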

3.2 FLOW CHART

Figure 2 shows the systematic procedure of the algorithm, which is summarized as follows:

1) Read the input image.
2) Set the number of clusters to 4.
3) Calculate the Euclidean distance.
4) Randomly initialize the membership matrix.
5) Calculate the centroids.
6) Calculate the membership coefficients.
7) If the change is above the threshold of 0.01, update the membership matrix and repeat.
8) If the change is at or below 0.01, display the segmented image.
9) The image is converted into colour.
10) The segmented tumor is displayed in a particular colour and the rest of the image in another colour.

IV. IMPLEMENTATION

The set of MR images consists of 256x256, 12-bit images. The fuzzy segmentation was done in MATLAB. Four types of brain tumor were used in this study, namely astrocytoma, meningioma, glioma and metastasis.


Figure 2. Algorithm for identification

Table 1. Types and number of data

DATA TYPE      NUMBER OF IMAGES
Astrocytoma    15
Meningioma     25
Glioma         20
Metastasis     10
TOTAL          70

V. EXPERIMENTAL RESULTS

The fuzzy c-means algorithm is used to assign the patients to different clusters of brain tumor. This application of fuzzy sets in a classification function makes the class membership relative, so an object can belong to several classes at the same time but with different degrees. This is an important feature for a medical diagnostic system, as it increases sensitivity. All four types of data were used; one sample is shown below. Figure 3 shows the input image of the brain tumor and Figure 4 shows the fuzzy segmented output image.

Figure 3. Input image

Figure 4. Output image


Table 2. Segmentation results

Count        : 20
Threshold    : 0.0085
Time period  : 78.5 seconds
Centroids    : 67.3115, 188.9934, 120.2793, 13.9793

VI.CONCLUSION

In this study, we use the fuzzy c-means algorithm to cluster the brain tumor. For our application, the fuzzy c-means algorithm gives good results in a medical diagnostic setting. Another important feature of the fuzzy c-means algorithm is its membership function: an object can belong to several classes at the same time but with different degrees. This is a useful feature for a medical diagnostic system. As a result, fuzzy clustering can be an important supportive tool for medical experts in diagnosis. Future work will compare the fuzzy c-means result with other fuzzy segmentation methods and reduce the computation time of the fuzzy segmentation so that it is practical for medical use.

ACKNOWLEDGMENT

We would like to thank M/S Devaki Scan Center, Madurai, Tamil Nadu, for providing MR brain tumor images and data of various patients.

REFERENCES

[1]. Songül Albayrak, Fatih Amasyal. “FUZZY C-MEANS CLUSTERING ON MEDICAL DIAGNOSTIC SYSTEMS”. International XII. Turkish Symposium on Artificial Intelligence and Neural Networks - TAINN 2003.

[2]. Coomans, I. Broeckaert, M. Jonckheer, and D.L. Massart: “Comparison of Multivariate Discrimination Techniques for Clinical Data - Application to the Thyroid Functional State”. Methods of Information in Medicine, Vol.22, (1983) 93- 101.

[3]. L. Ozyilmaz, T. Yildirim, “Diagnosis of Thyroid Disease using Artificial Neural Network Methods”, Proceedings of the 9’th International Conference on Neural Information Processing (ICONIP’02) (2002).

[4]. G. Berks, D.G. Keyserlingk, J. Jantzen, M. Dotoli, H. Axer, “Fuzzy Clustering- A Versatile Mean to Explore Medical Database”, ESIT2000, Aachen, Germany.


MRI Image Classification Using Orientation Pyramid and Multi resolution Method

R. Catharine Joy, Anita Jones Mary

PG student of Applied Electronics, Lecturer Department of Electronics and Communication Engineering

Karunya University, Coimbatore. [email protected], [email protected]

Abstract--In this paper, a multi-resolution volumetric texture segmentation algorithm is used. Textural measurements are extracted from 3-D data by sub-band filtering with an Orientation Pyramid method. Segmentation is used to detect objects by dividing the image into regions based on colour, motion, texture, etc. Texture relates to the surface or structure of an object, depends on the relation of contiguous elements, and may be characterised by granularity or roughness, principal orientation and periodicity. We describe the 2-D and 3-D frequency-domain texture feature representation by illustrating and quantitatively comparing results on example 2-D images and 3-D MRI. First, the algorithm is tested with 3-D artificial data and natural textures to describe the frequency and orientation multi-resolution sub-band filtering. Next, three magnetic resonance imaging sets of human knees are used to discriminate anatomical structures that can serve as a starting point for other measurements such as cartilage extraction. Index Terms- Volumetric texture, texture classification, sub-band filtering, multi-resolution.

I.INTRODUCTION

Volumetric texture analysis is highly desirable for medical imaging applications such as magnetic resonance imaging segmentation, ultrasound, or computed tomography, where the data provided by the scanners is either intrinsically 3-D or a time series of 2-D images that can be treated as a data volume. Moreover, the segmentation system can be used as a tool to replace the tedious process of manual segmentation. We also describe a full texture description using multi-resolution sub-band filtering. Texture features derived from the grey-level co-occurrence matrix (GLCM) calculate the joint statistics of grey levels of pixel pairs at varying distances and are a simple and widely used class of texture features. Texture analysis has been used with mixed success in MRI, such as for detection of microcalcification in breast imaging and for knee segmentation. Each portion of the cartilage image is segmented and shown clearly. Texture segmentation is to segment an image into regions according to the textures of the regions. The goal is to simplify and change the representation of an image into something that is more meaningful and easier to analyse. For example, the problem of grey-matter/white-matter labelling in central nervous system (CNS) images like MRI head-and-neck studies has been addressed by supervised statistical classification methods, notably EM-MRF. The segmented portions cannot be seen clearly in a 2-D slice image, so we use 3-D rendering. The cartilage also cannot be segmented and viewed clearly otherwise. A tessellation, or tiling, of a plane is a collection of plane figures that fills the plane with no overlaps and no gaps.

In this paper we describe fully a 3-D texture description scheme using a multi-resolution sub-band filtering and to develop a strategy for selecting the most discriminant texture features conditioned on a set of training images. We propose a sub-band filtering scheme for volumetric textures that provide a series of measurements which capture the different textural characteristics of the data. The filtering is performed in the frequency domain with filters that are easy to generate and give powerful results. A multi-resolution classification scheme is then developed which operates on the joint data-feature space within an oct-tree structure. This benefits both the efficiency of the computation and ensures only the certain labelling at a given resolution is propagated to the next. Interfaces between regions (planes), where the label decisions are uncertain, are smoothed by the use of 3-D “butterfly” filters which focus the inter-class labels.

II. LITERATURE SURVEY

Texture analysis has been used with mixed success in MRI, such as for detection of microcalcification in breast imaging and for knee segmentation, and in CNS imaging to detect macroscopic lesions and microscopic abnormalities, such as for quantifying contralateral differences in epilepsy subjects, to aid the automatic delineation of cerebellar volumes, to estimate effects of age and gender on brain asymmetry, and to characterize spinal cord pathology in multiple sclerosis. Segmenting the trabecular region of the bone can also be viewed as classifying the pixels in that region, since the boundary is initialized to contain intensity and texture corresponding to trabecular bone and then grows outwards to find the true boundary of that bone region. However, no classification is performed on the rest of the image, and the classification of trabecular bone is performed locally. The concept of image texture is intuitively obvious to us, yet it can be difficult to provide a satisfactory definition. Texture relates to the surface or structure of an object, depends on the relation of contiguous elements, and may be characterized by granularity or roughness, principal orientation and periodicity.

The principle of sub band filtering can equally be


applied to images or volumetric data. Wilson and Spann proposed a set of operations that subdivide the frequency domain of an image into smaller regions by the use of two operators, quadrant and centre-surround. By combining these operators, it is possible to construct different tessellations of the space, one of which is the orientation pyramid. To visualize the previous distribution, the Bhattacharyya space and its two marginal distributions were obtained for a natural texture image with 16 classes. It is important to mention two aspects of this selection process: the Bhattacharyya space is constructed on training data, and the individual Bhattacharyya distances are calculated between pairs of classes. Therefore, there is no guarantee that the selected features will always improve the classification of the whole data space; the features selected could be mutually redundant or may only improve the classification for a pair of classes but not the overall classification.

III.VOLUMETRIC TEXTURE

Volumetric texture is considered as the texture that can be found in volumetric data. Texture relates to the surface or structure of an object and depends on the relation of contiguous elements. Other concepts related to texture are smoothness, fineness, coarseness and graininess. Three different approaches are commonly used for texture analysis: statistical, structural and spectral. The statistical methods rely on the moments of the grey-level histogram: mean, standard deviation, skewness, flatness, etc. According to Sonka, texture is scale dependent; therefore a multi-resolution analysis of an image is required if texture is going to be analysed. Texture analysis has been used with mixed success in MRI, such as for detection of microcalcification in breast imaging and for knee segmentation. Texture segmentation is to segment an image into regions according to the textures of the regions.

IV. SUBBAND FILTERING USING AN ORIENTATION PYRAMID

The principle of sub-band filtering can equally be applied to images or volumetric data. Certain characteristics of signals in the spatial domain, such as periodicity, are quite distinctive in the frequency or Fourier domain. If the data contain textures that vary in orientation and frequency, then certain filter sub-bands will contain more energy than others. Wilson and Spann proposed a set of operations that subdivide the frequency domain of an image into smaller regions by the use of two operators, quadrant and centre-surround. By combining these operators, it is possible to construct different tessellations of the space, one of which is the orientation pyramid.

Fig: 1 (a, b) 2-D orientation pyramid; (c) 3-D orientation pyramid.

Fig: 2 A graphical example of sub-band filtering.

A.SUBBAND FILTERING

A filter bank is an array of band-pass filters that separates the input signal into several components, each one carrying a single frequency sub-band of the original signal. It is also desirable to design the filter bank in such a way that the sub-bands can be recombined to recover the original signal. The first process is called analysis, while the second is called synthesis. The output of analysis is referred to as a sub-band signal, with as many sub-bands as there are filters in the filter bank.
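As a simplified illustration of such frequency-domain sub-band filtering (using a plain annular band rather than the quadrant and centre-surround operators of the orientation pyramid), one sub-band feature could be computed as in the sketch below; the function name and band limits are assumptions.

```python
import numpy as np

def subband_energy(image, r_lo, r_hi):
    """Filter one ring-shaped frequency sub-band and return a texture feature map."""
    F = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    y, x = np.ogrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt(x ** 2 + y ** 2)
    mask = (radius >= r_lo) & (radius < r_hi)          # keep one frequency band
    sub = np.fft.ifft2(np.fft.ifftshift(F * mask))     # back to the spatial domain
    feature = np.abs(sub)
    return feature, feature.mean()                     # feature map and its mean energy
```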


Fig: 3 Sub-band filtered images of the second orientation pyramid containing 13 sub-band regions of the human knee MRI.

The filter bank serves to isolate different frequency

components in a signal. This is useful because, for most applications, some frequencies are more important than others. For example, these important frequencies can be coded with a fine resolution: small differences at these frequencies are significant, and a coding scheme that preserves these differences must be used. On the other hand, less important frequencies do not have to be exact; a coarser coding scheme can be used, even though some of the finer details will be lost in the coding.

B. PYRAMIDS

Pyramids are an example of a multi-resolution representation of the image. Pyramids separate information into frequency bands. In the case of images, we can represent high-frequency information (textures, etc.) on a finely sampled grid, while coarse information can be represented on a coarser grid (a lower sampling rate is acceptable). Thus, coarse features can be detected in the coarse grid using a small template size. This is often referred to as a multi-resolution or multi-scale representation.

V.MULTIRESOLUTION CLASSIFICATION

A multi-resolution classification strategy can exploit the inherent multi-scale nature of texture and better results can be achieved. The multi- resolution procedure consists of three main stages: climb, decide and descend. The climbing stage represents the decrease in resolution of the data by means of averaging a set of neighbours on one level (children elements or nodes) up to a parent element on the upper level. Two common climbing methods are the Gaussian Pyramid and the Quad tree. The decrease in resolution correspondingly reduces the uncertainty in the elements’ values since they tend toward their mean. In contrast, the positional uncertainty increases at each level. At the highest level, the new reduced space can be classified either in a supervised or unsupervised scheme.
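A minimal sketch of the climbing stage under a quad-tree scheme is shown below (an oct-tree version would average 2x2x2 children for volumetric data); the function name climb is an assumption.

```python
import numpy as np

def climb(feature_map):
    """One quad-tree climbing step: each parent is the mean of its 2x2 children."""
    h, w = feature_map.shape
    children = feature_map[:h - h % 2, :w - w % 2]      # drop an odd border row/column if present
    parents = children.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return parents   # half-resolution level: value uncertainty shrinks, positional uncertainty grows
```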

Fig: 4 K-means classification of MR image of a human knee based on frequency and orientation regions.

Once the phase congruency map of an image has

been constructed, we know the feature structure of the image. However, thresholding is coarse, highly subjective, and in the end eliminates much of the important information in the image. Some other method of compressing the feature information needs to be considered, and some way of extracting the non-feature information, or the smooth map of the image, needs to be developed. In the absence of noise, the feature map and the smooth map should comprise the whole image. When noise is present, there will be a third component to any image signal, one that is independent of the other two.

VI.EXPERIMENTAL RESULTS

The 3-D MRI sets of human knees were acquired with different protocols, one set with Spin Echo and two sets with SPGR. In the three cases each slice had dimensions of 512 x 512 pixels, with 87, 64, and 60 slices respectively. The bones, background, muscle and tissue classes were labelled to allow evaluation. Four training regions of size 32 x 32 x 32 elements were manually selected for the classes of background, muscle, bone and tissue. These training regions were small relative to the size of the data set, and they remained as part of the test data. Each training sample was filtered with the OP sub-band filtering scheme.

Fig: 5 One slice from a knee MRI data set is filtered with a sub-band filter with a particular frequency.


The SPGR (Spoiled Gradient Recalled) MRI data sets were classified and the bone was segmented with the objective of using this as an initial condition for extracting the cartilage of the knee. The cartilage adheres to the condyles of the bones and appears as a bright, curvilinear structure in SPGR MRI data. In order to segment the cartilage out of the MRI sets, two heuristics were used: cartilage appears bright in the SPGR MRIs and cartilage resides in the region between bones. This is translated into two corresponding rules: threshold voxels above a certain Gray level and discard those not close to the region of contact between bones.

Fig:6 The cartilage and one slice of the MRI set.

VII.CONCLUSION

A multi-resolution algorithm is used to view the classified images as segments. A sub-band filtering algorithm for segmentation was described and tested, first with artificial and natural textures, yielding fairly good results. The algorithm was then used to segment a human knee MRI. The anatomical regions muscle, bone, tissue and background could be distinguished. Textural measurements were extracted from the 3-D data by sub-band filtering with an Orientation Pyramid tessellation method. The algorithm was tested with artificial 3-D images and MRI sets of human knees. Satisfactory classification results were obtained in 3-D at a modest computational cost. In the case of MRI data, M-VTS exploits the textural characteristics of the data, and the resulting segmentations of bone provide a good starting point. As a future enhancement, the method will be extended using fuzzy clustering.

REFERENCES

[1] C. C. Reyes-Aldasoro and A. Bhalerao, "Volumetric texture description and discriminant feature selection for MRI," in Proc. Information Processing in Medical Imaging, C. Taylor and A. Noble, Eds., Ambleside, U.K., Jul. 2003.
[2] W. M. Wells, W. E. L. Grimson, R. Kikinis, and F. A. Jolesz, "Adaptive segmentation of MRI data," IEEE Trans. Med. Imag., vol. 15, no. 4, Aug. 1996.
[3] C. Reyes-Aldasoro and A. Bhalerao, "The Bhattacharyya space for feature selection and its application to texture segmentation," Pattern Recognit., vol. 39, no. 5, pp. 812-826, 2006.
[4] G. B. Coleman and H. C. Andrews, "Image segmentation by clustering," Proc. IEEE, vol. 67, no. 5, pp. 773-785, May 1979.
[5] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code," IEEE Trans. Commun., vol. COM-31, no. 4, pp. 532-540, Apr. 1983.
[6] V. Gaede and O. Günther, "Multidimensional access methods," ACM Computing Surveys, vol. 30, no. 2, pp. 170-231, 1998.


Dimensionality reduction for Retrieving Medical Images Using PCA and GPCA


W Soumya, ME, Applied Electronics, Karunya University, Coimbatore

Abstract— Retrieving images from large and varied collections using image content is a challenging and important problem in medical applications. In this paper, to improve the generalization ability and efficiency of the classification, from the extracted regional features, a feature selection method called principal component analysis is presented to select the most discriminative features. A new feature space reduction method, called Generalized Principal Component Analysis (GPCA), is also presented which works directly with images in their native state, as two-dimensional matrices. In principle, redundant information is removed and relevant information is encoded into feature vectors for efficient medical image retrieval, under limited storage. Experiments on databases of medical images show that, for the same amount of storage, GPCA is superior to PCA in terms of memory requirement, quality of the compressed images, and computational cost.

Index Terms—Dimension reduction, eigenvectors, image retrieval, principal component analysis.

INTRODUCTION

ADVANCES in data storage and image acquisition technologies have enabled the creation of large image datasets. Also, the number of digitally produced medical

images is rising strongly in various medical departments, such as radiology, cardiology and pathology, and in the clinical decision-making process. With this increase has come the need to be able to store, transmit, and query large volumes of image data efficiently. Within the radiology department, mammographies are one of the most frequent application areas with respect to classification and content-based search [7-9]. Within cardiology, CBIR has been used to discover stenosis images [13]. Pathology images have often been proposed for content-based access [12], as their color and texture properties can relatively easily be identified. In this scenario, it is necessary to develop appropriate information systems to efficiently manage these collections [3]. A common operation on image databases is the retrieval of all images that are similar to a query image, which is referred to as content-based medical image retrieval; its block diagram is shown in Fig. 1. To make such retrieval tractable, a dimension reduction step is usually applied to the feature vectors to concentrate relevant information in a small number of dimensions, not only for reasons of computational efficiency but also because it can improve the accuracy of the analysis. The set of techniques that can be employed for dimension reduction can be partitioned in two important ways; they can be separated into techniques that apply to supervised or unsupervised learning and into techniques that either entail feature

selection or feature extraction. Some of the feature space reduction methods include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and Canonical Correlation Analysis (CCA). Among these, PCA finds principal components, ICA finds independent components [11], CCA maximizes correlation [5], and LDA maximizes the interclass variance [10]. PCA is the best-known statistical approach for mapping the original high-dimensional features into low-dimensional ones by eliminating the redundant information from the original feature space [1]. The advantage of the PCA transformation is that it is linear and that any linear correlations present in the data are automatically detected. Then, Generalized Principal Component Analysis (GPCA), a novel feature space reduction technique that is superior to PCA, is also presented [2].

Fig. 1. Block diagram of content-based image retrieval.

FEATURE SELECTION METHODS

Principal Component Analysis (PCA)

Principal Components Analysis (PCA) is an unsupervised feature transformation technique, in contrast to supervised feature selection strategies such as the use of information gain for feature ranking/selection. Principal component analysis reduces the dimensionality of the search to a basis set of prototype images that best describes the images. Each image is described by its projection onto the basis set; a match to a query image is determined by comparing its projection vector on the basis set with those of the images in the database. The reduced dimensions are chosen in a way that captures essential features of the data with very little loss of information.

The idea behind the principal component analysis method is briefly outlined herein: An image can be viewed as a vector by concatenating the rows of the image one after another. If the image has square dimensions (as in MR images) of L x L pixels, then the size of the vector is L^2. For typical image dimensions of 124 x 124, the vector length


(dimensionality) is 15,376. Each new image has a different vector, and a collection of images will occupy a certain region in an extremely high dimensional space. The task of comparing images in this hundred thousand–dimension space is a formidable one. The medical image vectors are large because they belong to a vector space that is not optimal for image description. However, knowledge of brain anatomy provides us with similarities between these images. It is because of the similarities that we can deduce that image vectors will be located in a small cluster of the entire image space. The optimal system can be computed by the Singular Value Decomposition (SVD).

Dimension reduction is achieved by discarding the lesser principal components. i.e., the idea is to find a more appropriate representation for the image features so that the dimensionality of the space used to represent them can be reduced.

A.1 PCA Implementation The mathematical steps used to determine the principal

components of a training set of medical images are outlined in this paragraph (6): A set of n training images is represented as vectors of length L x L, where L is the number of pixels in the x (or y) direction. These pixels may be arranged in the form of a column vector. If the images are of size M x N, there will be a total of MN such n-dimensional vectors comprising all pixels in the n images. The mean vector M_x of a vector population can be approximated by the sample average,

M_x = \frac{1}{K} \sum_{k=1}^{K} X_k \qquad (1)

with K = MN. Similarly, the n x n covariance matrix C_x of the population can be approximated by

C_x = \frac{1}{K-1} \sum_{k=1}^{K} (X_k - M_x)(X_k - M_x)^T \qquad (2)

where K-1 instead of K is used to obtain an unbiased estimate of C_x from the samples. Because C_x is real and symmetric, finding a set of n orthonormal eigenvectors is always possible. The principal components transform is given by

Y = A(X - M_x) \qquad (3)

It is not difficult to show that the elements of Y are uncorrelated; thus, the covariance matrix C_y is diagonal. The rows of matrix A are the normalized eigenvectors of C_x. These eigenvectors determine linear combinations of the n training set images that form the basis set of images that best describe the variations in the training set images. Because C_x is real and symmetric, these vectors form an orthonormal set, and it follows that the elements along the main diagonal of C_y are the eigenvalues of C_x. The main diagonal element in the i-th row of C_y is the variance of vector element Y_i. Because the rows of A are orthonormal, its inverse equals its transpose. Thus, we can recover the X's by performing the inverse transformation

X = A^T Y + M_x \qquad (4)

A new query image is projected similarly onto the eigenspace and its coefficients are computed. The class that best describes the query image is determined by a similarity measure defined in terms of the Euclidean distance between the coefficients of the query and those of each image in each class. The training set image whose coefficients are closest (in the Euclidean sense) to those of the query image is selected as the matching image. If the minimum Euclidean distance exceeds a preset threshold, the query image is assigned to a new class.
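A small sketch of this retrieval procedure based on Equations (1) to (4), using an SVD of the centred data to obtain the leading eigenvectors; the function name pca_retrieve is an assumption, and p = 9 follows the experiment section.

```python
import numpy as np

def pca_retrieve(train_images, query, p=9):
    """Return the index of the training image whose PCA coefficients are closest to the query's."""
    X = np.stack([im.ravel().astype(np.float64) for im in train_images])   # n x (L*L)
    Mx = X.mean(axis=0)                                                    # Equation (1)
    Xc = X - Mx
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt: eigenvectors of the covariance
    A = Vt[:p]                                          # p leading eigenvectors (basis images)
    Y = Xc @ A.T                                        # Equation (3): database projections
    y_q = (query.ravel().astype(np.float64) - Mx) @ A.T
    dists = np.linalg.norm(Y - y_q, axis=1)             # Euclidean distance of the coefficients
    return int(np.argmin(dists)), dists
```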

Generalized Principal Component Analysis (GPCA)

This scheme works directly with images in their native state, as two-dimensional matrices, by projecting the images to a vector space that is the tensor product of two lower-dimensional vector spaces. GPCA is superior to PCA in terms of quality of the compressed images, query precision, and computational cost. The key difference between PCA and the generalized PCA (GPCA) method that we propose in this paper is in the representation of image data. While PCA uses a vectorized representation of the 2D image matrix, GPCA works with a representation that is closer to the 2D matrix representation (as illustrated schematically in Figure 2) and attempts to preserve the spatial locality of the pixels. The matrix representation in GPCA leads to SVD computations on matrices with much smaller sizes. More specifically, GPCA involves SVD computations on matrices of sizes r x r and c x c, which are much smaller than the matrix in PCA (whose dimension is n x (r x c)). This reduces dramatically the time and space complexities of GPCA as compared to PCA. This is partly due to the fact that images are two-dimensional signals and there are spatial locality properties intrinsic to images that the representation used by GPCA seems to take advantage of.

B.1 GPCA Implementation

In GPCA, the algorithm deals with data in its native matrix representation and considers the projection onto a space which is the tensor product of two vector spaces. More specifically, for given integers l1 and l2, GPCA computes the (l1, l2)-dimensional axis system ui ⊗ vj, for i = 1, …, l1 and j = 1, …, l2, where ⊗ denotes the tensor product, such that the projection of the data points (after subtracting the mean) onto this axis system has the largest variance among all (l1, l2)-dimensional axis systems.


Fig 2: Schematic view of the key difference between GPCA and PCA. GPCA works on the original matrix representation of images directly, while PCA applies matrix-to-vector alignment first and works on the vectorized representation of images, which may lead to loss of spatial locality information.

Formulation of GPCA: Let A_k, for k = 1, \ldots, n, be the n images in the dataset, and calculate the mean using Equation (5):

M = \frac{1}{n} \sum_{k=1}^{n} A_k \qquad (5)

Then subtract the mean from each image:

A_j = A_j - M, \quad j = 1, \ldots, n \qquad (6)

GPCA aims to compute two matrices L and R with orthonormal columns such that the variance var(L, R) is maximized, using Equations (7) and (8):

M_L = \sum_{j=1}^{n} A_j R R^T A_j^T \qquad (7)

M_R = \sum_{j=1}^{n} A_j^T L L^T A_j \qquad (8)

The main observation, which leads to an iterative algorithm for GPCA, is stated in the following theorem.

Theorem: Let L, R be the matrices maximizing the variance var(L, R). Then:
• For a given R, matrix L consists of the l1 eigenvectors of the matrix M_L corresponding to the largest l1 eigenvalues.
• For a given L, matrix R consists of the l2 eigenvectors of the matrix M_R corresponding to the largest l2 eigenvalues.

The theorem provides us with an iterative procedure for computing L and R. More specifically, for a fixed L, we can compute R from the eigenvectors of the matrix M_R; with the computed R, we can then update L from the eigenvectors of the matrix M_L. The solution depends on the initial choice L0 for L. Experiments show that choosing L0 = (Id, 0)^T, where Id is the identity matrix, produces excellent results, and we use this initial L0 in all the experiments. Given L and R, the projection of Aj onto the axis system spanned by L and R can be computed as Dj = L^T Aj R.

Algorithm:

Let A1, …, An be the n images in a database.

Step 1: Calculate the mean of all n images using Equation (5).
Step 2: Subtract the mean from each image using Equation (6).
Step 3: Initialize L0 = (Id, 0)^T.
Step 4: Form the matrix M_R using Equation (8).
Step 5: Compute the l2 eigenvectors (Ri) of M_R corresponding to the largest l2 eigenvalues.
Step 6: Form the matrix M_L using Equation (7).
Step 7: Compute the l1 eigenvectors (Li) of M_L corresponding to the largest l1 eigenvalues.
Step 8: Obtain the reduced representation using the equation Dj = L^T Aj R.          (9)
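A sketch of Steps 1 to 8 is given below; the number of refinement iterations n_iter and the function name gpca are assumptions (the theorem suggests alternating the two eigenvector computations until they stabilize).

```python
import numpy as np

def gpca(images, l1=2, l2=2, n_iter=5):
    """Compute the GPCA transformation matrices L, R and the reduced l1 x l2 representations."""
    A = np.stack([im.astype(np.float64) for im in images])   # n x r x c
    M = A.mean(axis=0)                                       # Equation (5)
    A = A - M                                                # Equation (6)
    r = A.shape[1]
    L = np.eye(r)[:, :l1]                                    # Step 3: L0 = (Id, 0)^T
    for _ in range(n_iter):
        MR = sum(Aj.T @ L @ L.T @ Aj for Aj in A)            # Step 4: Equation (8)
        R = np.linalg.eigh(MR)[1][:, -l2:]                   # Step 5: top l2 eigenvectors
        ML = sum(Aj @ R @ R.T @ Aj.T for Aj in A)            # Step 6: Equation (7)
        L = np.linalg.eigh(ML)[1][:, -l1:]                   # Step 7: top l1 eigenvectors
    D = np.stack([L.T @ Aj @ R for Aj in A])                 # Step 8: Dj = L^T Aj R
    return L, R, D
```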

EXPERIMENT RESULTS

In this experiment, we applied PCA and GPCA on the 40

images of size 124x124 in the medical image dataset that contains brain, chest, breast and elbow images which is shown in figure.3. Both PCA and GPCA can be applied for medical image retrieval. The experimental comparison of PCA and GPCA is based on the assumption that they both use the same amount of storage. Hence it is important to understand how to choose the reduced dimension for PCA and GPCA for a specific storage requirement. We use p = 9 (where p corresponds to the principal components) in PCA (as shown in TABLE I) and set d = 4 (where d corresponds to the largest two eigen values) for GPCA (as shown in TABLE II) correspondingly.

Fig. 3. Medical image database

The reduced dimensions are chosen in a way that captures

essential features of the data with very little loss of information. PCA is popular because of its use of multidimensional representations for the compressed format.


TABLE I
Features obtained for PCA: the eigenvectors V1, V2 and V3 of the training data for the brain, chest, breast and elbow image classes.

GPCA computes the optimal transformation matrices L and R such that the original image matrices are reduced to 2 x 2 matrices, whereas in PCA the feature vectors are obtained as a 3x3 matrix, as listed in Tables I and II.

TABLE II
Features obtained for GPCA (brain class)

Image   Reduced 2 x 2 matrix
Db1     -3.3762   1.1651
        -0.2207  -0.6612
Db2      4.6552   2.6163
        -0.4667   0.7519
Db3      4.6552  -2.7397
         2.6163  -1.6044
Db4     -1.7318   0.1744
         0.7202  -0.4391
Db5     -1.6252   0.0462
        -0.0010   0.1173

Therefore, GPCA has asymptotically minimum memory

requirements, and lower time complexity than PCA, which is desirable for large medical image databases. GPCA also uses transformation matrices that are much smaller than PCA. This significantly reduces the space to store the transformation matrices and reduces the computational time in computing the reduced representation for a query image. Experiments show superior performance of GPCA over PCA, in terms of quality of compressed images and query precision, when using the same amount of storage.

The feature vectors obtained through the feature selection methods are fed to a Hopfield neural classifier for efficient medical image retrieval. The Hopfield neural network is a type of recurrent neural network in which a physical path exists from the output of a neuron to the input of all neurons except the corresponding input neuron. If PCA features are fed to the Hopfield network, 9 neurons are used in the input layer, since the size of the PCA feature vector is 1 x 9; if GPCA features are used as the classifier input, 4 neurons are used in the input layer, since the GPCA feature vector is of size 1 x 4. The energy is calculated using Equation (10):

E = -0.5 * S * W * S^T          (10)

where E is the energy of a particular pattern S and W is the weight matrix.

The test pattern energy is compared with the stored pattern energy and the images having energy close to the test pattern energy are retrieved from the database.
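A sketch of this energy-based comparison is shown below; the helper names and the choice of returning the top_k closest patterns are assumptions.

```python
import numpy as np

def hopfield_energy(S, W):
    """Energy of a feature pattern S under weight matrix W, Equation (10)."""
    S = np.asarray(S, dtype=np.float64)
    return -0.5 * S @ W @ S.T

def retrieve(query_pattern, stored_patterns, W, top_k=5):
    e_q = hopfield_energy(query_pattern, W)
    energies = np.array([hopfield_energy(s, W) for s in stored_patterns])
    order = np.argsort(np.abs(energies - e_q))      # patterns with the closest energy first
    return order[:top_k]
```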

CONCLUSION

To overcome problems associated with high dimensionality, such as high storage and retrieval times, a dimension reduction step is usually applied to the vectors to concentrate relevant information in a small number of dimensions. In this paper, two subspace analysis methods, Principal Component Analysis (PCA) and Generalized Principal Component Analysis (GPCA), are presented and compared. PCA is a simple, well-known dimensionality reduction technique that applies matrix-to-vector alignment first and works on the vectorized representation of images, which may lead to loss of spatial locality information, while GPCA works on the original matrix representation of images directly. GPCA is found to be superior to PCA: its dimensionality is reduced to a 2x2 matrix, whereas in PCA the eigenvectors are obtained as a 3x3 matrix. GPCA works directly with images in their native state, as two-dimensional matrices, by projecting the images to a vector space that is the tensor product of two lower-dimensional vector spaces.

REFERENCES

[1] U. Sinha, H. Kangarloo, "Principal component analysis for content-based image retrieval," RadioGraphics 22 (5) (2002) 1271-1289.
[2] J. Ye, R. Janardan, and Q. Li, "GPCA: An efficient dimension reduction scheme for image compression and retrieval," in KDD '04: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2004, pp. 354-363.
[3] Henning Muller, Nicolas Michoux, David Bandon, Antoine Geissbuhler, "A review of content-based image retrieval systems in medical applications - clinical benefits and future directions," International Journal of Medical Informatics, vol. 73, pp. 1-23, 2004.
[4] Imola K. Fodor, "A survey of dimension reduction techniques," Center for Applied Scientific Computing, Lawrence Livermore National Laboratory.
[5] Marco Loog, Bram van Ginneken, and Robert P. W. Duin, "Dimensionality reduction by canonical contextual correlation projections."
[6] P. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, Z. Protopapas, "Fast and effective retrieval of medical tumor shapes," IEEE Transactions on Knowledge and Data Engineering 10 (6) (1998) 889-904.
[7] S. Baeg, N. Kehtarnavaz, "Classification of breast mass abnormalities using denseness and architectural distortion," Electronic Letters on Computer Vision and Image Analysis 1 (1) (2002) 1-20.
[8] F. Schnorrenberg, C. S. Pattichis, C. N. Schizas, K. Kyriacou, "Content-based retrieval of breast cancer biopsy slides," Technology and Health Care 8 (2000) 291-297.
[9] Xipeng Qiu, Lide Wu, "Two-dimensional nearest neighbor discriminant analysis," Neurocomputing, 2007, doi:10.1016/j.neucom.2007.02.001.
[10] B. Bai, P. Kantor, N. Cornea, and D. Silver, "Toward content-based indexing and retrieval of functional brain images," in Proceedings of RIAO 2007.
[11] D. Comaniciu, P. Meer, D. Foran, A. Medl, "Bimodal system for interactive indexing and retrieval of pathology images," in Proceedings of the Fourth IEEE Workshop on Applications of Computer Vision (WACV'98), Princeton, NJ, USA, 1998, pp. 76-81.
[12] M. R. Ogiela, R. Tadeusiewicz, "Semantic-oriented syntactic algorithms for content recognition and understanding of images in medical databases," in Proceedings of the Second International Conference on Multimedia and Expo (ICME 2001), IEEE Computer Society, Tokyo, Japan, 2001, pp. 621-624.
[13] www.e-radiography.net/ibase5/index.htm - xray2000 Image base v6, July 2007.


Efficient Whirlpool Hash Function

D.S.Shylu J.Piriyadharshini Sr.Lecturer, ECE Dept, II ME(Applied Electronics) Karunya University, Karunya University Coimbatore- 641114. Coimbatore- 641114. mail id:[email protected] mail id: [email protected] Contact No: 9443496082 Contact No: 9842107110

Abstract —Recent breakthroughs in the cryptanalysis of standard hash functions like SHA-1 and MD5 raise the need for alternatives. The latest cryptographic applications demand both high speed and high security. In this paper, an architecture and VLSI implementation of the newest powerful standard in the hash families, Whirlpool, is presented. It reduces the required hardware resources and achieves high-speed performance. The architecture permits a wide variety of implementation tradeoffs. The implementation is examined and compared in terms of security level and of hardware performance. This is the first Whirlpool implementation allowing fast execution and effective substitution of implementations of any previous hash families such as MD5, RIPEMD-160, SHA-1, SHA-2, etc., in any cryptographic application.

I. INTRODUCTION

A hash function is a function that maps an input of arbitrary length into a fixed number of output bits, the hash value. Hash functions are used as building blocks in various cryptographic applications. The most important uses are in the protection of information authentication and as a tool for digital signature schemes. In recent years the demand for effective and secure communications in both wired and wireless networks has been especially noted in the consumer electronics area. In modern consumer electronics, security applications play a very important role. The interest in financial and other electronic transactions has grown, so security applications provide an important way for consumers and businesses to decide which electronic communications they can trust. The best-known hash function is the Secure Hash Algorithm-1 (SHA-1). The security parameter of SHA-1 was chosen in such a way as to guarantee a level of security in the range of 2^80 operations, as required by the best currently known attacks. However, the security level of SHA-1 does not match the security guaranteed by the newly announced AES encryption standard, which specifies 128-, 192-, and 256-bit keys. Many attempts have been made to put forward new hash functions and match the security level of the new encryption standard.

The National Institute of Standards and Technology (NIST) announced the updated Federal Information Processing Standard (FIPS 180-2), which introduced three new hash functions referred to as SHA-2 (256, 384, 512). In addition, the New European Schemes for Signatures, Integrity, and Encryption (NESSIE) project was responsible for introducing a hash function with a similar security level. In February 2003, it was announced that the hash function included in the NESSIE portfolio is Whirlpool. All the above-mentioned hash functions are adopted by the International Organization for Standardization (ISO/IEC) 10118-3 standard. The Whirlpool hash function is byte-oriented and consists of the iterative application of a compression function. This is based on an underlying dedicated 512-bit block cipher that uses a 512-bit key and runs in 10 rounds in order to produce a hash value of 512 bits. In this paper, an architecture and VLSI implementation of the new hash function, Whirlpool, is proposed. It reduces the required hardware resources and achieves high-speed performance. The proposed implementation is examined and compared in terms of the offered security level and of hardware performance. In addition, since no other Whirlpool implementations exist, comparisons with implementations of other hash families are provided. From the comparison results it is shown that the proposed implementation performs better and constitutes an effective substitute for implementations of previous hash families such as MD5, RIPEMD-160, SHA-1, SHA-2, etc., in almost all cases.

II. WHIRLPOOL HASH FUNCTION

Whirlpool is a one-way, collision-resistant 512-bit hash function operating on messages less than 2^256 bits in length. It consists of the iterated application of a compression function, based on an underlying dedicated 512-bit block cipher that uses a 512-bit key. Whirlpool is based on a dedicated block cipher, W, which operates on a 512-bit hash state using a chained key state, both derived from the input data. The round function and the key schedule of W are designed according to the Wide Trail strategy. In the following, the round function of the block cipher W is defined, and then the complete hash function is specified. The block diagram of the basic round of the W block cipher is shown in Fig. 1. The round is built from three algebraic functions: the non-linear layer γ, the cyclical permutation π, and the linear diffusion layer θ. The round function is the composite mapping ρ[k], parameterized by the key matrix k, and given by:

ρ[k] = σ[k] ∘ θ ∘ π ∘ γ          (1)

The symbol "∘" denotes the sequential (associative) composition of the algebraic functions, where the rightmost function is executed first. The key addition σ[k] consists of the bitwise addition (exclusive-or) of a key matrix k:

σ[k](a) = b  with  b_ij = a_ij xor k_ij,  0 ≤ i, j ≤ 7          (2)

This mapping is also used to introduce round constants in the key schedule. The input data (hash state) is internally viewed as an 8x8 matrix over GF(2^8). Therefore, a 512-bit data string must be mapped to and from this matrix format. This is done by the function μ:

μ(a) = b  with  b_ij = a_{8i+j},  0 ≤ i, j ≤ 7          (3)

The first transformation of the hash state is the non-linear layer γ, which consists of the parallel application of a non-linear substitution box (S-Box) to all bytes of the argument individually. Next, the hash state is passed through the permutation π, which cyclically shifts each column of its argument independently, so that column j is shifted downwards by j positions. The final transformation is the linear diffusion layer θ, in which the hash state is multiplied by a generator matrix; the effect of θ is to mix the bytes in each state row. The dedicated 512-bit block cipher W[K], parameterized by the 512-bit cipher key K, is then defined as:

W[K] = ( ∘_{r=1..R} ρ[K^r] ) ∘ σ[K^0]          (4)

where the round keys K^0, …, K^R are derived from K by the key schedule. The default number of rounds is R = 10. The key schedule expands the 512-bit cipher key K into the sequence of round keys K^0, …, K^R as:

K^0 = K,   K^r = ρ[c^r](K^{r-1}),  r > 0          (5)

The round constant for the r-th round, r > 0, is a matrix c^r defined through the substitution box (S-Box) as:

c^r_{0j} = S[8(r-1) + j],  0 ≤ j ≤ 7
c^r_{ij} = 0,  1 ≤ i ≤ 7, 0 ≤ j ≤ 7          (6)

Whirlpool then iterates the Miyaguchi-Preneel hashing scheme [14] over the t padded blocks m_i, 1 ≤ i ≤ t, using the dedicated 512-bit block cipher W:

η_i = μ(m_i),   H_i = W[H_{i-1}](η_i) xor H_{i-1} xor η_i,  1 ≤ i ≤ t          (7)

As (4) and (5) show, the internal block cipher W comprises a data-randomizing part and a key-schedule part; these parts use the same round function. Before being subjected to the hashing operation, a message M of bit length L < 2^256 is padded with a 1-bit, then as few 0-bits

as necessary to obtain a bit string whose length is an odd multiple of 256, and finally with the 256-bit right-justified binary representation of L, resulting in the padded message m,partitioned in t blocks m1, m2, ... ,mt.
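The padding rule just described can be sketched as follows for byte-aligned messages; the function name whirlpool_pad is an assumption.

```python
def whirlpool_pad(message: bytes) -> bytes:
    """Pad a byte string according to the rule above: 1-bit, 0-bits, 256-bit length field."""
    L = len(message) * 8                       # original length in bits
    padded = message + b"\x80"                 # the single 1-bit (followed by seven 0-bits)
    while (len(padded) * 8) % 512 != 256:      # stop when the length is an odd multiple of 256
        padded += b"\x00"
    padded += L.to_bytes(32, "big")            # 256-bit right-justified binary length
    return padded                              # total length is now a multiple of 512 bits
```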

III. HARDWARE ARCHITECTURE AND VLSI IMPLEMENTATION

The architecture that performs the Whirlpool hash function is shown in Fig. 2. The Padder pads the input data and converts it to an n-bit padded message. In the proposed architecture, an interface with a 256-bit input for the message is considered. The input n specifies the total length of the message. The padded message is partitioned into a sequence of t 512-bit blocks m1, m2, …, mt. This sequence is then used to generate a new sequence of 512-bit strings H1, H2, …, Ht in the following way: mi is processed with Hi-1 as the key, and the resulting string is XORed with mi in order to produce Hi. H0 is a string of 512 0-bits and Ht is the hash value. The block cipher W mainly consists of the round function ρ. The implementation of the round function ρ is illustrated in Fig. 2.


The non-linear layer γ is composed of 64 substitution tables (S-Boxes). The internal structure of the S-Box is shown in Fig. 3. It consists of five 4-bit mini boxes of types E, E^-1, and R. These mini boxes can be implemented either with Look-Up Tables (LUTs) or with Boolean expressions.

ALGORITHM FOR SINGLE S-BOX:
• The 512-bit input is divided into bytes, so that 64 S-Boxes each receive 8 bits as input.
• Consider the mini boxes e and ei of the first layer, which together receive the 8 input bits.
• The received 8 bits are split into 4 bits for each of these mini boxes.
• An XOR operation is performed such that the LSB of e is XORed with the LSB of ei.
• Similarly, the remaining bits are XORed.
• The result of e XOR ei is given as the input to the mini box r.
• Next, the output of e is XORed with the bits held in r.
• Similarly, the output of ei is XORed with the bits held in r.
• Finally, the result of e XOR r is fed to the final mini box eo, and the result of ei XOR r is fed to eoi.

ALGORITHM FOR FINAL S-BOX:
• The input to the S-Box (sin) and the output obtained from the S-Box (sout) are each 512 bits long.
• This 512-bit input is divided into two 256-bit halves.
• The first 256 bits (0 to 255) flow through the signal s and the remaining 256 bits (256 to 511) flow through the signal si.
• The first 256 bits are further divided into 4-bit groups, each of which is fed to a component of the S-Box (i.e., the mini box E).
• The remaining 256 bits are likewise divided into 4-bit groups, each of which is fed to a component of the S-Box.
• The outputs obtained from the S-Box components are eo (i.e., from the mini box E) and eoi (i.e., from the mini box E^-1).
• The output obtained is an XORed output and not the original output.
• This output forms the input to the next stage of the design. (A software sketch of the mini-box structure of a single S-Box is given below.)
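The mini-box network of a single S-Box described above can be sketched as follows. The 4-bit tables used here are dummy permutations introduced only for illustration; the actual E, E^-1 and R tables are defined in the Whirlpool specification [5] and must be substituted for a faithful implementation.

# Structural sketch of one Whirlpool-style S-Box built from 4-bit mini boxes.
E = [(7 * x + 3) % 16 for x in range(16)]      # DUMMY permutation, not the real E table
E_INV = [E.index(v) for v in range(16)]        # inverse of the dummy E
R = [(5 * x + 1) % 16 for x in range(16)]      # DUMMY permutation, not the real R table

def sbox(x: int) -> int:
    """Route one input byte through the first-layer boxes e/ei, the middle box r,
    and the final boxes eo/eoi, following the structure described above."""
    hi, lo = x >> 4, x & 0xF                   # split the byte into two 4-bit halves
    a = E[hi]                                  # first-layer mini box e
    b = E_INV[lo]                              # first-layer mini box ei
    r = R[a ^ b]                               # middle mini box r on (e XOR ei)
    return (E[a ^ r] << 4) | E_INV[b ^ r]      # final mini boxes eo and eoi

def gamma(state: bytes) -> bytes:
    """Non-linear layer: apply the S-Box to all 64 state bytes in parallel."""
    return bytes(sbox(v) for v in state)

print(gamma(bytes(range(64))).hex())           # structural demo with dummy tables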

Next, the cyclical permutation π is implemented using combinational shifters. These shifters cyclically shift (downwards) each matrix column by a fixed number of positions (equal to j) in one clock cycle. The linear diffusion layer θ is a matrix multiplication between the hash state and a generator matrix. In [5] pseudocode is provided for implementing this matrix multiplication. However, in this paper an alternative formulation is proposed which is better suited to hardware implementation. The transformation expressions of the diffusion layer are given below (equation (8)). Bytes b_i0, b_i1, …, b_i7 represent the eight bytes of row i of the hash state at the output of the layer θ. Table X implements multiplication by the polynomial g(x) = x modulo (x^8 + x^4 + x^3 + x^2 + 1) in GF(2^8) (i.e., X[u] = x·u, where u denotes the table input); X2 and X3 implement multiplication by x^2 and x^3, respectively.

b_i0 = a_i0 ⊕ a_i1 ⊕ a_i3 ⊕ a_i5 ⊕ a_i7 ⊕ X[a_i2] ⊕ X2[a_i3 ⊕ a_i6] ⊕ X3[a_i1 ⊕ a_i4]
b_i1 = a_i0 ⊕ a_i1 ⊕ a_i2 ⊕ a_i4 ⊕ a_i6 ⊕ X[a_i3] ⊕ X2[a_i4 ⊕ a_i7] ⊕ X3[a_i2 ⊕ a_i5]
b_i2 = a_i1 ⊕ a_i2 ⊕ a_i3 ⊕ a_i5 ⊕ a_i7 ⊕ X[a_i4] ⊕ X2[a_i5 ⊕ a_i0] ⊕ X3[a_i3 ⊕ a_i6]
b_i3 = a_i0 ⊕ a_i2 ⊕ a_i3 ⊕ a_i4 ⊕ a_i6 ⊕ X[a_i5] ⊕ X2[a_i6 ⊕ a_i1] ⊕ X3[a_i4 ⊕ a_i7]
b_i4 = a_i1 ⊕ a_i3 ⊕ a_i4 ⊕ a_i5 ⊕ a_i7 ⊕ X[a_i6] ⊕ X2[a_i7 ⊕ a_i2] ⊕ X3[a_i5 ⊕ a_i0]
b_i5 = a_i0 ⊕ a_i2 ⊕ a_i4 ⊕ a_i5 ⊕ a_i6 ⊕ X[a_i7] ⊕ X2[a_i0 ⊕ a_i3] ⊕ X3[a_i6 ⊕ a_i1]
b_i6 = a_i1 ⊕ a_i3 ⊕ a_i5 ⊕ a_i6 ⊕ a_i7 ⊕ X[a_i0] ⊕ X2[a_i1 ⊕ a_i4] ⊕ X3[a_i7 ⊕ a_i2]
b_i7 = a_i0 ⊕ a_i2 ⊕ a_i4 ⊕ a_i6 ⊕ a_i7 ⊕ X[a_i1] ⊕ X2[a_i2 ⊕ a_i5] ⊕ X3[a_i0 ⊕ a_i3]    (8)

In Fig. 3, the implementation of the output byte b_i0 is depicted in detail; the other bytes are implemented in a similar way. The key addition σ[k] consists of eight 2-input XOR gates for every byte of the hash state: each bit of the round key is XORed with the corresponding bit of the hash state. A sketch of the X tables and of the b_i0 expression is given below.
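As an illustration of how the X, X2 and X3 tables and the b_i0 expression of equation (8) can be realized, the following sketch builds the tables from the stated reduction polynomial and evaluates b_i0 exactly as written above.

# X multiplies a byte by x modulo x^8 + x^4 + x^3 + x^2 + 1 in GF(2^8).
def xtime(u: int) -> int:
    """Multiply u by x with the Whirlpool reduction polynomial (0x11D)."""
    u <<= 1
    return (u ^ 0x1D) & 0xFF if u & 0x100 else u

X  = [xtime(u) for u in range(256)]            # X[u]  = x   * u
X2 = [xtime(v) for v in X]                     # X2[u] = x^2 * u
X3 = [xtime(v) for v in X2]                    # X3[u] = x^3 * u

def theta_b0(a: list) -> int:
    """First output byte b_i0 of one state row a = [a_i0, ..., a_i7], eq. (8)."""
    return (a[0] ^ a[1] ^ a[3] ^ a[5] ^ a[7]
            ^ X[a[2]] ^ X2[a[3] ^ a[6]] ^ X3[a[1] ^ a[4]])

print(theta_b0(list(range(8))))                # example evaluation on a toy row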

IV. RESULTS

Fig. 4 .Waveform of single s- box

An efficient architecture and VLSI implementation of the new hash function Whirlpool are presented in this paper. Since four implementations have been introduced, each specific application can choose the appropriate speed-area trade-off. A pipelined architecture is one option for improving the time performance of the Whirlpool implementation. It is possible to insert a negative-edge pipeline register in the round function ρ, as Fig. 3 shows (dashed line, after the permutation π). This register can be inserted roughly in the middle of the round function. This is an efficient way to reduce the critical path delay with a small area penalty (a 512-bit register). The clock frequency can thus be roughly doubled, and the time performance increases without any increase in algorithm execution latency. Another way to improve performance is to use more pipeline stages: it is possible to insert three pipeline stages in the implementation of the round function ρ. The first positive-edge pipeline register is inserted in the same position as described in the previous paragraph.

Fig. 5. Waveform of final s- box

Fig. 6. Waveform of cyclical permutation π

The other two pipeline registers are inserted before and after the tables X, X2, and X3 (dashed lines in Fig. 3).


One could argue that this pipeline technique is inefficient for the Whirlpool implementation, because in order to process block m_i the result (H_{i-1}) of the previously processed block (m_{i-1}) is needed as the cipher key; this dependency prohibits processing more than one block simultaneously. However, in applications with limited processing power, such as smart cards, the above pipeline technique is essential in order to reduce the critical path delay and to execute Whirlpool efficiently. The internal structure of Whirlpool is very different from that of the SHA-2 functions, so it is unlikely that an attack against one will automatically hold for the other. This makes Whirlpool a very good choice for consumer electronics applications.

REFERENCES

[1] SHA-1 Standard, National Institute of Standards and Technology (NIST), Secure Hash Standard, FIPS PUB 180-1, available online at www.itl.nist.gov/fipspubs/fip180-1.htm
[2] "Advanced encryption standard", available online at http://csrc.nist.gov
[3] SHA-2 Standard, National Institute of Standards and Technology (NIST), Secure Hash Standard, FIPS PUB 180-2, available online at http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf
[4] "NESSIE. New European scheme for signatures, integrity, and encryption", http://www.cosic.esat.kuleuven.ac.be/nessie
[5] P. S. L. M. Barreto and V. Rijmen, "The Whirlpool hashing function", primitive submitted to NESSIE, September 2000, revised May 2003, http://planeta.terra.com.br/informatica/paulobarreto/WhirlpoolPage.html
[6] International Organization for Standardization, "ISO/IEC 10118-3: Information technology – Security techniques – Hash functions – Part 3: Dedicated hash-functions", 2003.
[7] Janaka Deepakumara, Howard M. Heys and R. Venkatesam, "FPGA implementation of MD5 hash algorithm", Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering (CCECE 2001), Toronto, Ontario, May 2001.
[8] N. Sklavos, P. Kitsos, K. Papadomanolakis and O. Koufopavlou, "Random number generator architecture and VLSI implementation", Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2002), USA, 2002.
[9] Yong Kyu Kang, Dae Won Kim, Taek Won Kwon and Jun Rim Choi, "An efficient implementation of hash function processor for IPSEC", Proceedings of Third IEEE Asia-Pacific Conference on ASICs, Taipei, Taiwan, August 6-8, 2002.
[10] Sandra Dominikus, "A hardware implementation of MD4-family algorithms", Proceedings of IEEE International Conference on Electronics Circuits and Systems (ICECS 2002), Croatia, September 2002.
[11] Tim Grembowski, Roar Lien, Kris Gaj, Nghi Nguyen, Peter Bellows, Jaroslav Flidr, Tom Lehman, and Brian Schott, "Comparative analysis of the hardware implementations of hash functions SHA-1 and SHA-512", Proceedings of Fifth International Conference on Information Security (ISC 2002), LNCS, Vol. 2433, Springer-Verlag, Sao Paulo, Brazil, September 30-October 2, 2002.
[12] Diez J. M., Bojanic S., Stanimirovic Lj., Carreras C., Nieto-Taladriz O., "Hash algorithm for cryptographic protocols: FPGA implementation", Proceedings of 10th Telecommunications Forum (TELFOR 2002), November 26-28, Belgrade, Yugoslavia, 2002.
[13] N. Sklavos and O. Koufopavlou, "On the hardware implementation of the SHA-2 (256, 384, 512) hash functions", Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS 2003), May 25-28, Bangkok, Thailand, 2003.
[14] A. J. Menezes, P. C. van Oorschot, S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1997.
cryptanalysis, Ph. D. thesis, KU Leuven, March 1995.


2-D FRACTAL ARRAY DESIGN FOR 4-D ULTRASOUND IMAGING

Ms. Alice John, Mrs.C.Kezi Selva Vijila M.E. Applied electronics, HOD-Asst. Professor

Dept. of Electronics and Communication Engineering Karunya University, Coimbatore

Abstract- One of the most promising techniques for limiting complexity in real-time 3-D ultrasound systems is to use sparse 2-D layouts. For a given number of channels, optimization of performance is desirable to ensure high-quality volume images. To find optimal layouts, several approaches have been followed with varying success. The most promising designs proposed are Vernier arrays, but these too suffer from high peaks in the sidelobe region compared with a dense array. In this work, we propose a new method based on the principle of suppression of grating lobes. The proposed method extends the concept of the fractal layout. Our design offers simplicity of construction, flexibility in the number of active elements and the possibility of suppressing grating lobes.

Index Terms- 4-D ultrasound imaging, sparse 2-D array, fractal layout, Sierpinski carpet layout.

1. INTRODUCTION The new medical image modality, volumetric

imaging, can be used for several applications including diagnostics, research and non-invasive surgery. Existing 3-D ultrasound systems are based on mechanically moving 1-D arrays for data collections and preprocessing of data to achieve 3-D images. The main aim is to minimize the number of channels without compromising image quality and to suppress the side lobes. New generations of ultrasound systems will have the possibility to collect and visualize data in near real time. To develop the full potential of such a system, an ultrasound probe with a 2-D transducer array is needed.

Current systems use linear arrays with more than 100 elements. A 2-D transducer array will contain between 1500 and 10,000 elements. Such arrays represent a technological challenge because of the high channel count [1]. To overcome this challenge, undersampling the 2-D array by connecting only some of all the possible elements [2] is a suitable solution. For a given set of constraints, the problem is to choose those elements that give the most appropriate beam pattern or image. The analysis of such sparse array beam patterns has a long history; a short review of some of these works can be found in [3].


Several methods for finding sparse array layouts for 4-D ultrasound imaging have been reported. Random approaches have been suggested by Turnbull et al. [4], [5], and this work has been followed up at Duke University [6]-[7]. Weber et al. have suggested using genetic algorithms. Similar layouts have been found by Holm et al. using linear programming and by Trucco using simulated annealing.

Sparse arrays can be divided into three categories: random, fractal, and periodic. One of the most promising categories is the sparse periodic array [8]. These arrays are based on the principle of different transmit and receive layouts, where the grating lobes in the transmit array response are suppressed by the receive array response and vice versa; periodic arrays thus utilize partial cancellation of transmit and receive grating lobes. Sparse periodic arrays have a few disadvantages: one is the use of overlapping elements, another is the strict geometry, which fixes the number of elements. An element in a 2-D array also occupies a small area compared to an element in a 1-D array. The sparse periodic array has high resolution, but sidelobes occur frequently.

In sparse random arrays, each element is chosen at random according to a chosen distribution function. Due to this randomness, the layouts are very easy to find. Sparse random arrays have low resolution, but the suppression of sidelobes is maximal. By exploiting the properties of both sparse random arrays and sparse periodic arrays, we turn to fractal arrays. In fractal arrays, we can obtain high resolution with a low sidelobe level by combining the advantages of both periodic and random arrays.

To simplify future integration of electronics into the probe, the sparse transmit and receive layouts should be chosen to be non-overlapping. This means that some elements should be dedicated to transmit while others should be used to receive. To increase system performance, future 2-D arrays should possibly include pre-amplifiers directly connected to the receive elements.

The paper is organized in the following manner. Section II describes the fractal array design, starting with the Sierpinski fractal and the carpet fractal, and then the pulse-echo response. Section III describes the simulation and the performance of different designs obtained by adjusting the kerf value. In Section IV, we summarize the paper.


II. FRACTAL ARRAY LAYOUTS

A fractal is generally a rough or fragmented geometric shape that can be subdivided into parts, each of which is (at least approximately) a reduced-size copy of the whole, a property called self-similarity. The fractal component model has the following important features:

• Recursivity : components can be nested in composite components

• Reflectivity: components have full introspection and intercession capabilities.

• Component sharing: a given component instance can be included (or shared) by more than one component.

• Binding components: a single abstraction for components connections that is called bindings. Bindings can embed any communication semantics from synchronous method calls to remote procedure calls

• Execution model independence: no execution model is imposed. In that, components can be run within other execution models than the classical thread-based model such as event-based models and so on.

• Open: extra-functional services associated to a component can be customized through the notion of a control membrane.

A. Sierpinski Fractal

In the sierpinski fractal we have considered mainly two types

• Sierpinski triangle • Sierpinski carpet

B. Sierpinski Triangle

The Sierpinski Triangle is also called the Sierpinski Gasket and the Sierpinski Sieve.

• Start with a single triangle. This is the only triangle in this orientation; all the others will be drawn upside down.

• Inside the first triangle, draw a smaller upside-down triangle. Its corners should be exactly at the centers of the sides of the large triangle.

C. Sierpinski Carpet

In this paper we mainly consider the carpet layout, because we are dealing with a 2-D array.

• Transmitter array: the transmit array is described by a matrix M consisting of ones and zeros. These arrays have been constructed by considering a large array of elements subdivided by a smaller matrix. In the carpet fractal array, a square is first drawn right in the middle, and this small square occupies one third of the original big array; smaller squares are then constructed around it.

• Receiver array: in the sparse 2-D array layout, different receiver and transmitter arrays are selected in order to avoid overlap. In our paper we have chosen for the receiver array those elements which never cause an overlap. A sketch of a carpet-style mask construction is given below.
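One common way to generate a carpet-style 0/1 element mask is the recursive construction sketched below, assuming the standard Sierpinski carpet rule of removing the centre block at every iteration; the exact transmit/receive partition used in this paper may differ, and the complementary receive choice shown is only one simple non-overlapping option.

import numpy as np

def sierpinski_carpet(order: int) -> np.ndarray:
    """Return a 3**order x 3**order 0/1 mask: 1 = active element, 0 = hole."""
    mask = np.ones((1, 1), dtype=int)
    for _ in range(order):
        top = np.hstack([mask, mask, mask])
        mid = np.hstack([mask, np.zeros_like(mask), mask])   # remove the centre block
        mask = np.vstack([top, mid, top])
    return mask

tx = sierpinski_carpet(3)          # 27 x 27 transmit layout after 3 iterations
rx = 1 - tx                        # one simple non-overlapping receive choice
print(tx.shape, int(tx.sum()), "active transmit elements")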

D. Pulse-Echo Response

The layout should have optimal pulse-echo performance, i.e. the pulse-echo radiation pattern should have as low sidelobe level as possible for a specified mainlobe width for all angles and depths of interest. To compute the pulse-echo response for a given transmit and receive layout is time consuming. A simplification commonly used is to evaluate the radiation properties in continuous wave mode in the far field. An optimal set of layouts for continuous waves does not necessarily give optimal pulse-echo responses. To ensure reasonable pulse-echo performance, additional criteria which ensure a uniform distribution of elements could be introduced. This will limit the interference in the sidelobe region between pulses transmitted from different elements and reduce the sidelobe level.
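For the continuous-wave far-field simplification mentioned above, one hedged sketch is to treat the array factor of a half-wavelength-pitch planar aperture as the 2-D DFT of its 0/1 element weights and to approximate the two-way (pulse-echo) pattern as the product of the transmit and receive patterns (their sum in dB). The random layouts below are placeholders for the actual fractal layouts.

import numpy as np

def cw_beampattern(aperture: np.ndarray, nfft: int = 256) -> np.ndarray:
    """Far-field CW beam pattern (in dB) of a planar 0/1 aperture, assuming
    half-wavelength element pitch so that the array factor is a 2-D DFT."""
    p = np.abs(np.fft.fftshift(np.fft.fft2(aperture, s=(nfft, nfft))))
    return 20 * np.log10(p / p.max() + 1e-12)

rng = np.random.default_rng(0)
tx = (rng.random((27, 27)) < 0.3).astype(float)   # example sparse transmit layout
rx = (rng.random((27, 27)) < 0.3).astype(float)   # example sparse receive layout
two_way = cw_beampattern(tx) + cw_beampattern(rx) # two-way pattern = sum in dB
print(two_way.shape, float(two_way.max()))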

Fig. 1. Pulse-echo response of a Sierpinski carpet layout


III. RESULTS AND DISCUSSION

The fractal layout exploits the advantages of both periodic and random arrays. Our main aim is to suppress the sidelobes and to narrow the mainlobe. First, we created the transmit and receive array layouts; both layouts were constructed in such a way that they do not overlap each other. The transmit array is designed using a matrix M, and iterations up to order 3 were used to construct it. The intensity distributions were computed to determine the spreading of the sidelobe and the mainlobe.

In our paper we have taken into consideration different specifications, such as the speed of the sound wave (1540 m/s), the initial frequency, the sampling frequency (100 × 10^6 Hz), and the width and height of the array; the kerf, i.e., the spacing between the elements in the array, is also considered.

A. case I: kerf = 0

We have simulated the transmitter and receiver layouts; since the kerf value, i.e., the distance between the elements, is set to zero, there is no spacing between the elements. From the pulse-echo response we can conclude that in this case the mainlobe is not sharp but the sidelobe level is highly suppressed. Fig. 2(a)-(b) shows the transmitter and receiver layouts, Fig. 2(c) shows the pulse-echo response, and Fig. 2(d) shows the intensity distribution, from which we can see that the sidelobe level is reduced.

B. case II: kerf = λ/2

In the second case the kerf value is taken as λ/2, so there is a λ/2 spacing between the elements of the transmitter and receiver arrays. Fig. 3(a)-(b) shows the layouts. Fig. 3(c) shows the pulse-echo response, in which we can see that the mainlobe is now sharp but the sidelobes are not highly suppressed. Fig. 3(d) shows the intensity distribution, where the sidelobe level is high compared to that of case I.

C. case III: kerf = λ/4

In the third case the kerf value is taken as λ/4. Fig. 4(a)-(b) shows the array layouts. Fig. 4(c) shows the pulse-echo response, in which the mainlobe is sharp but the sidelobe level is high. From the intensity distribution we can also see that the sidelobe distribution is higher compared to case II.

D. case IV: kerf = λ

In the last case the kerf value is taken as λ, and because of this there is a spacing of λ between the elements of the array. Fig. 5(a)-(b) shows the transmitter and receiver layouts. Fig. 5(c) shows the pulse-echo response; here the mainlobe is very sharp, but the sidelobe level starts spreading towards both sides. Fig. 5(d) shows the intensity distribution, which shows the spreading of the sidelobe clearly. The sidelobe level in this case is the highest of all the cases.

Fig. 2. (a) Transmitter array, (b) receiver array, (c) pulse-echo response, and (d) intensity distribution for kerf = 0.

Fig. 3. (a) Transmitter array, (b) receiver array, (c) pulse-echo response, and (d) intensity distribution for kerf = λ/2.

Fig. 4. (a) Transmitter array, (b) receiver array, (c) pulse-echo response, and (d) intensity distribution for kerf = λ/4.


Fig. 5. (a) Transmitter array, (b) receiver array, (c) pulse-echo response, and (d) intensity distribution for kerf = λ.

IV. CONCLUSION

To construct a 2-D array for 4-D ultrasound imaging we need to meet many constraints, an important one being the mainlobe and sidelobe levels. To evaluate this, we use the pulse-echo response. We have shown that it is possible to suppress the unwanted sidelobe levels by adjusting different parameters of the array layout, and we have also shown the changes in the intensity level when adjusting the spacing between the array elements. As future work, we will calculate the mainlobe beamwidth, the ISLR and the sidelobe peak value in order to select the best fractal, since these parameters affect the image quality.

REFERENCES

[1] B. A. J. Angelsen, H. Torp, S. Holm, K. Kristoffersen, and T. A. Whittingham, "Which transducer array is best?," Eur. J. Ultrasound, vol. 2, no. 2, pp. 151-164, 1995.

[2] S. Holm, "Medical ultrasound transducers and beamforming," in Proc. Int. Cong. Acoust., pp. 339-342, Jun. 1995.

[3] R. M. Leahy and B. D. Jeffs, "On the design of maximally sparse beamforming arrays," IEEE Trans. Antennas Propagat., vol. AP-39, pp. 1178-1187, Aug. 1991.

[4] D. H. Turnbull and F. S. Foster, "Beam steering with pulsed two-dimensional transducer arrays," IEEE Trans. Ultrason., Ferroelect., Freq. Contr., vol. 38, no. 4, pp. 320-333, 1991.


PC Screen Compression for Real Time Remote Desktop Access

Shanthini Pandiaraj, Assistant Professor, Department of Electronics & Communication Engineering

Karunya University, Coimbatore and Jagannath.D.J, Final Year Masters degree in Engineering, Karunya University, Coimbatore.

[email protected], [email protected]

Abstract- We present a personal computer screen image compression algorithm for real-time applications such as remote desktop access via computer screen image transmission. We call the computer screen image a compound image because it mixes pictures and text; one 800 × 600 true color screen image has a size of approximately 1.54 MB. We call our algorithm group extraction and coding (GEC). Real-time image transmission requires that the compression algorithm not only achieve a high compression ratio, but also have excellent visual quality and low complexity. GEC first segments a compound image into pictorial pixels and text/graphics pixels, and then compresses the text/graphics pixels with a lossless coding algorithm and the pictorial pixels with JPEG, respectively. The segmentation of the compound screen image classifies the blocks into picture and text/graphics blocks by thresholding the number of colors contained in each block, and then extracts shape primitives of text/graphics from the picture blocks; shape primitives are also extracted from the text/graphics blocks. All shape primitives from both classes are compressed using a wavelet-based SPIHT lossless coding algorithm. Experimental results show that GEC has very low complexity and provides visually excellent lossless quality with very good compression ratios.

Index Terms— wavelet-based SPIHT coding, compound image segmentation, shape primitive extraction, compound image compression.

I.INTRODUCTION

Image compression is the process of minimizing the size in bytes of a graphics file without degrading the quality of the image to an unacceptable level. The reduction in file size allows more images to be stored in a given amount of disk or memory space. It also reduces the time required for images to be sent over the internet or downloaded from web pages. As digital imagery becomes more commonplace and of higher quality, there is a need to manipulate more and more data. Thus, image compression must not only reduce the necessary storage and bandwidth requirements, but also allow extraction for editing, processing, and targeting of particular devices and applications.

For real-time computer screen image transmission, the compression algorithm should not only achieve high compression ratios, but also have low complexity and visually lossless quality. Computer screen images are mixed with text, graphics, and natural pictures. Low complexity is very important for real-time compression, especially on smart displays and wireless projectors. Uncompressed graphics, audio and video data require considerable storage capacity and transmission bandwidth. Despite rapid progress in mass storage, processor speed and digital system performance, the demand for data storage capacity and data transmission bandwidth continues to outstrip the capabilities of available technologies. To meet the bandwidth requirements, it is necessary to employ a good compression algorithm, which creates smaller files with lower transmission requirements, allowing easier storage and transmission. Its goal is to store an image in a more compact form, i.e., a representation that requires fewer bits than the original image. This paper focuses on the compression and transmission of compound computer screen images. As the number of connected computers keeps growing, there is a crucial need for real-time PC screen image transmission technologies. Data compression algorithms are the most important constituent of these real-time applications, since a huge amount of image data must be transmitted in real time. One 800 × 600 true color PC screen image has a size of approximately 1.54 MB, and a sequence of screen images produces more than 100 MB of data; it is practically impossible to transmit such a large volume of data over bandwidth-limited networks in real time without data compression algorithms. Even though network bandwidth keeps increasing, compound image compression algorithms can achieve more efficient data transmission. There are two ways to reduce the spatial and temporal redundancy in the screen image sequence. The first approach is to use image compression algorithms without any prior knowledge of the images. The second approach is to use prior knowledge provided by the operating system, such as page layouts and detailed drawing operations, and to use high-level graphics languages. Obviously, if we can obtain the prior knowledge easily, then text and graphics can be efficiently represented by the original drawing operations, and only the pictorial data need to be compressed. If the picture to be displayed is in a compressed form, its bit stream can be directly transmitted. Thus, if the prior knowledge can be easily obtained, the process of PC


screen image compression can be done perfectly by drawing text strings and graphics and by encoding and decoding pictures with normal compression algorithms. However, there are two problems with the second approach. One is the difficulty of obtaining the prior knowledge from existing operating systems: until now, no operating system exposes information about its page layout and detailed drawing operations. The other is the difficulty of applying the prior knowledge to different client machines with different fonts and GUIs on different platforms; there is very low confidence that the reconstructed PC screen image on the client machine resembles the original screen image on the server machine. In contrast, the first type of approach, based on PC screen image compression, is more reliable because of its independence from different platforms. It is also less expensive because of its low complexity and simplicity. We propose a hybrid algorithm which combines both types of approaches to achieve relatively better performance. This paper focuses on personal computer screen image compression for real-time remote desktop access; the problem of how to obtain and exploit the prior knowledge to facilitate compression is outside the scope of this paper. The paper is organized as follows. Parts II and III present the introduction to the presented work, the objective, and the need for compression of compound images, with some basic concepts of computer-generated images. Part IV introduces the GEC algorithm that is being implemented. Part V concludes.

II. COMPOUND IMAGE COMPRESSION

One 800 × 600 true color screen image has a size of approximately 1.54 MB, and a sequence of such images produces more than 100 MB of data, as shown in Fig. 1. It is practically impossible to transmit such a large volume of data over bandwidth-limited networks in real time without data compression algorithms. For real-time PC screen image transmission, the compression algorithm must not only achieve high compression ratios, but also have low complexity and visually lossless image quality. On the other hand, poor reconstructed image quality reduces the readability of the text and results in an unfavorable user experience with loss of data.

Fig.1. Compound screen image

For the real-time compression of computer screen images, scanned-image compression algorithms cannot be directly applied, due to the following differences between electronically scanned images and computer-generated screen images.

1. Real-time compression algorithms must have very low complexity, whereas electronically scanned image compression has no such constraint. In this paper, we propose a high-compression-ratio, low-complexity and high-quality compression algorithm, group extraction and coding (GEC). GEC segments text/graphics from pictures and provides a lossless coding method.

2. Scanned images are captured electronically by an imaging procedure, but PC screen images are purely synthetic. Image compression algorithms such as JPEG or JPEG-2000 can still be used for scanned images, and their performance can be improved by employing different qualities for text/graphics and pictures. Many scanned image compression algorithms employ JPEG for the background and foreground layers and JBIG2 for the mask layers. With a DCT or wavelet transform, the ringing artifacts caused are not clearly visible around text/graphics edges in scanned images, because these edges have been blurred in the scanning process. For PC screen images, these ringing artifacts are easily noticeable due to the sharp edges of text/graphics.

3. Electronically scanned images have higher spatial resolution than PC screen images. The minimum acceptable quality for electronically scanned images is 300 dpi, but for screen images it is less than 100 dpi. These algorithms work well for scanned images, but cause severe artifacts for computer screen images. Any minute alteration, such as to "i" dots and thin lines, can make the PC screen image barely readable.

4. Electronically scanned images invariably possess some amount of noise, but PC screen images are free of noise. Therefore, for PC screen images, any noise introduced by compression is clearly noticeable in text/graphics (data) regions.


III. SERVER-CLIENT COMMUNICATION

Suitable software must be implemented in both computers, the one that transmits the desktop image and the one that receives it (double-sided software). We call them the Server and the Client. The Server is the computer whose desktop is transmitted and accessed by the other computer, and the Client is the one that receives that image and proceeds to access it. The software is implemented in the server and client such that the client receives the server's image, compresses it and transmits the encoded data to the server. We use a Visual Basic based IP communication technique for this purpose.

Fig.2. Block diagram of server and client

IV. GEC --- ALGORITHM

GEC consists of two stages: segmentation and coding. The algorithm, shown in Fig. 4, first segments 16×16 non-overlapping blocks of pixels into text/graphics blocks, as shown in Fig. 5, and picture blocks, as shown in Fig. 3, and then compresses the text/graphics with a lossless coding algorithm and the pictures with JPEG, respectively. Finally, the lossless coded data and the JPEG picture data are put together into one bit-stream, from which the image is reconstructed at the receiver. There are several reasons for choosing a 16×16 block size. In a 16×16 block, a pixel location (a, b) can be represented by a 4-bit 'a' and a 4-bit 'b', i.e., just one byte in total. Similarly, the width and height of a rectangle in such a block can be represented by a 4-bit 'w' and a 4-bit 'h'. This block size achieves a reasonable tradeoff for PC screen images. Moreover, it is easy for JPEG to compress such a block. A sketch of the block classification step is given below.
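A minimal sketch of the block classification step follows, assuming RGB blocks and a colour-count threshold T1 = 4 (the value implied by the four block types listed later); the exact threshold used in the paper is an assumption here.

import numpy as np

T1 = 4   # assumed colour-count threshold between text/graphics and picture blocks

def classify_block(block: np.ndarray) -> str:
    """Classify one 16x16 RGB block by the number of distinct colours it contains."""
    colours = {tuple(px) for px in block.reshape(-1, block.shape[-1])}
    return "text/graphics" if len(colours) <= T1 else "picture"

def pack_position(a: int, b: int) -> int:
    """Pack two 4-bit in-block coordinates into one byte, as described above."""
    return (a << 4) | b

def segment(image: np.ndarray) -> dict:
    """Split an HxWx3 image into 16x16 blocks and classify each block."""
    h, w = image.shape[:2]
    return {(y, x): classify_block(image[y:y + 16, x:x + 16])
            for y in range(0, h - h % 16, 16)
            for x in range(0, w - w % 16, 16)}

demo = np.zeros((32, 32, 3), dtype=np.uint8)   # a flat background: text/graphics blocks
print(segment(demo))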

Fig.3. Picture segmentation

GEC segments the server compound image into two

classes of pixels: a text/graphics class and a picture class. There are normally four types of blocks: smooth background blocks (one color), text blocks (two colors), graphics blocks (four colors), and picture blocks (more than four colors). In fact, the first three types can be grouped into a larger text/graphics class, which greatly simplifies the segmentation. The combined text/graphics class can be coded by a lossless method. Shape primitives are the elementary building units that compose text/graphics in a compound image, such as dots, lines, curves, triangles, rectangles, and others. Four types of shape primitives are used in GEC: (1) isolated pixels, (2) horizontal lines, (3) vertical lines, and (4) rectangles. A shape primitive has a uniform interior color; two shape primitives can have the same shape but different colors. A shape primitive can be represented by a color tag and its position information, i.e., (a, b) for an isolated pixel, (a, b, w) for a horizontal line, (a, b, h) for a vertical line, and (a, b, w, h) for a rectangle. Shape primitives can be used to compactly represent the textual contents. To encode the pixels of text and graphics, a simple lossless coding algorithm is designed that utilizes the information of the extracted shape primitives, and the shape primitives can be efficiently encoded with wavelet-based SPIHT coding.

Fig. 4. Block diagram of the GEC algorithm: 16×16 block data, color counting against threshold T1, classification into picture blocks or text/graphics blocks, refinement segmentation and shape primitive extraction, picture pixels coded with JPEG and text/graphics pixels with DWT/SPIHT, combined into the compressed bit stream of the compressed compound image.

Fig.5. Text segmentation

The reason we use JPEG instead of JPEG-2000 to encode pictorial pixels is twofold. On one hand, since the algorithm is designed for real-time compression, speed is the primary consideration, and DCT-based JPEG is several times faster than wavelet-based JPEG-2000. On the other hand, JPEG is a block-based algorithm, which matches our block-based technique. The segmented text/graphics pixels are discrete-wavelet transformed and encoded based on SPIHT (set partitioning in hierarchical trees). The encoding process takes place in the client; the encoded data is decoded in the server by SPIHT decoding and an inverse wavelet transform. In this work, crucial parts of the coding process, namely the way subsets of coefficients are partitioned and how the significance information is conveyed, are fundamentally different from the aforementioned works. In the previous works, arithmetic coding of the bit streams was essential to compress the ordering information conveyed by the results of the significance tests. Here the subset partitioning is so effective and the significance information so compact that binary uncoded transmission achieves about the same or better performance than previous works. Moreover, the use of arithmetic coding can reduce the mean squared error or increase the peak signal-to-noise ratio (PSNR) by 0.3 to 0.6 dB for the same rate or compressed file size, and achieve results which are equal or superior to any previously reported, regardless of complexity. Execution times are also reported to indicate the rapid speed of the encoding and decoding algorithms. The transmitted code, or compressed image file, is completely embedded, so that a single file at a given code rate can be truncated at different points and decoded to give a series of reconstructed images at lower rates. Previous versions could not give their best performance with a single embedded file and required, for each rate, the optimization of a certain parameter. The new method solves this problem by changing the transmission priority and yields, with one embedded file, top performance at all rates. The encoding algorithm can be stopped at any compressed file size or left to run until the compressed file is a representation of a nearly lossless image. We say nearly lossless because the compression may not be reversible, as the wavelet transform filters, chosen for lossy coding, have non-integer tap weights and produce non-integer transform

coefficients, which are truncated to finite precision. For perfectly reversible compression, one must use an integer multiresolution transform, such as the S+P transform, which yields excellent reversible compression results when used with the new extended EZW techniques. In GEC, the pictorial pixels in picture blocks are compressed using a simple JPEG coder. In order to reduce ringing artifacts and to achieve a higher compression ratio, text/graphics pixels in a picture block are removed before the JPEG coding; these pixels are coded by the lossless coding algorithm. Their replacement values can in principle be chosen arbitrarily, but it is better if they are similar to the neighboring pictorial pixels, since this produces a smooth picture block. We therefore fill in these holes with the average color of the pictorial pixels in the block, as sketched below.
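The hole-filling step can be sketched as follows, assuming a boolean mask that marks the text/graphics pixels of a 16×16 picture block.

import numpy as np

def fill_holes(block: np.ndarray, text_mask: np.ndarray) -> np.ndarray:
    """Replace the text/graphics pixels of a 16x16 picture block with the average
    colour of the remaining pictorial pixels, before the block is sent to JPEG."""
    out = block.copy()
    pictorial = block[~text_mask]                       # pixels kept for JPEG coding
    if pictorial.size:
        out[text_mask] = pictorial.mean(axis=0).astype(block.dtype)
    return out

block = np.random.randint(0, 256, (16, 16, 3), dtype=np.uint8)
mask = np.zeros((16, 16), dtype=bool)
mask[:2, :] = True                                      # pretend the top rows are text
print(fill_holes(block, mask)[0, 0])                    # filled with the block average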

Fig.6. Wavelet decomposition

Fig.7. Reconstructed image

V. CONCLUSION


We have presented an efficient PC screen compound image coding scheme with very low complexity and a high compression ratio for the transmission of computer screen images. Two significant contributions are the segmentation to extract text and graphics, and a wavelet-based lossless SPIHT coding algorithm. The advantages of our image coding scheme are low complexity, a high compression ratio, and visually lossless image quality. The algorithm has been implemented in both the client and the server with the help of MATLAB and Visual Basic coding. The resulting reconstruction showed a significant reduction in size, from 2.25 MB for the original compound image to 216 KB for the compressed compound image, as shown in Fig. 7. Our future work is to implement the coding for real-time access of a remote desktop computer.

REFERENCES

[1] Amir Said, Faculty of Electrical Engineering, State University of Campinas Brazil, William A Pearlman Department of Electrical, Computer, and Systems Engineering Rensselaer Polytechnic Institute, Troy, NY, USA. “A New Fast and Efficient client Image Codec Based on Set Partitioning in Hierarchical Trees” IEEE Trans- on Circuits and Systems for Video Technology Vol 6 June 1996 [2] Nikola Sprljana, Sonja Grgicb, Mislav Grgicb a Multimedia and Vision Lab, Department of Electronic Engineering, Queen Mary, University of London, London E1 4NS, UK Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3/XII, HR-10000 Zagreb, Croatia “Modified SPIHT algorithm for wavelet packet image coding” Elsevier- Real-Time Imaging 11 (2005) 378–388 [3] H. Cheng and C. A. Bouman, “Document compression using rate-distortion optimized segmentation,” J. Electron. Imag., vol. 10, no. 2, pp. 460–474, Apr. 2001. [4] D. huttenlocher, P. Felzenszwalb, and W.Rucklidge, “DigiPaper: A versatile color document image representation,” in Proc. Int. Conf. Image Processing, vol. I, Oct. 1999, pp. 219–223. [5] J. Huang, Y. Wang, and E. K. Wong, “Check image compression using a layered coding method,” J. Electron. Imag., vol. 7, no. 3, pp. 426–442, Jul. 1998. [6] R. de Queiroz, Z. Fan, and T. D. Tran, “Optimizing block-thresholding segmentation for multilayer compression of compound images,” IEEE Trans. Image Process., vol. 9, pp. 1461–1471, Sep. 2000. [7] Tony Lin, Member, IEEE, and Pengwei Hao, Member, IEEE, “Compound Image Compression for Real-Time Computer Screen Image Transmission,” IEEE Trans - image processing, vol. 14, no. 8, august 2005 [8] H. Cheng, G. Feng, and C. A. Bouman, “Rate-distortion based segmentation for MRC compression,” in Proc. SPIE Color Imaging: Device- Independent Color, Color Hardcopy, and Applications, vol. 4663, San Jose, CA, Jan. 21–23, 2002. [9] R. de Queiroz, R. Buckley, and M. Xu, “Mixed raster content (MRC) model for compound image compression,” Proc. SPIE, vol. 3653, pp. 1106–1117, 1999.

[10] L. Bottou, P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. LeCun, “High quality document image compression with DjVu,” J. Electron Imag., vol. 7,


Abstract— The medical domain is one of the principal application domains for image classification. Medical image classification deals with assigning an input medical image to the class to which it is most similar. This paper deals with the classification of a query image into one of four classes, namely brain, chest, breast and elbow, using a Hopfield Neural Network Classifier. The curse-of-dimensionality problem is addressed by using the extracted principal components of the images as input to the classifier. Finally, the results obtained using the Hopfield neural classifier are compared with those of a Back Propagation neural classifier.

Index Terms — Feature extraction, Hopfield Neural

Classifier, Principal components, Query image.

INTRODUCTION

The number of digitally produced medical images is rising strongly, and the management of and access to these large image repositories are becoming increasingly complex. Classification of medical images is often cited as one of the principal application domains for content-based access technologies [1,4]. The goals of medical information systems have often been defined as delivering the needed information at the right time and the right place to the right persons in order to improve the quality and efficiency of care processes [2]. Such a goal will most likely need more than a query by patient name, series ID or study ID for images. For the clinical decision-making process it can be beneficial or even important to find other images of the same modality, the same anatomic region, or the same disease. Clinical decision support techniques such as case-based reasoning or evidence-based medicine can create an even stronger need to retrieve images that can be valuable for supporting certain diagnoses. Besides diagnostics, teaching and research especially are expected to improve through the use of visual access methods, as visually interesting images can be chosen and actually found in the existing large repositories. In teaching, such methods can help lecturers as well as students to browse educational image repositories and visually inspect the results found.

In image classification task a query image is given as an

input image and the image is classified into a particular class to which the query finds more similarity. Fig 1. shows the general block diagram of an image classification system. Currently, most medical image classification systems are similarity-based, where similarity between query and target

images in a database is measured by some form of distance metrics in feature space. In the image classification task, similarity measure technique is applied on the low-dimensional feature space. For this, a dimensionality reduction technique is used for dimension reduction and a classifier is used for online category prediction of query and database images [3].

Fig 1. Block Diagram of an Image Classification System

Principal component analysis (PCA) is the dimensionality

reduction technique employed, which reduces the dimensionality of the search to a basis set of prototype images that best describes the images.

Classifiers used in medical image classification problems are broadly divided into parametric classifiers and non-parametric classifiers. The Gaussian Maximum Likelihood (GML) classifier and statistical classifiers like Bayesian networks or Hidden Markov Models (HMMs) come under parametric classifiers. They make certain assumptions about the distribution of features; it is difficult to apply parametric classifiers since the posterior probabilities are usually unknown. Neural network classifiers, the k-Nearest Neighbor (k-NN) classifier, decision tree classifiers, knowledge-based classifiers, Parzen window classifiers, etc., come under non-parametric classifiers. Non-parametric classifiers can be used with arbitrary feature distributions and with no assumptions about the forms of the underlying densities. Some systems use measurement schemes such as the Euclidean vector space model [5] for measuring distances between a query image (represented by its features) and possible results, representing all images as feature vectors in an n-dimensional vector space. Several other distance measures exist for the vector space model, such as the city-block distance, the Mahalanobis distance or a simple histogram intersection [5]. Another probabilistic retrieval form is the use of Support Vector Machines (SVMs) for a

Medical Image Classification using Hopfield Network and Principal Components

G.L Priya, ME, Applied Electronics, Karunya University, Coimbatore



classification of images into classes for relevant and non-relevant.

Neural network classifiers have proved to be robust in dealing with ambiguous data and with the kind of problems that require the interpolation of large amounts of data. Neural networks explore many hypotheses simultaneously using massive parallelism. They have the potential to solve problems in which some inputs and corresponding output values are known, but the relationship between the inputs and outputs is not understood well enough to translate into a mathematical function. Examples of neural network classifiers are the Hopfield neural network, the Back Propagation neural network, etc. The time taken by a neural network classifier is much less than that of a k-NN classifier, and k-NN classifiers are not very effective for high-dimensional discrimination problems. The use of soft limiting functions in a neural network classifier provides smoother boundaries between different classes and offers more flexible decision models than conventional decision trees. Overall, neural network classifiers outperform the other non-parametric classifiers. The rest of the paper is organized as follows: the second section deals with obtaining the principal components, the third section deals with classification using the Hopfield Neural Classifier, the fourth section presents the results obtained, and finally the work is concluded in the fifth section.

FEATURE EXTRACTION

For medical image classification, the images in the

database are often represented as vectors in a high-dimensional space and a query is answered by classifying the image into a class with image vectors that are proximal to the query image in this space, under a suitable similarity metric. To overcome problems associated with high dimensionality, such as high storage and classification times, a dimension reduction step is usually applied to the vectors to concentrate relevant information in a small number of dimensions. Besides reducing storage requirements and improving query performance, dimension reduction has the added benefit of often removing noise from the data; as such noise is usually concentrated in the excluded dimensions [6].

Principal Component Analysis (PCA) is a well-known dimension reduction scheme. This approach condenses most of the information in a dataset into a small number, p, of dimensions by projecting the data (with the mean subtracted) onto a p-dimensional axis system. Consider an n-dimensional image of size M × N. A matrix X of size n × MN is formed by arranging the pixel values of the n-dimensional image. The eigen feature vectors are extracted by finding the eigenvectors of the covariance matrix of the mean-subtracted data using equations (1), (2) and (3):

Mean:  M = (1/K) Σ_{i=1}^{K} X_i    (1)

Covariance:  C = (1/(K−1)) Σ_{i=1}^{K} (X_i − M)(X_i − M)^T    (2)

where K = MN.

Principal components = A(X − M)    (3)

where the rows of matrix A are the normalized eigenvectors of the covariance matrix. The eigenvectors corresponding to the highest eigenvalues form the principal components. The reduced dimensions are chosen in a way that captures the essential features of the data with very little loss of information [7]. In this paper, a general medical image database is used containing brain, chest, breast and elbow images, each of size 124 × 124. Feature extraction is performed on these images using PCA so that the dimension is reduced to 1 × 9. This feature vector of size 1 × 9 is used to train the neural classifier. A numerical sketch of this feature extraction is given below.
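A small numerical sketch of the feature extraction of equations (1)-(3) is given below; the n×n "small covariance" trick used here is an implementation convenience (an assumption, not spelled out in the paper) that avoids forming the full (M·N)×(M·N) covariance matrix, and the random database is only a placeholder.

import numpy as np

def pca_features(images: np.ndarray, p: int = 9):
    """images: one flattened image per row (n x M*N).
    Returns an (n x p) matrix of principal-component features, the mean image,
    and the projection axes, in the spirit of equations (1)-(3)."""
    mean = images.mean(axis=0)                       # equation (1)
    Xc = images - mean
    cov_small = Xc @ Xc.T / (len(images) - 1)        # n x n surrogate of equation (2)
    vals, vecs = np.linalg.eigh(cov_small)
    order = np.argsort(vals)[::-1][:p]               # keep the p largest eigenvalues
    axes = Xc.T @ vecs[:, order]                     # eigen-images (columns)
    axes /= np.linalg.norm(axes, axis=0)             # normalised eigenvectors (rows of A)
    return Xc @ axes, mean, axes                     # equation (3): features = A(X - M)

db = np.random.rand(20, 124 * 124)                   # toy stand-in for the image database
features, mean, axes = pca_features(db, p=9)
print(features.shape)                                # (20, 9): one 1x9 feature per image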

NEURAL CLASSIFICATION

Hopfield Neural Classifier

The Hopfield model is used as an auto-associative memory to store and recall a set of bitmap images. Images are stored by calculating a corresponding weight matrix. Thereafter, starting from an arbitrary configuration, the memory will settle on exactly that stored image; thus, given an incomplete or corrupted version of a stored image, the network is able to recall the corresponding original image [8]. For example, a fully trained network might give the three outputs (1,1,1,-1,-1,-1), (1,1,-1,-1,1,1) or (-1,1,-1,1,-1,1). If given the input (1,1,1,1,-1,-1), it would most likely give as output (1,1,1,-1,-1,-1) -- the first output -- since that is the stored pattern closest to the input. The Hopfield neural network is a type of recurrent neural network in which a physical path exists from the output of each neuron to the inputs of all neurons except its own; it is also called an iterative auto-associative network [8]. The architecture of the Hopfield Neural Classifier is shown in Fig 2. The conditions to be satisfied by the weight values of a Hopfield Neural Classifier are:

1. Wij = Wji

which implies weight values should be diagonally symmetrical.

2. Wii = 0

that is all diagonal elements are zero.

The weight matrix is as shown below:

            | 0    W12  W13  ...  W1n |
            | W12  0    W23  ...  W2n |
W = X^T X = | W13  W23  0    ...  W3n |    (4)
            | ...  ...  ...  ...  ... |
            | W1n  W2n  W3n  ...  0   |

where X is the input pattern matrix and the diagonal entries are set to zero.



Fig 2. Architecture of Hopfield Neural Classifier

Energy Equation for Hopfield Neural Classifier:

E = −0.5 · S W S^T    (5)

where E is the energy of a particular pattern S and W is the weight matrix.

Methodology for classification

The extracted feature vector is used by the Hopfield Neural Network for classification. Since the feature vector has a size of 1 × 9, the neural network has 9 neurons in its input layer. The principal components of all the images in the database are given to the neural classifier, and the classifier is trained on these feature vectors. For classification, the principal components of the query image are found using PCA and are given to the classifier.

First, the weight matrix is calculated using equation (4). Then, using the weight matrix, the energy is estimated for all the stored patterns and for the test pattern using equation (5). The energy of the test pattern is compared with that of all the stored patterns. Finally, the test pattern is classified into the class whose energy is closest to the test pattern energy, and the corresponding class is displayed. A small sketch of this procedure is given below.
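The energy-comparison procedure just described can be sketched directly from equations (4) and (5); the toy feature vectors and class labels below are placeholders for the real 1×9 principal-component features.

import numpy as np

def weight_matrix(patterns: np.ndarray) -> np.ndarray:
    """Equation (4): W = X^T X with a zero diagonal, X holding one stored pattern per row."""
    W = patterns.T @ patterns
    np.fill_diagonal(W, 0.0)                 # W_ii = 0; W is symmetric by construction
    return W

def energy(s: np.ndarray, W: np.ndarray) -> float:
    """Equation (5): E = -0.5 * S W S^T for a single pattern S."""
    return -0.5 * float(s @ W @ s)

def classify(test: np.ndarray, stored: np.ndarray, labels: list) -> str:
    """Assign the test feature vector to the stored pattern of closest energy."""
    W = weight_matrix(stored)
    e_test = energy(test, W)
    e_stored = np.array([energy(s, W) for s in stored])
    return labels[int(np.argmin(np.abs(e_stored - e_test)))]

rng = np.random.default_rng(1)
stored = rng.standard_normal((4, 9))          # toy 1x9 features for the four classes
query = stored[2] + 0.01 * rng.standard_normal(9)
print(classify(query, stored, ["brain", "chest", "breast", "elbow"]))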

RESULTS AND DISCUSSIONS

In this paper, a general medical image database containing 5 images each of four classes namely brain, chest, breast and elbow is used. The database used is shown in Fig 3.

Fig 3. Database

The query given for classification and the corresponding classified image for each class are shown in Fig 4 - 7. It can be seen that when the first image from each class is given as query image, it is classified as corresponding class and corresponding image is displayed.

A. Class 1 – Brain

Fig 4. Class 1 - Query Image and Classified Image

B. Class 2 – Chest

Fig 5. Class 2 - Query Image and Classified Image


C. Class 3 – Breast

Fig 6. Class 3 - Query Image and Classified Image

D. Class 4 – Elbow

Fig 7. Class 4 - Query Image and Classified Image

The results obtained using the Hopfield Neural Classifier are compared with those of the Back Propagation Neural Classifier, and it is found that the performance of Hopfield is better. The results show that the training time is longer for Back Propagation (16.4060 s) and shorter for Hopfield (2.2030 s). In practice, the error of the BPN will not reach zero; the error can only reach a minimum value, called the global minimum. As the BPN algorithm is slower and the result is needed immediately, training is stopped according to human perception (in this work the number of training iterations is set to 50), yielding a local minimum. This leads to inaccurate results from the BPN. Hopfield is 100% accurate on this database, and thus the results clearly show that the Hopfield Neural Classifier outperforms the Back Propagation Neural Classifier.

CONCLUSION

Image classification finds a lot of applications in the medical field. Survey of classifiers revealed that neural classifiers outperformed other parametric and non parametric classifiers. This paper dealt with classifying a query image into one of the classes in a general medical image database containing four classes namely brain, chest, breast and elbow

using a Hopfield neural classifier. The experimental work showed that the Hopfield neural classifier gives better performance than the other neural classifier considered.

REFERENCES

[1] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, “Content-based image retrieval at the end of the early years,” IEEE Transactions on Pattern Analysis and Machine Intelligence 22 No 12 1349-1380, 2000.

[2] A. Winter, R. Haux, “A three-level graph-based model for the management of hospital information systems,” Methods of Information in Medicine 34 378-396, 1995.

[3] Henning Muller, Nicolas Michoux, David Bandon, Antoine Geissbuhler, “A review of content-based image retrieval systems in medical applications - clinical benefits and future directions,” International Journal of Medical Informatics.,vol. 73, pp. 1 – 23, 2004.

[4] M. R. Ogiela, R. Tadeusiewicz, “Semantic-oriented syntactic algorithms for content recognition and understanding of images in medical databases,” in: Proceedings of the second International Conference on Multimedia and Exposition (ICME'2001), IEEE Computer Society, IEEE Computer Society, Tokyo, Japan, pp. 621-624, 2001.

[5] W. Niblack, R. Barber, W. Equitz, M. D.Flickner, E. H. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, G. Taubin, QBIC project: querying images by content, using color, texture, and shape, in: W. Niblack (Ed.), Storage and Retrieval for Image and Video Databases, Vol. 1908 of SPIE Proceedings, pp. 173-187, 1993.

[6] J. Ye, R. Janardan, and Q. Li., “GPCA: An efficient dimension reduction scheme for image compression and retrieval,” in KDD ’04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 354–363, New York, NY, USA, 2004, ACM Press.

[7] U. Sinha, H. Kangarloo, “Principal component analysis for content-based image retrieval,” RadioGraphics 22 (5), pp. 1271-1289, 2002.

[8] S N Sivanandam , S Sumathi and S N Deepa, “Introduction to neural network using matlab 6.0”7) Ghazanfari A; Wulfsohn D; Irudayaraj J, 1998.

[9] Anne H.H. Ngu, Quan Z. Sheng, Du Q. Huy nh, Ron Lei “Combining multi-visual features for efficient indexing in a large image database,” The VLDB Journal 9: 279–293, 2001.


Delay Minimization of Sequential Circuits through Weight Replacement

S. Nireekshan Kumar1, Grace Jency Gnannamal2 PG Scholar of VLSI Design1, Lecturer2

Department of Electronics & Communication Engineering Karunya University, Coimbatore.

[email protected], [email protected]

Abstract - Optimizing sequential cycles is essential for many types of high-performance circuits, such as pipelines for packet processing. Retiming is a powerful technique for speeding up pipelines, but it is stymied by tight sequential cycles. Designers usually attack such cycles by manually applying retiming and Shannon decomposition, effectively a form of speculation, but such manual application is error prone. This paper proposes an efficient algorithm that applies retiming and Shannon decomposition algorithmically to optimize circuits with tight sequential cycles. Index Terms - Circuit optimization, circuit synthesis, encoding, sequential logic circuits.

I. INTRODUCTION
Every circuit has a sequence of cycles of operation. As the complexity of the circuit increases, the number of sequential cycles also increases, and as these cycles increase, the performance of the circuit decreases; hence it is essential to optimize these sequential cycles. High-performance circuits rely on efficient pipelines. Provided additional latency is acceptable, tight sequential cycles are the main limit to pipeline performance. Unfortunately, such cycles are fundamental to the function of many pipelines. Pipelining and parallel processing are two techniques to optimize the sequential cycles. Pipelining is chosen here because parallel processing has the drawback of consuming more area. Pipelining is a transformation technique that leads to a reduction in the critical path, which can be exploited either to increase the clock or sample speed or to reduce the power consumption at the same speed. In this paper we propose a method by which the delay can be decreased (the frequency increased) at the cost of higher power consumption. In parallel processing, multiple outputs are computed in parallel in a clock period. Therefore, the effective sampling speed is

increased by the level of parallelism. Like pipelining, parallel processing can also be used to reduce power consumption. Consider the three-tap finite impulse response (FIR) digital filter
Y(n) = ax(n) + bx(n-1) + cx(n-2).   (1.1)
The block-diagram implementation of this filter is shown in Fig. 1.1.

Fig. 1.1: Three-tap FIR filter (direct form)

The critical path, or the

minimum time required for processing a new sample, is limited by one multiplication time and two addition times; i.e., if T_M is the time taken for a multiplication and T_A is the time needed for an addition, then the sample period T_sample is given by
T_sample >= T_M + 2 T_A   (1.2)

Therefore, the sampling frequency f_sample (also referred to as the throughput or the iteration rate) is given by

f_sample <= 1 / (T_M + 2 T_A)   (1.3)
Note that the direct-form structure shown in Fig. 1.1 can only be used when (1.2) is satisfied. If a real-time application demands a faster input (sample) rate, this structure cannot be used; in that case, the effective critical path can be reduced by using either pipelining or parallel processing. Pipelining reduces the effective critical path by introducing pipelining latches along the datapaths; it has been used in the context of architecture design, compiler synthesis, etc. Parallel processing increases the sampling rate by replicating hardware so that several inputs can be processed in parallel and several outputs can be produced at the


same time. Consider the simple structure in Fig. 1.2(a), where the computation time of the critical path is 2 T_A. Fig. 1.2(b) shows the two-level pipelined structure, where one latch is placed between the two adders, so the critical path is reduced by half. Its two-level parallel-processing structure is shown in Fig. 1.2(c), where the same hardware is duplicated so that two inputs can be processed at the same time and two outputs are produced simultaneously; therefore, the sample rate is increased by a factor of two.

PIPELINING OF FIR DIGITAL FILTERS
Consider the pipelined implementation of the 3-tap FIR filter of (1.1) obtained by introducing two additional latches, as shown in Fig. 1.3. The critical path is now reduced from T_M + 2 T_A to T_M + T_A. In this arrangement, while the left adder initiates the computation of the current iteration, the right adder completes the computation of the previous iteration.
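As a quick numerical check of (1.2) and (1.3), the snippet below evaluates the sample period before and after pipelining; the unit delays T_M = 2 u.t. and T_A = 1 u.t. are assumed values for illustration, not taken from the paper.

    # Illustrative unit delays (assumed): T_M = 2 u.t., T_A = 1 u.t.
    T_M, T_A = 2.0, 1.0

    t_direct    = T_M + 2 * T_A   # critical path of the direct-form 3-tap FIR, eq. (1.2)
    t_pipelined = T_M + T_A       # after inserting the two pipelining latches (Fig. 1.3)

    print("direct form:  Tsample >=", t_direct,    "u.t., fsample <=", round(1 / t_direct, 3))
    print("2-stage pipe: Tsample >=", t_pipelined, "u.t., fsample <=", round(1 / t_pipelined, 3))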

Fig 1.2

One must note that in an M-level pipelined system, the number of delay elements in any path from input to output is (M-1) greater than that in the same path in the original sequential circuit. While pipelining reduces the critical path, it leads to a penalty in terms of an increase in latency. Latency is essentially the difference in the availability of the first output data between the pipelined system and the sequential system. For example, if the latency is 1 clock cycle, then the k-th output is available in the (k+1)-th clock cycle in a 1-stage pipelined system. The two main drawbacks of pipelining are the increase in the number of latches and in system latency. The following points may be noted:
1. The speed of the architecture is limited by the longest path between any two latches, between an input and a latch, between a latch and an output, or between the input and the output.
2. This longest path, or "critical path", can be reduced by suitably placing the pipelining latches in the architecture.
3. The pipelining latches can only be placed across a feed-forward cutset of the graph.

Retiming is a transformation technique used to change the location of delay elements in a circuit without affecting its input/output characteristics. For example, consider the FIR filter in Fig. 2.1(a). This filter is described by
w(n) = ay(n-1) + by(n-2)
y(n) = w(n-1) + x(n) = ay(n-2) + by(n-3) + x(n)
The filter in Fig. 2.1(b) is described by
w1(n) = ay(n-1)
w2(n) = by(n-2)
y(n) = w1(n-1) + w2(n-1) + x(n) = ay(n-2) + by(n-3) + x(n)
Although the filters in Fig. 2.1(a) and Fig. 2.1(b) have delays at different locations, they have the same input/output characteristics, and the two filters can be derived from one another using retiming. Retiming has many applications in synchronous circuit design, including reducing the clock period of the circuit, reducing the number of registers in the circuit, reducing the power consumption of the circuit, and logic synthesis.

Retiming can be used to increase the clock rate of a circuit by reducing the computation time of the critical path. Recall that the critical path is defined as the path with the longest computation time among all paths that contain zero delays, and the computation time of the critical path is a lower bound on the clock period of the circuit. The critical path of the filter in Fig. 2.1(a) passes through one multiplier and one adder and has a computation time of 3 u.t., so this filter cannot be clocked with a clock period of less than 3 u.t. The retimed filter in Fig. 2.1(b) has a critical path that passes through two adders and has a computation time of 2 u.t., so this filter can be clocked with a clock period of 2 u.t. By retiming the filter in Fig. 2.1(a) to obtain the filter in Fig. 2.1(b), the clock period has been reduced from 3 u.t. to 2 u.t., or by 33%.

Retiming can also be used to change the number of registers in a circuit: the filter in Fig. 2.1(a) uses 4 registers while the filter in Fig. 2.1(b) uses 5 registers. Since retiming can affect both the clock period and the number of registers, it is sometimes desirable to take both of these parameters into account.

II. PROPOSED METHODOLOGY
A. Overview of Algorithm
Procedure Restructure(S, c)
  compute feasible arrival times (Bellman-Ford)
  if the feasible arrival time computation failed then
    return FAIL


Fig. 2: Algorithm
Our algorithm (Fig. 2) takes a network S and a timing constraint (a target clock period) c and uses resynthesis and retiming to produce a circuit with period c if one can be found, or returns failure. The algorithm operates in three phases. In the first phase, "Bellman-Ford" (shown in Fig. 3 and described in detail in Section II), we consider all possible Shannon decompositions by considering different ways of restructuring each node. This procedure vaguely resembles technology mapping in that it considers replacing each gate with one taken from a library, but does so in an iterative manner because it considers circuits with (sequential) loops. More precisely, the algorithm attempts to compute a set of feasible arrival times (FATs) for each signal in the circuit that indicate that the target clock period c can be achieved after resynthesis and retiming. If the smallest such c is desired, the algorithm is fast enough to be used as a test in a binary search that can approximate the lowest possible c. In the second phase, "resynthesize" (described in Section III), we use the results of this analysis to resynthesize the combinational gates in the network, which is nontrivial because, to conserve area, we wish to avoid the use of the most aggressive (read: area-consuming) circuitry everywhere but on the critical paths. As we saw in the example in Fig. 1, the circuit generated after the second phase usually has worse performance than the original circuit. We apply classical retiming to the circuit in the third phase, which is guaranteed to produce a circuit with period c. In Section IV, we present experimental results that suggest that our algorithm is efficient and can produce a substantial speed improvement with a minimal area increase on half of the circuits we tried; the algorithm is unable to improve the other half.
B. Retiming using the Bellman-Ford algorithm

Fig. 1.3: A sequential circuit
1. Let M = t_max x n, where t_max is the maximum computation time of the nodes in G and n is the number of nodes in G. Since t_max = 2 and n = 4, M = 2 x 4 = 8.

2. Form a new graph G_r, which is the same as G except that each edge weight is replaced by w_r(e) = M x w(e) - t(U) for every edge from U to V:

w_r(1->3) = 8 x 1 - 1 = 7
w_r(1->4) = 8 x 2 - 1 = 15
w_r(3->2) = 8 x 0 - 2 = -2
w_r(4->2) = 8 x 0 - 2 = -2
w_r(2->1) = 8 x 1 - 1 = 7

Fig 1.4: Restructured Sequential Circuit

3. Solve the all-pairs shortest-path problem on G_r. Let S'(U, V) be the shortest path from U to V.

R(0) =
  inf  inf    7   15
    7  inf  inf  inf
  inf   -2  inf  inf
  inf   -2  inf  inf

R(1) =
  inf  inf    7   15
    7  inf   14   22
  inf   -2  inf  inf
  inf   -2  inf  inf

R(2) =
  inf  inf    7   15
    7  inf   14   22
    5   -2   12   20
    5   -2   12   20

R(3) =
   12    5    7   15
    7   12   14   22
    5   -2   12   20
    5   -2   12   20

S'(U, V) =
   12    5    7   15
    7   12   14   22
    5   -2   12   20
    5   -2   12   20

4. Determine W(U, V) and D(U, V), where W(U, V) is the minimum number of registers on any path from node U to node V and D(U, V) is the maximum computation time among all paths from U to V with weight W(U, V).
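A Python sketch of steps 1-4 is given below; it is not the authors' MATLAB/VHDL implementation. The node computation times and edge register counts are read off the worked numbers above, and Floyd-Warshall is used for the all-pairs shortest paths.

    import math

    # Node computation times t(U) and edge register counts w(e) from the example above
    t = {1: 1, 2: 1, 3: 2, 4: 2}
    w = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}

    M = max(t.values()) * len(t)                         # step 1: M = t_max x n = 8
    wr = {e: M * reg - t[e[0]] for e, reg in w.items()}  # step 2: wr(e) = M*w(e) - t(U)

    INF = float("inf")
    nodes = sorted(t)
    S = {(u, v): wr.get((u, v), INF) for u in nodes for v in nodes}
    for k in nodes:                                      # step 3: all-pairs shortest paths
        for u in nodes:
            for v in nodes:
                S[u, v] = min(S[u, v], S[u, k] + S[k, v])

    W, D = {}, {}
    for u in nodes:                                      # step 4: W(U,V) and D(U,V)
        for v in nodes:
            if u == v:
                W[u, v], D[u, v] = 0, t[u]
            else:
                W[u, v] = math.ceil(S[u, v] / M)
                D[u, v] = M * W[u, v] - S[u, v] + t[v]

    print(W[1, 4], D[1, 4])   # expected: 2 3, matching the tables above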


If U = V, then W(U, V) = 0 and D(U, V) = t(U). If U is not equal to V, then W(U, V) = ceil(S'(U, V) / M) and D(U, V) = M x W(U, V) - S'(U, V) + t(V) (here M = 8).

W(U, V) =
  0  1  1  2
  1  0  2  3
  1  0  0  3
  1  0  2  0

D(U, V) =
  1  4  3  3
  2  1  4  4
  4  3  2  6
  4  3  6  2

5. The values of W(U, V) and D(U, V) are used to determine whether there is a retiming solution that can achieve a desired clock period. Given a clock period c, there is a feasible retiming solution r such that Phi(G_r) <= c if the following constraints hold:
1. (Feasibility constraints) r(U) - r(V) <= w(e) for every edge from U to V of G.
2. (Critical-path constraints) r(U) - r(V) <= W(U, V) - 1 for all vertices U, V in G such that D(U, V) > c.

The feasibility constraints force the number of delays on each edge in the retimed graph to be non-negative, and the critical-path constraints enforce Phi(G_r) <= c: if D(U, V) > c, then W(U, V) + r(V) - r(U) >= 1 must hold for the critical path to have computation time less than or equal to c. This leads to the critical-path constraints. If c is chosen to be 3, the inequalities r(U) - r(V) <= w(e) for every edge from U to V are
r(1) - r(3) <= 1
r(1) - r(4) <= 2
r(2) - r(1) <= 1
r(3) - r(2) <= 0
r(4) - r(2) <= 0
and the inequalities r(U) - r(V) <= W(U, V) - 1 for all vertices U, V such that D(U, V) > 3 are
r(1) - r(2) <= 0
r(2) - r(3) <= 1
r(2) - r(4) <= 2
r(3) - r(1) <= 0
r(3) - r(4) <= 2
r(4) - r(1) <= 0
r(4) - r(3) <= 1

If there is a solution to the 12 inequalities above, it is a feasible retiming solution such that the circuit can be clocked with period c = 3. The corresponding constraint graph, shown below, has no negative cycles.
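The check itself can be sketched as a standard difference-constraint solve (again, not the authors' code): each inequality r(U) - r(V) <= b becomes an edge from V to U of weight b in a constraint graph, and Bellman-Ford from an added source yields a feasible retiming r whenever that graph has no negative cycle.

    # The 12 inequalities above, written as (U, V, bound): r(U) - r(V) <= bound
    constraints = [
        (1, 3, 1), (1, 4, 2), (2, 1, 1), (3, 2, 0), (4, 2, 0),   # feasibility constraints
        (1, 2, 0), (2, 3, 1), (2, 4, 2), (3, 1, 0), (3, 4, 2),   # critical-path constraints (c = 3)
        (4, 1, 0), (4, 3, 1),
    ]

    nodes = {n for u, v, _ in constraints for n in (u, v)}
    edges = [(v, u, bound) for u, v, bound in constraints]   # edge V -> U with weight bound
    edges += [(0, n, 0) for n in nodes]                      # extra source 0 reaching every vertex

    dist = {n: float("inf") for n in nodes}
    dist[0] = 0
    for _ in range(len(dist) - 1):                           # Bellman-Ford relaxation passes
        for src, dst, wt in edges:
            if dist[src] + wt < dist[dst]:
                dist[dst] = dist[src] + wt

    if all(dist[src] + wt >= dist[dst] for src, dst, wt in edges):   # no negative cycle
        print("feasible; one retiming solution:", {n: dist[n] for n in sorted(nodes)})
    else:
        print("no feasible retiming for this clock period")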

Fig. 1.5: Restructured circuit without negative cycles
C. ISCAS 99 Sequential Benchmark Circuits
The following ISCAS 99 sequential benchmark circuits are used:

b01.blif b02.blif b03.blif b04.blif b05.blif

III. IMPLEMENTATION

The benchmark circuits are synthesized and run for timing and power analysis. The Bellman-Ford algorithm is then applied to the benchmark circuits, which are synthesized again and run for timing and power analysis. The two sets of results are compared and tabulated in the results section.

IV. EXPERIMENTAL RESULTS
The time period and frequency values of the raw benchmark circuits and of the benchmark circuits linked with the algorithm are compared and tabulated below. The results show that the frequency of the linked benchmark circuits is higher than that of the raw benchmark circuits.

V. CONCLUSION

In this paper, the Bellman-Ford algorithm is written in MATLAB and converted into VHDL using the AccelDSP synthesis tool. The netlists of the benchmark circuits are taken and implemented in VHDL, and the algorithm is linked with the benchmark circuits. The results show that the frequency increases when the algorithm is applied.


Table 1: Time period and frequency before and after applying the algorithm

Sl. No. | Benchmark Circuit | Before applying algorithm | After applying algorithm
1       | B01               | 2.489 ns (401.99 MHz)     | 1.103 ns (884.12 MHz)
2       | B02               | 1.657 ns (603.500 MHz)    | 1.012 ns (889.52 MHz)
3       | B04               | 9.132 ns (109.505 MHz)    | 3.203 ns (512.023 MHz)

Table 2: Power before and after applying the algorithm

Sl. No. | Benchmark Circuit | Power before applying algorithm | Power after applying algorithm
1       | B01               | 615 mW                          | 879 mW
2       | B02               | 712 mW                          | 890 mW
3       | B04               | 412 mW                          | 653 mW

REFERENCES

[1] C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry," Algorithmica, vol. 6, no. 1, pp. 5-35, 1991.
[2] P. Pan, "Performance-driven integration of retiming and resynthesis," in Proc. DAC, 1999, pp. 243-246.
[3] E. Lehman, Y. Watanabe, J. Grodstein, and H. Harkness, "Logic decomposition during technology mapping," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 16, no. 8, pp. 813-834, Aug. 1997.
[4] K. J. Singh, A. R. Wang, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, "Timing optimization of combinational logic," in Proc. ICCAD, 1988, pp. 282-285.
[5] C. L. Berman, D. J. Hathaway, A. S. LaPaugh, and L. Trevillyan, "Efficient techniques for timing correction," in Proc. ISCAS, 1990, pp. 415-419.
[6] P. C. McGeer, R. K. Brayton, A. L. Sangiovanni-Vincentelli, and S. K. Sahni, "Performance enhancement through the generalized bypass transform," in Proc. ICCAD, 1991, pp. 184-17.
[7] A. Saldanha, H. Harkness, P. C. McGeer, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, "Performance optimization using exact sensitization," in Proc. DAC, 1994, pp. 425-429.

[8] K. J. Singh, "Performance optimization of digital circuits," Ph.D. dissertation, Univ. California, Berkeley, CA, 1992.


Analysis of MAC Protocol for Wireless Sensor Network Jeeba P.Thomas, Mrs.M.Nesasudha,

ME Applied Electronics student, Sr. Lecturer Department of Electronics & Communication Engineering,

Karunya University, Coimbatore [email protected]

Abstract--Wireless sensor networks use battery-operated computing and sensing devices. Sensor networks are expected to be deployed in an ad hoc fashion, with individual nodes remaining largely inactive for long periods of time but becoming suddenly active when something is detected. As a result, energy consumption is high with existing MAC protocols. This paper aims at designing a new MAC protocol explicitly for wireless sensor networks, with reducing energy consumption as the main goal. The work is carried out in three phases; the phase completed so far is the analysis of the IEEE 802.11 MAC protocol. The simulator tool used in the design is NS 2.29. The NAM file and trace file of the MAC were obtained as the result of the implementation. Keywords--Energy efficiency, medium access control, wireless sensor networks

I. INTRODUCTION
Although the term wireless network may technically refer to any type of network that is wireless, it is most commonly used to refer to a telecommunication network whose interconnections between nodes are implemented without wires. Wireless networks can be classified into infrastructure-based networks and ad hoc networks. Infrastructure-based networks have a centralized base station. Hosts in the wireless network communicate with each other, and with other hosts on the wired network, through the base station. Infrastructure-based networks are commonly used to provide wireless network services in numerous environments such as college campuses, airports, homes, etc. Ad hoc networks are characterized by the absence of any infrastructure support. Hosts in the network are self-organized and forward packets on behalf of each other, enabling communication over multi-hop routes. Ad hoc networks are envisaged for use in battlefield communication, sensor communication, etc. IEEE 802.11 is a MAC layer protocol that can be used in infrastructure-based networks as well as in ad hoc networks. Wireless sensor networking is an emerging technology that has a wide range of

potential applications including environment monitoring, smart spaces, medical systems and robotic exploration. Such a network normally consists of a large number of distributed nodes that organize themselves into a multi-hop wireless network. Each node has one or more sensors, embedded processors and low-power radios, and is battery operated. Typically, these nodes coordinate to perform a common task. As in all shared-medium networks, medium access control (MAC) is an important technique that enables the successful operation of the network. One fundamental task of the MAC protocol is to avoid collisions so that two interfering nodes do not transmit at the same time. Many MAC protocols have been developed for wireless voice and data communication networks. Time division multiple access (TDMA), frequency division multiple access (FDMA) and code division multiple access (CDMA) are MAC protocols that are widely used in modern cellular communication systems. Their basic idea is to avoid interference by scheduling nodes onto different sub-channels that are divided either by time, frequency or orthogonal codes. Since these sub-channels do not interfere with each other, MAC protocols in this group are largely collision-free; these are referred to as scheduled protocols. Another class of MAC protocols is based on contention. Rather than pre-allocate transmissions, nodes compete for a shared channel, resulting in probabilistic coordination; collisions happen during the contention procedure in such systems. To design a good MAC protocol for wireless sensor networks, several attributes must be considered. The first is energy efficiency. Sensor nodes are battery powered, and it is often very difficult to change or recharge batteries for these nodes, so prolonging network lifetime for these nodes is a critical issue. Another important attribute is scalability to changes in network size, node density and topology. Some nodes may die over time, some new nodes may join later, and some nodes may move to different locations; the network topology changes over time for many reasons. A good MAC protocol should easily accommodate such network changes. Other important attributes include fairness, latency, throughput and bandwidth utilization. These attributes are generally the primary concerns in traditional


wireless voice and data networks, but in sensor networks they are secondary. The following are the major sources of energy waste. The first is collision: when a transmitted packet is corrupted it has to be discarded, and the follow-on retransmissions increase energy consumption. Collision increases latency as well. The second source is overhearing, meaning that a node picks up packets that are destined to other nodes. The third source is control packet overhead; sending and receiving control packets consumes energy too, and fewer useful data packets can be transmitted. The last major source of inefficiency is idle listening, i.e., listening for possible traffic that is not sent. The aim here is to design a new MAC protocol explicitly for wireless sensor networks, with reducing energy consumption as the primary goal. To achieve this goal, it is necessary to identify the main sources of inefficient energy use as well as the trade-offs that can be made to reduce energy consumption. The new MAC tries to reduce the energy wastage that occurs with existing protocols; therefore it lets its nodes sleep periodically, thus avoiding idle listening. In sleep mode, a node turns off its radio. The design reduces the energy consumption due to idle listening.

II. PROTOCOL DESIGN

The purpose of the implementation is to demonstrate the effectiveness of the new MAC protocol and to compare the new protocol with IEEE 802.11 and TDMA. The steps followed in this implementation are: 1. study of the existing protocols (IEEE 802.11 and TDMA); 2. design of the new MAC protocol; 3. comparison of the existing MAC protocols with the new MAC protocol.
A. SIMULATOR

The simulator used for implementing the new protocol is the Network Simulator (version 2). NS-2 is an object-oriented, discrete-event simulator developed under the VINT project as a joint effort by UC Berkeley, USC/ISI, LBL, and Xerox PARC. It is written in C++ with OTcl as a front-end. The simulator supports a class hierarchy in C++ (the compiled hierarchy) and a similar class hierarchy within the OTcl interpreter (the interpreted hierarchy). The network simulator uses two languages because the simulator has two different kinds of things it needs to do. On one hand, detailed simulations of protocols require a systems programming

language which can efficiently manipulate bytes and packet headers, and implement algorithms that run over large data sets. For these tasks run-time speed is important and turn-around time (run simulation, find bug, fix bug, recompile, re-run) is less important. On the other hand, a large part of network research involves slightly varying parameters or configurations, or quickly exploring a number of scenarios. In these cases, iteration time (change the model and re-run) is more important. Since configuration runs once (at the beginning of the simulation), run-time of this part of the task is less important. NS meets both of these needs with two languages, C++ and OTcl. C++ is fast to run but slower to change, making it suitable for detailed protocol implementation. OTcl runs much slower but can be changed very quickly (and interactively), making it ideal for simulation configuration. NS (via tclcl) provides glue to make objects and variables appear in both languages. The Tcl interface can be used in cases where small changes in the scenarios are easily implemented. The simulator is initialized using the Tcl interface. The energy model can be implemented quite simply in NS-2: after every packet transmission or reception, the energy content is decreased. The time taken to transmit or receive, along with the power consumed for transmission or reception of a bit/byte of data, is passed as a parameter to the corresponding functions, and these functions decrease the energy content of the node.
B. Analysis of IEEE 802.11
In the IEEE 802.11 MAC layer protocol, the basic access method is the Distributed Coordination Function (DCF), which is based on CSMA/CA. DCF is designed for ad hoc networks, while the point coordination function (PCF, or infrastructure mode) adds support where designated access points (or base stations) manage wireless communication. IEEE 802.11 adopted the features of CSMA/CA, MACA and MACAW in its distributed coordination function. Among contention-based protocols, 802.11 does a very good job of collision avoidance. Here the analysis of the IEEE 802.11 protocol is conducted. The methodology followed for the analysis is:

• identifying the sensor nodes (10 in number);
• assigning an energy model to each node;
• analyzing the nodes by transmitting and receiving packets.

The simulator used in the analysis is NS 2.29, a network simulation tool. The first step


is to identify the nodes; 10 nodes are used. Then the energy model is assigned to each node. Transmission takes place in a random manner from the first node to the last node. When the simulation runs, two files are obtained: the NAM trace file and the trace file. The NAM file shows the topology of the design, and the trace file records the events that occurred during transmission.

III. RESULT

The result of the analysis is obtained in the form of two files, namely the NAM trace file and the trace file.

NAM file:

This is the network animator file, which contains information about the topology, i.e., nodes and links, as well as packet traces. The obtained NAM file shows the 10 nodes identified for packet transmission. The simulation start time is set to 1 s and the stop time to 20 s. As the time bar moves, the data transmission becomes visible; the blue circles show the data transmission from one node to another.

Fig. 1. NAM file

Trace file:

The trace file contains information about the various events that took place during the simulation. The trace file obtained here shows the energy consumed when transmission takes place from one node to another. From it, a graph is obtained with

energy (in joules) on the Y-axis and period on the X-axis. The graph clearly shows the decrease in energy as the transmission progresses.

Fig. 2. Trace file

IV. CONCLUSION AND FUTURE WORKS

This paper is aimed at designing an energy-efficient MAC protocol. Only the first phase of the work has been implemented so far, i.e., the analysis of the IEEE 802.11 protocol. The analysis was done using the network simulator (NS) version 2.29. The results obtained comprise the topology file and the graph file; the graph clearly shows the energy consumption, which increases as the period increases. Future work includes the analysis of another existing MAC protocol (TDMA) and the design of a new MAC protocol with energy conservation as the primary goal.

REFERENCES
[1] Wei Ye, John Heidemann and Deborah Estrin, "An energy-efficient MAC protocol for wireless sensor networks," in Proceedings of IEEE INFOCOM, New York, NY, June 2002, pp. 1567-1576.

[2] T.S. Rappaport , “Wireless Communications ,Principles and Practice ,” Prentice Hall ,1996.

[3] LAN MAN Standards Committee of the IEEE Computer Society, Wireless LAN medium access control (MAC) and physical layer specification, IEEE, New York, NY, IEEE Std 802.11-1997 edition, 1997.

[4] Gregory J.Pottie and William J.Kaiser,, “Embedding the internet: wireless integrated network sensors,” Communications of the ACM ,vol. 43, no.5,pp.51-58,May 2000.


[5] Mark Stemm and Randy H. Katz, "Measuring and reducing energy consumption of the network interfaces in hand-held devices," IEICE Transactions on Communications, vol. E80-B, no. 8, pp. 1125-1131, Aug. 1997.

[6] Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, and Kristofer Pister, “System architecture directions for networked sensors,” in Proceedings of the 9th International Conference on Architectural Support for Programming Languages and operating systems, Cambridge, MA, USA, Nov. 2000, pp.93-104, ACM


Improving Security and Efficiency in WSN Using Pattern Codes

Anu Jyothy, Student, ME (Applied Electronics); Mrs. M. Nesasudha, Sr. Lecturer, Department of ECE

Karunya University, Coimbatore [email protected]

ABSTRACT: Wireless sensor networks are one of the fastest growing types of networks today and, as such, have attracted considerable research interest. They are used in many aspects of our lives, including environmental analysis and monitoring, battlefield surveillance and management, etc. Their reliability, cost-effectiveness, ease of deployment and ability to operate in an unattended environment, among other positive characteristics, make sensor networks the leading choice of networks for these applications. Much research has been done to make these networks operate more efficiently, including the application of data aggregation. Recently, more research has been done on the security of wireless sensor networks using data aggregation. Here, pattern generation for data aggregation is performed securely by allowing a sensor network to aggregate encrypted data without first decrypting it. In this pattern generation process, when a sensor node senses an event from the environment, a pattern code is generated and sent to the cluster head. This generated pattern code is needed for further processing, namely comparison with the existing pattern codes in the cluster head, after which an acknowledgement is received so that authentication is done and the actual data can be sent. The simulator used for the implementation is the GloMoSim network simulator (Global Mobile Information Systems Simulation Library). This approach is more efficient due to aggregated data transmission, and it is secure and bandwidth efficient. Keywords - Wireless sensor networks, security, pattern codes, pattern generation and comparison

I. INTRODUCTION The primary function of a wireless sensor network is to determine the state of the environment being monitored by sensing some physical event. Wireless sensor networks consist of hundreds or thousands or, in some cases, even millions of sensor devices that have limited amounts of processing power, computational abilities and memory and are linked together through some wireless transmission medium such as radio and infrared media. These sensors are equipped with sensing and data collection capabilities and are responsible for collecting and transmitting data back to the observer of the event. Sensors may be distributed randomly and may be installed in fixed locations or they may be mobile. For example, dropping them from

an aircraft as it flies over the environment to be monitored may deploy them. Once distributed, they may either remain in the locations in which they landed or they may begin to move if necessary. Sensor networks are dynamic because of the addition and removal of sensors due to device failure, in addition to mobility issues. Security in wireless sensor networks is a major challenge: the limited amount of processing power, computational ability and memory with which each sensor device is equipped makes security a difficult problem to solve. The GloMoSim network simulator (Global Mobile Information Systems Simulation Library) is the simulator used; it is a scalable simulation environment for large wireless and wired-line communication networks. GloMoSim uses a parallel discrete-event simulation capability provided by Parsec. GloMoSim simulates networks with up to a thousand nodes linked by a heterogeneous communications capability that includes multicast, asymmetric communications using direct satellite broadcasts, multi-hop wireless communications using ad hoc networking, and traditional Internet protocols.

II. USE OF GLOMOSIM SIMULATOR

After successfully installing GloMoSim, a simulation can be started by executing the following command in the BIN subdirectory: ./glomosim <inputfile>. The <inputfile> contains the configuration parameters for the simulation (an example of such a file is CONFIG.IN). A file called GLOMO.STAT is produced at the end of the simulation and contains all the statistics generated. GloMoSim has a Visualization Tool that is platform independent because it is coded in Java. To initialize the Visualization Tool, we must execute the following from the java-gui directory: java GlomoMain. This tool allows the user to debug and verify models and scenarios; stop, resume and step execution; show packet transmissions; show mobility groups in different colors; and show statistics. The radio layer is displayed in the Visualization Tool as follows: when a node transmits a packet, a yellow link is drawn from this node to all nodes within its power range. As each node receives the packet, the link is erased and a green line is drawn for successful reception or a red line for unsuccessful reception. GloMoSim requires a C compiler to run and works with most C/C++ compilers on many common platforms.


Fig 2.1: The Visualization Tool

III. ALGORITHM: Pattern Generation (PG)

Input: sensor reading D, data parameters being sensed.
Output: pattern code (PC).
• Sensing data from the environment.
• Defining intervals from the threshold values set for the environment parameters.
• Assigning critical values to intervals using the pattern seed from the cluster-head.
• Generating the lookup table.
• Generating pattern codes using the pattern generation algorithm.
• Sending pattern codes to the cluster-heads.
The following explains how the PG algorithm generates a pattern code. Let D(d1, d2, d3) denote the sensed data with three parameters d1, d2 and d3, representing temperature, pressure and humidity respectively in a given environment. Each parameter sensed is assumed to have threshold values in the range 0 to 100, as shown in Table 1. The pattern generation algorithm performs the following steps:

• The pattern code to be generated is initialized to an empty pattern code.
• The algorithm iterates over the sensor reading values for the parameters being sensed; in this case, it first considers temperature.
• The temperature parameter is extracted from sensor reading D.
• For the temperature parameter, the algorithm first checks whether a new pattern seed has been received from the cluster-head. Arrival of a seed refreshes the mapping of critical values to data intervals, giving, for example, the configuration in Table 1.
• The data interval that contains the sensed temperature is found from the interval table.
• Then, from the interval value, the corresponding critical value is determined from the critical value table. Table 2 shows the critical values for different sensor readings if the same lookup tables of Table 1 are used for temperature, pressure and humidity.
• PC is set to the new critical value found. For pressure and humidity, the corresponding critical values are appended to the end of the partially formed PC.
• The previous steps are applied to the pressure and humidity readings.
• When the full pattern code is generated, a timestamp and the sensor identifier are sent with the pattern code to the cluster-head.

3.1 Pattern Generation

Threshold values: 30    50     70     80     90     95     100
Interval values:  0-30  31-50  51-70  71-80  81-90  91-95  96-100
Critical values:  5     3      7      8      1      4      3

Table 1: Lookup table for data intervals and critical values
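A small sketch of this lookup step is shown below. The readings (65, 82, 45) are invented for illustration, and the critical values are hard-coded from Table 1, whereas in the protocol they would be refreshed from the cluster-head's pattern seed.

    # Table 1 lookup: upper threshold of each interval -> critical value
    THRESHOLDS      = [30, 50, 70, 80, 90, 95, 100]
    CRITICAL_VALUES = [ 5,  3,  7,  8,  1,  4,   3]

    def critical_value(reading):
        # return the critical value of the first interval whose upper bound covers the reading
        for upper, cv in zip(THRESHOLDS, CRITICAL_VALUES):
            if reading <= upper:
                return cv
        raise ValueError("reading outside the 0-100 range")

    def pattern_code(temperature, pressure, humidity):
        # concatenate the three critical values (temperature, pressure, humidity order)
        return "".join(str(critical_value(d)) for d in (temperature, pressure, humidity))

    # e.g. a node sensing (65, 82, 45) would report pattern code "713"
    print(pattern_code(65, 82, 45))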

Table 2: Pattern code generation table
Pattern codes with the same value are referred to as a redundant set. In this example, the data sensed by sensor 1 and sensor 3 are the same, as determined from the comparison of their pattern code values (pattern code value 747), and they form Redundant Set #1. Similarly, the data sensed by sensor 2, sensor 4 and sensor 5 are the same (pattern code value 755), forming Redundant Set #2. The cluster-head selects only one sensor from each redundant set (sensor 1 and sensor 5 in this example) to transmit the actual data of that redundant set, based on the timestamps.

IV. ALGORITHM: PATTERN COMPARISON
The cluster-head runs the pattern comparison algorithm to eliminate redundant pattern codes, thereby preventing redundant data transmission. Cluster-heads choose a sensor node for each distinct pattern code to send the corresponding data of that pattern code,


and then the chosen sensor nodes send the data in encrypted form to the base station over the cluster-head. In the pattern comparison algorithm, upon receiving all of the pattern codes from the sensor nodes within a period T, the cluster-head classifies all the codes based on redundancy. While this increases the computation overhead at the sending and receiving nodes, due to the significant difference in energy consumption between computation and communication, the overall gains achieved by transmitting a smaller number of pattern code bits outweigh the computational energy required at either end.
4.1 ALGORITHM: PATTERN COMPARISON
Input: pattern codes
Output: request to sensor nodes in the selected-set to send the actual encrypted data.
Begin
1. Broadcast 'current-seed' to all sensor nodes
2. while (current-seed is not expired)
3.   time-counter = 0
4.   while (time-counter < T)
5.     get pattern code, sensor ID, timestamp
6.   endwhile
7.   Compare and classify pattern codes based on redundancy to form 'classified-set'
8.   selected-set = one pattern code from each classified-set
9.   deselected-set = classified-set - selected-set
10.  if (sensor node is in selected-set)
11.    Request sensor node to send actual data
12.  endif
13. endwhile
End
Once pattern codes are generated, they are transmitted to the cluster-head using the following algorithm, SDT (session data transmission). SDT is implemented in every session of data transmission, where a session refers to the time interval from the moment communication is established between a sensor node and the cluster-head until the communication terminates. Each session is expected to have a large number of packets. At the beginning of each session, the cluster-head receives the reference data along with the first packet and stores it until the end of the session. After a session is over, the cluster-head can remove its reference data.
4.2 ALGORITHM: SDT
Begin
For each session, T
  While (sensor node has pattern-codes or data packets for transmission)
    if (first pattern-codes or data packet of session)

      Send the pattern-codes or data packet along with the reference data
    else
      Send the differential pattern-codes or data packets
    endif
  endWhile
End
Choosing sensor nodes for data transmission by cluster-heads: the technique of using lookup tables and a pattern seed ensures that the sensed data cannot be regenerated from the pattern codes, which in turn allows the sensor nodes to send pattern codes without encryption. Only sensor nodes within the cluster know the pattern seed, which ensures the security of the sensed data during data aggregation.
4.3 Differential Data Transmission from Sensor Nodes to Cluster-head
After the cluster-head identifies which sensor nodes should send their data to the base station, those nodes can send their differential data to the base station. The differential data is securely sent to the base station using the security protocol described in this section. Let T be the total number of packets that the sensor nodes want to transmit in a session and R the number of distinct packets, where R <= T. Usually, the cluster-head receives all data packets prior to eliminating redundant data, so the total number of packets transmitted from the sensor nodes to the cluster-head would be T. After eliminating redundancy, the cluster-head sends R packets to the base station. Therefore, the total number of packets transmitted from the sensor nodes to the base station is (T + R). In this secure data aggregation using pattern codes, however, the cluster-head receives T pattern codes from all sensor nodes. After eliminating redundancy based on the pattern codes, the cluster-head requests the selected sensor nodes to transmit their data. Since the selected nodes are the nodes that have distinct packets, the total number of packets transmitted from the sensor nodes to the cluster-head is R, which is later transmitted to the base station. Therefore, the total number of packets transmitted from the sensor nodes to the base station is (2R). To assess the energy efficiency, we use the GloMoSim network simulator, which simulates the transmission of data and pattern codes from sensor nodes to the cluster-head. Pattern code generation and transmission require a negligible amount of energy, as the algorithm is not complex.
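A sketch of the comparison step is given below (the timestamps are assumed for illustration): pattern codes received within one period T are grouped by value, and the earliest-reporting sensor of each group is selected, reproducing the choice of sensors 1 and 5 from Section V.

    from collections import defaultdict

    def select_senders(reports):
        # reports: list of (sensor_id, timestamp, pattern_code) received within one period T
        classified = defaultdict(list)                 # pattern code -> redundant set
        for sensor_id, ts, code in reports:
            classified[code].append((ts, sensor_id))
        # pick one sensor per distinct pattern code (earliest timestamp)
        selected = {min(members)[1] for members in classified.values()}
        return classified, selected

    # assumed timestamps; pattern codes 747 and 755 follow the Table 2 example
    reports = [(1, 10, "747"), (2, 11, "755"), (3, 12, "747"), (4, 13, "755"), (5, 9, "755")]
    classified, selected = select_senders(reports)
    print(selected)   # {1, 5}: one sensor per redundant set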

V. SIMULATION RESULTS
Redundant set #1: sensor nodes (1, 3)
Redundant set #2: sensor nodes (2, 4, 5)
Selected unique set: sensor nodes (1, 5)


5.1: GENERATED PATTERN CODES

5.2:SELECTED UNIQUE SET OF PATTERN CODES

5.3:SENSED DATA VALUES

VI. CONCLUSION
Sensor nodes receive the secret pattern seed from the cluster-head. The interval values for the data are defined based on the given threshold values set for each environment parameter. The number of threshold values and the variation of the intervals may depend on the user requirements and the precision defined for the given environment in which the network is deployed. The algorithm then computes the critical values for each interval using the pattern seed to generate the lookup table, where the pattern seed is a random number generated and broadcast by the cluster-head. The Pattern Generation (PG) algorithm first maps the sensor data to a set of numbers. Then, based on the user requirements and the precision defined for the environment in which the network is deployed, this set of numbers is divided into intervals such that the boundaries and widths of the intervals are determined by the predefined threshold values. The PG algorithm then computes the critical values for each interval using the pattern seed and generates the interval and critical value lookup tables. The interval lookup table defines the range of each interval, and the critical value lookup table maps each interval to a critical value. Upon sensing data from the environment, the sensor node compares the characteristics of the data with the intervals defined in the lookup table of the PG algorithm. Then, a corresponding critical value is assigned to each parameter of the data; the concatenation of these critical values forms the pattern code of that particular data. Before the pattern code is transmitted to the cluster-head, the timestamp and the sender's sensor ID are appended to the end of the pattern code. The cluster-head runs the pattern comparison algorithm to eliminate redundant pattern codes, thereby preventing redundant data transmission. Cluster-heads choose a sensor node for each distinct pattern code to send the corresponding data of that pattern code, and then the chosen sensor nodes send the data in encrypted form to the base station via the cluster-head.

REFERENCES

[1] H. Çam, S. Özdemir, Prashant Nair, and D. Muthuavinashiappan, “ESPDA: energy efficient and secure pattern-based data aggregation for wireless sensor networks,'' Proc. of IEEE Sensors - The Second IEEE Conference on Sensors, Oct. 22-24, 2005, Toronto, Canada, pp. 732-736. [2] W. Ye, J. Heidemann, and D. Estrin, “An Energy- Efficient MAC Protocol for Wireless Sensor Networks”, Proc. of INFOCOM 2002, vol. 3, pp. 1567-1576, June 2002. [3] A. Sinha and A. Chandrakasan, “Dynamic power management in wireless sensor networks”, IEEE Design and Test of Computers, vol. 18(2), pp. 62-74, March- April 2006.


[4] A. Perrig, R. Szewczyk, J.D. Tygar, V. Wen, and D.E. Culler, “SPINS: Security protocols for sensor network”, Wireless Networks, vol. 8, no. 5, pp. 521-534, 2002. [5] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidemann, “Impact of network density on Data Aggregation in wireless sensor networks”, Proc. of the 22nd International Conference on Distributed Computing Systems, pp. 575-578, July 2002. [6] H. Çam, S. Özdemir, D. Muthuavinashiappan, and Prashant Nair, “Energy-Efficient security protocol for Wireless Sensor Networks”, IEEE VTC Fall 2003 Conference, October 2003, Orlando, Florida. [7] X. Zeng, R. Bagrodia, and M. Gerla, “GloMoSim: A Library for Parallel Simulation of Large-scale Wireless Networks”, Proc. of the 12th Workshop on Parallel and Distributed Simulations, PADS'98, May 1998, Banff, Alberta, Canada.


Automatic Hybrid Genetic Algorithm Based Printed Circuit Board Inspection

1 Mridula, 2 Kavitha, 3 Priscilla
1,2,3 Second year students, Adhiyamaan College of Engineering, Hosur-635 109
Email 1: [email protected], Email 2: [email protected]
Mobile: 9355894060, 9787423619

Abstract - This paper presents automatic inspection of printed circuit boards with the help of a genetic algorithm. The algorithm contains important operators like selection, crossover and mutation. This project presents a novel integrated system in which a number of image processing algorithms are embedded within a genetic algorithm (GA) based framework, which provides lower computational complexity and better quality. A specially tailored hybrid GA (HGA) is used to estimate the geometric transformation of arbitrarily placed printed circuit boards (PCBs) on a conveyor belt without any prior information such as CAD data. Functions such as fixed multi-thresholding, Sobel edge detection, image subtraction and noise filters are used for edge detection and thresholding in order to increase defect detection accuracy with low computational time. Our simulations on real PCB images demonstrate that the HGA is robust enough to detect missing components and cut solder joints of any size and shape. Key Terms: elitist, rotation angle, hybrid genetic algorithm.

I. INTRODUCTION
Previously, GA has been used to find misorientation parameter values of individual integrated circuits (ICs) on a board to determine whether the board has defects, and the technique has been implemented on a System-On-Chip (SOC) platform. GA has also been used to estimate surface displacements and strains for autonomous inspection of structures. GA and the distance transform have been combined for object recognition in a complex noisy environment; this research shows that the combination produces fast and accurate matching with scaling and rotation consistency. Feature selection and creation in two-pattern classification are also difficult problems in the inspection process; GA has therefore been used to solve this problem and successfully reduced the classification error rate, but it requires much more computation than neural-net and nearest-neighbour classifiers. The proposed technique uses a perfect board to act as the reference image and the inspected board as the test image. In this work, GA is used to derive the transformation between the test and reference images, based on the simple GA as presented, in order to find out whether the board is good or faulty. It is essential to determine the type of encoding and the fitness function which will be used in the GA to optimize the parameters. Many encoding schemes have been proposed, for example integer coding and gray coding. There is no standard way to choose these schemes and the choice depends on the nature of the problem. In this work, binary coding has been chosen since it is straightforward and suitable for this problem. Nine bits are allocated for rotation with a value from 0 to 360 degrees, five bits for the displacement along the x-axis with a value between -10 and 10 pixels, and another five for the displacement along the y-axis with a value between -10 and 10 pixels. A fitness value is computed to evaluate each individual. The fitness function in this work is evaluated as the total number of matching pixel values between the test image and the reference image divided by the total number of pixels in the reference image, assuming that both images are the same size. In the elitism strategy of this work, deterministic, tournament and roulette-wheel selection methods are implemented. Four samples of artificially transformed and defected test images have been compared to the reference image using these selection methods to evaluate the performance in terms of maximum fitness, accuracy and computing time. This investigation aims to develop a better understanding of their capabilities to improve the strength of the existing GA framework in finding the optimum solution.
Reference Sample:

Fig 1: a) Image of reference board Defected Samples:

Fig1: b) Test image 1 (T1). Image is rotated 329 degrees anti-clockwise, displacement at x-axis is 0 pixel and displacement at y-axis is 0 pixel



Fig 1: c) Test image (T2). The image is rotated 269 degrees anti-clockwise; the displacement along the x-axis is 8 pixels and along the y-axis is 8 pixels.
The paper is organized as follows: Section 2 discusses the integration between the HGA module and the defect detection procedure, details of the simulation environment are presented in Section 3, and Section 4 concludes the work based on performance.

II. INTEGRATION SYSTEM
The integration between the image registration module and the defect detection procedure is performed as shown in Fig. 2. A fixed multi-threshold operation is applied to a stored reference image and to an image of the PCB under test (the test image) to enhance the images and highlight the details before performing image registration. The threshold operation is also essential to deal with variations in the intensity of components in PCB images. The image registration module employs a hybrid GA (HGA), which contains a specially tailored GA [6] with elitism and a hill-climbing operation as a local optimization agent. The transformation parameters found by the HGA are passed to the defect detection procedure for the subsequent image processing operations. The test image is transformed using these transformation parameters and Sobel edge detection is applied to the transformed image, while the reference image is thresholded by the multi-thresholding function. Then, image subtraction is performed on both processed images and noise in the output image is filtered using window filtering and median filtering. The final image produced by the system is known as the defect-detected image, which contains information on possible defects for the decision to reject or accept the inspected board.
GA algorithmic analysis: One of the common selection methods is the so-called roulette-wheel selection, which can be implemented as follows. The fitness function is evaluated for each individual, providing fitness values, which are then normalized. Normalization means multiplying the fitness value of each individual by a fixed number, so that the sum of all fitness values equals 1.
1. The population is sorted by descending fitness values.

2. Accumulated normalized fitness values are computed (the accumulated fitness value of an individual is the sum of its own fitness value plus

the fitness values of all the previous individuals). The accumulated fitness of the last individual should of course be 1 (otherwise something went wrong in the normalization step!).

3. A random number R between 0 and 1 is chosen. 4. The selected individual is the first one whose

accumulated normalized value is greater than R. There are other selection algorithms that do not consider all individuals for selection, but only those with a fitness value higher than a given (arbitrary) constant. Other algorithms select from a restricted pool where only a certain percentage of the individuals are allowed, based on fitness value.
Hybrid Genetic Algorithm: In every test board inspection, a different random geometric transformation is applied to the reference image and the agreement between the registered reference image and the test image is measured. The transformations of the reference image create the initial population for the HGA, with the proportion of matched pixels as fitness values. The fitness value may range from 0 to 1.0, the latter obtained when the ideal solution is found. The fitness value is defined as:
if f(xa, ya) == g(xb, yb), counter++
fitness = counter / (W x H)
where f(xa, ya) is the pixel intensity of the reference image, g(xb, yb) is the pixel intensity of the test image, under the condition xa = xb, ya = yb, where x and y are the pixel locations on the x-axis and y-axis, and W and H are the width and height of the reference image respectively, since both compared images are the same size. Iteratively, the whole population for the next generation is formed from individuals selected from the previous and the present generations. These individuals are ranked based on their fitness. These operations are done by means of the GA operators (selection, crossover, mutation). For the hill-climbing process [7], which exploits the best solution for possible improvement, a limit l is set on the number of generations for every set of GA search. This limit is the number of times the same individual is recognized as the fittest individual; hill-climbing is performed when the limit is reached. The fittest individual of the current generation is selected for this process, where every transformation value (rotation, x and y displacement) is incremented and decremented by a single unit sequentially. The modifications are evaluated to examine whether the fitness value improves, in which case they replace the current solution. The GA search is terminated with the current solution unless a better individual is found during hill-climbing. If the search continues, the hill-climbing process is repeated when the limit is reached again.
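A sketch of the four roulette-wheel steps listed above is given below, with placeholder chromosomes and fitness values; the 19-bit strings follow the 9 + 5 + 5 encoding described in the Introduction.

    import random

    def roulette_select(population, fitnesses):
        # 1. normalize fitness values so they sum to 1
        total = sum(fitnesses)
        probs = [f / total for f in fitnesses]
        # 2. sort individuals by descending fitness
        ranked = sorted(zip(population, probs), key=lambda pair: pair[1], reverse=True)
        # 3. accumulate normalized fitness values
        acc, cumulative = 0.0, []
        for individual, p in ranked:
            acc += p
            cumulative.append((acc, individual))
        # 4. pick the first individual whose accumulated value exceeds a random R in [0, 1)
        R = random.random()
        for acc_value, individual in cumulative:
            if acc_value > R:
                return individual
        return cumulative[-1][1]   # guard against floating-point round-off

    # placeholder chromosomes (9 bits rotation, 5 bits x, 5 bits y) and placeholder fitnesses
    population = ["0000000000000000000", "1010010010110101010", "0101001101001010001"]
    fitnesses  = [0.42, 0.77, 0.55]
    print(roulette_select(population, fitnesses))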


Fig 2: Flow of the integration system
Edge detection analysis: Edges are places in the image with strong intensity contrast. Since edges often occur at image locations representing object boundaries, edge detection is extensively used in image segmentation when we want to divide the image into areas corresponding to different objects. Representing an image by its edges has the further advantage that the amount of data is reduced significantly while retaining most of the image information. PCBs are constructed from different materials and colors of components; therefore, segmentation into multiple regions is necessary to separate these elements within the captured image in order to detect the existence of physical defects. In this work, we have implemented the threshold-based and boundary-based segmentation approaches using multi-threshold and Sobel edge detection methods. The threshold

approach, using multi-thresholding of three gray-level regions, is implemented with threshold values selected from the grayscale range of 0 to 255. The boundary-based segmentation method using a gradient operator is used frequently; Sobel edge detection is one of the popular gradient operators because of its ability to detect edges accurately to create boundaries. The Sobel operator performs a 2-D spatial gradient measurement on an image and so emphasizes regions of high spatial gradient that correspond to edges. Typically it is used to find the approximate absolute gradient magnitude at each point in an input grayscale image. During this segmentation operation, the threshold values are chosen based on visual observation, while a factor of 0.3 is applied in the Sobel edge detection to reduce blobs and noise.
Finding the defect image: A defect localization operation is necessary to extract the difference between the reference image and the test image using an image subtraction operation. This operation is applied directly on the thresholded reference image and the edge-detected test image. The image subtraction is performed between the reference image and the test image, represented as image g and image f respectively. The difference of these images, referred to as image d, inherits the size of the input images. This operation is described as
d(x, y) = 0   if f(x, y) - g(x, y) = 0
d(x, y) = 255 if f(x, y) - g(x, y) != 0
where g(x, y) is the pixel intensity of image g, f(x, y) is the pixel intensity of image f and d(x, y) is the pixel intensity of image d. The pixel location is represented as x for the x-axis and y for the y-axis.
Noise elimination operation:

Noise elimination is a main concern in computer vision and image processing because any noise in the image can provide misleading information and cause serious errors. Noise can appear in images from a variety of sources: during the acquisition process, due to the camera's quality and resolution, and also due to the acquisition conditions, such as the illumination level, calibration and positioning. In this case, the occurrence of noise is mainly due to changes in pixel intensity during image repositioning caused by the rotation operation. Loss of information from the inspected image may occur and may also contribute to false alarms in the inspection process. To overcome this issue, a noise elimination technique using window filtering is used in this procedure. The window filtering technique is also capable of highlighting identification information of components. After the window filtering stage, the median filter, the best-known filter in the non-linear category, is used to eliminate the remaining noise and preserve the spatial details contained within the image. The filter replaces the value of a pixel by the median of the gray levels in the neighborhood of that pixel.
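A sketch of this defect-localization pipeline using NumPy/SciPy is shown below. The input arrays, the edge binarization threshold of 128 and the 3x3 median window are assumptions for illustration, while the 0.3 Sobel scaling and the 0/255 subtraction rule follow the text.

    import numpy as np
    from scipy import ndimage

    def defect_image(reference, test, edge_factor=0.3):
        # reference: thresholded reference image; test: registered (transformed) test image
        gx = ndimage.sobel(test.astype(float), axis=1)
        gy = ndimage.sobel(test.astype(float), axis=0)
        # Sobel gradient magnitude scaled by the 0.3 factor; binarized here (threshold 128 assumed)
        edges = (edge_factor * np.hypot(gx, gy) > 128).astype(np.uint8) * 255

        # image subtraction: d(x, y) = 0 where the images agree, 255 where they differ
        d = np.where(reference.astype(int) == edges.astype(int), 0, 255).astype(np.uint8)

        # 3x3 median filtering removes isolated noise pixels while keeping defect regions
        return ndimage.median_filter(d, size=3)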


III. SIMULATION RESULTS Deterministic selection has the ability to reach the highest maximum fitness, followed by roulette-wheel and tournament for all the test images as shown in figure 4.

Fig 3: The schemes are compared in terms of maximum fitness

V. CONCLUSIONS The proposed system has performed satisfactorily in image registration of PCBs, especially for high-density PCB layouts, which are placed arbitrarily on a conveyor belt during inspection. The registration process is crucial for the defect detection procedure, which is based on pixel interpolation operations. Currently, the proposed system is capable of detecting missing components and cut solder joints of any shape and size with low computational time. The deterministic scheme outperformed the tournament and roulette-wheel schemes in terms of maximum fitness, accuracy and computational time. Consequently, it has been established as an ideal selection method in elitism for this work.



Abstract - Harmonics are unwanted components, dynamic in nature, that are generated by the use of non-linear loads. The increase in the number of non-linear loads has increased harmonic pollution in industries. Switched-mode power supplies, PWM inverters, voltage source inverters, fluorescent lighting with electronic ballasts, and computers are the major sources of harmonic currents. Today's automated facility relies heavily on electronic systems, which increase harmonics on the line side of the power distribution plant. While the fundamental current levels may be within specification, the harmonics can add as much current as the fundamental. Unmitigated, these harmonics produce heat, unwanted trips, lockups and poor power factor. Preventive solutions for harmonics are phase cancellation or harmonic control in power converters, and developing procedures and methods to control or eliminate harmonics in power system equipment. Remedial solutions are the use of filters and circuit detuning, which involves reconfiguration of feeders or relocation of capacitor banks to overcome resonance. Passive solutions will correct one problem but add another; if the load condition varies, passive systems can actually cause resonances that accelerate failure. Active filters, however, have overcome these problems associated with passive filters. This project uses the Adaline algorithm to identify and measure the current harmonics present in the power line. Since this measurement technique measures the harmonics in a shorter time, it can be effectively used in active filters. The Adaline algorithm is implemented on an FPGA platform. Index Terms: Neural Network, harmonics

I. INTRODUCTION

The use of nonlinear loads like diode rectifiers, controlled rectifiers, etc., in industrial and domestic applications pollutes the power system. The nonlinear load injects harmonic currents into the power lines, resulting in poor utilization of the system and reduced efficiency due to low power factor. To alleviate the problems of harmonics, passive harmonic filters and active harmonic filters are used. However, the design and installation of passive elements in power systems or industries requires special attention due to the possible resonances that may occur.

Active filters are an effective means for harmonic compensation of nonlinear power electronic loads, of particular importance as electric utilities enforce harmonic standards such as IEEE 519. Harmonic compensation is an extremely cost-sensitive application since the value added to the user is not apparent. Today active filters are more easily available for loads greater than 10 kVA, and they are costly. For effective active filtering, measurement of harmonics is required. This project proposes a method in which the measurement of harmonics is performed on an FPGA incorporating an adaptive neural network called Adaline. Shunt active filter systems are used to improve the power factor (Chongming Qiao et al 2001). The shunt active filter is a boost-topology-based, current-controlled voltage source converter. The shunt active filter (SAF) is connected in parallel to the source and the nonlinear load as shown in Figure 1.1, and the SAF response is shown in Figure 1.2. The power factor is improved by compensating for harmonic currents. The control objective of the shunt active filter is to align the phase angle of the input current with the phase angle of the fundamental component of the load current. The proposed control strategy produces a current reference using a phase-shifting method on the sensed input currents, which is then applied to a resistive-emulator type input current shaping strategy. The phase-shifting control technique has the advantage of compensating not only for harmonic current but also for reactive current (Hasan Komurcugil et al 2006).

Figure 1.1 SAF connected in parallel to the source and the nonlinear load

Implementation Of Neural Network Algorithm Using VLSI Design

B.Vasumathi1 Prof.K.R.Valluvan2

PG Scholar1 ,Head of the Department2

1Department of Electronics and Communication Engg, Kongu Engineering College, Perundurai-638052 2Department of Information Technology, Kongu Engineering College, Perundurai-638052

[email protected]


Figure 1.2 SAF response

II. REDUCTION OF HARMONICS

A. Preventive solutions
• Phase cancellation or harmonic control in power converters
• Developing procedures and methods to control, reduce or eliminate harmonics in power system equipment

B. Remedial solutions
• Use of filters
• Circuit detuning, which involves reconfiguration of feeders or relocation of capacitor banks to overcome resonance

C. Harmonic filters
Important general specifications to consider when selecting harmonic filters include the filter type and the signal type. Harmonic filters isolate harmonic current to protect electrical equipment from damage due to harmonic voltage distortion. They can also be used to improve power factor. Harmonic filters generally require careful application to ensure their compatibility with the power system and all present and future non-linear loads. Harmonic filters tend to be relatively large and can be expensive. Harmonic filter types include: 1. Passive filters 2. Active filters

D. Active filters
Active filters are those which consist of active components like thyristors, IGBTs, MOSFETs, etc. Active filtering techniques have drawn great attention in recent years. Active filters are mainly used to compensate the transient and harmonic components of the load current iL so that only the fundamental component remains in the grid current. Active filters are available mainly for low-voltage networks. The active filter uses power electronic switching to generate harmonic currents that cancel the harmonic currents from a nonlinear load (Victor M. Moreno, 2006). By sensing the nonlinear load harmonic voltages and/or currents, active filters use either:
1. Injected harmonics at 180 degrees out of phase with the load harmonics, or
2. Injected/absorbed current bursts to hold the voltage waveform within an acceptable tolerance.

A shunt active filter consists of a controllable voltage source behind a reactance acting as a current source. The Voltage Source Inverter (VSI) based SAF is by far the most common type used today, due to its well-known topology and straightforward installation procedure. It consists of a dc-link capacitor, power electronic switches and filter inductors between the VSI and the supply line. The operation of shunt active filters is based on injection of current harmonics in phase with the load current harmonics, thus eliminating the harmonic content of the line current (Jong-Gyu Hwang et al 2004). When using an active filter, it is possible to choose the current harmonics to be filtered and the degree of attenuation. The size of the VSI can be limited by using selective filtering and removing only those current harmonics that exceed a certain level, e.g. the level set in IEEE Std. 519-1992. Together with the active filtering, it is also possible to control power factor by injecting or absorbing reactive power from the load.

Active harmonic compensation: an active harmonic filter (conditioner) is a device using at least one static converter to meet the "harmonic compensation" function (Hasan Komurcugil 2006). This generic term thus actually covers a wide range of systems, distinguished by:
1. The number of converters used and their association mode,
2. Their type (voltage source, current source),
3. The global control modes (current or voltage compensation),
4. Possible associations with passive components (or even passive filters).
The only common feature between these active systems is that they all generate currents or voltages which oppose the harmonics created by non-linear loads. The most intuitive application is the one shown in Figure 2.1, which is normally known as the "shunt" (parallel) topology (Souvik Chattopadhyay et al 2004). It generates a harmonic current which cancels current harmonics on the power network side. When the current reference applied to this control is (for example) equal to the harmonic content of the current absorbed by an external non-linear load, the rectifier cancels all harmonics at the point of common coupling: this is known as an active harmonic filter, as shown in Figure 2.2.

Figure 2.1 Shunt-type Active harmonic Filter

Figure 2.2 Operation of Active harmonic Filter
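As a rough numerical illustration of the shunt compensation principle described above (not the authors' controller), the filter can be viewed as injecting the difference between the measured load current and its fundamental component, so that the source supplies only the fundamental; the signal values below are assumed:

```python
import numpy as np

t = np.linspace(0, 0.02, 400, endpoint=False)          # one 50 Hz cycle (assumed)
i_fund = 10 * np.sin(2 * np.pi * 50 * t)                # fundamental component
i_load = i_fund + 3 * np.sin(2 * np.pi * 250 * t)       # load with a 5th harmonic
i_inject = i_load - i_fund                              # current injected by the SAF
i_grid = i_load - i_inject                              # what the source now supplies
assert np.allclose(i_grid, i_fund)                      # only the fundamental remains
```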

III. HARMONIC ESTIMATION IN A POWER SYSTEM USING ADALINE ALGORITHM

An adaptive neural network approach is used for the estimation of the harmonic components of a power system. The neural estimator is based on the use of an adaptive perceptron comprising a linear adaptive neuron called Adaline (Shatshat El R et al 2004). The learning parameters in the proposed algorithm are adjusted to force the error between the actual and desired outputs to satisfy a stable difference error equation. The estimator tracks the Fourier coefficients of signal data corrupted with noise and decaying DC components very accurately. Adaptive tracking of the harmonic components of a power system can easily be done using this algorithm, and several numerical tests have been conducted for the adaptive estimation of harmonic components of power system signals mixed with noise and decaying DC components.

Estimation of the harmonic components in a power system is a standard approach for the assessment of the quality of the delivered power. There is a rapid increase in harmonic currents and voltages in present AC systems due to the increased introduction of solid-state power switching devices. Transformer saturation in a power network produces an increased amount of current harmonics. Consequently, to ensure the quality of the delivered power, it is imperative to know the harmonic parameters such as magnitude and phase. This is essential for designing filters for eliminating and reducing the effects of harmonics in a power system. Many algorithms are available to evaluate the harmonics, of which the Fast Fourier Transform (FFT) developed by Cooley and Tukey is widely used. Other algorithms include the recursive DFT, the spectral observer and the Hartley transform for selecting the range of harmonics. The use of a more robust algorithm is described by (Narade Pecharanin et al 1994), which provides a fixed-gain Kalman filter for estimating the magnitudes of sinusoids of known frequencies embedded in an unknown measurement noise, which can be a mixture of both stochastic and deterministic signals. In tracking harmonics for a large power system, where it is difficult to locate the magnitude of the unknown harmonic sources, a new algorithm based on learning principles is used by (Narade Pecharanin et al 1994). This method uses neural networks to make initial estimates of the harmonic source in a power system with nonlinear loads. To predict the voltage harmonics, an artificial neural network based on the back-propagation learning technique is used. An analogue neural method of calculating harmonics uses an optimization technique to minimize error; this is an interesting application from the point of view of VLSI implementation.

The new approach here is the adaptive estimation of harmonics using a Fourier linear combiner. The linear combiner is realized using a linear adaptive neural network called Adaline. An Adaline has an input sequence, an output sequence and a desired response-signal sequence. It also has a set of adjustable parameters called the weight vector. The weight vector of the Adaline generates the Fourier coefficients of the signal using a nonlinear weight adjustment algorithm based on a stable difference error equation. This approach is substantially different from the back-propagation approach and allows one to better control the stability and speed of convergence by the appropriate choice of the parameters of the error difference equation (S N Sivanandam, 2006). Several computer simulation tests have been conducted to estimate the magnitude and phase angle of the harmonic components from power system signals embedded in noise very accurately. Further, the estimation technique is highly adaptive and is capable of tracking variations of the amplitude and phase angle of the harmonic components. The performance of this algorithm shows its superiority and accuracy in the presence of noise.
To obtain the solution for the on-line estimation of the harmonics, an adaptive neural network comprising a linear adaptive neuron called Adaline is used, as shown in Figure 3.1. The performance of the proposed neural estimation algorithm is very dependent on the initial choice of the weight vector w and the learning parameters. An optimal choice of the weight vector can produce faster convergence to the true values of the signal parameters. This can be done by minimising the RMS error between the actual and estimated signals, starting with an initial random weight vector.

Figure 3.1 Block diagram of the Adaline. x(t) – input to the Adaline, w – weight value, y(c) – estimated value, err – error value, y(t) – target value

Once the weight vector is optimised, this can be used for online tracking of the changes in the amplitude and phase of the fundamental and harmonic components in the presence of noise etc.

IV. MATHEMATICAL DESCRIPTION
The general form of the waveform is

y(t) = \sum_{n=1}^{N} A_n \sin(n\omega t + \varphi_n) + \varepsilon(t)    (4.1)

where A_n is the amplitude and \varphi_n the phase of the n-th harmonic. The discrete-time version of the signal represented by (4.1) is

y(k) = \sum_{n=1}^{N} A_n \sin\left(\frac{2\pi n k}{N_s} + \varphi_n\right) + \varepsilon(k)    (4.2)

The input to the Adaline is given by

X(k) = \left[ \sin\frac{2\pi k}{N_s} \;\; \cos\frac{2\pi k}{N_s} \;\; \sin\frac{4\pi k}{N_s} \;\; \cos\frac{4\pi k}{N_s} \;\; \ldots \;\; \sin\frac{2\pi N k}{N_s} \;\; \cos\frac{2\pi N k}{N_s} \right]^{T}    (4.3)

where N_s = f_s / f_o, f_s is the sampling frequency, f_o is the nominal power system frequency and T denotes the transpose of a quantity. The weight vector of the Adaline is updated using the Widrow-Hoff delta rule

W(k+1) = W(k) + \frac{\alpha \, e(k) \, X(k)}{X^{T}(k)\, X(k)}    (4.4)

where X(k) is the input vector at time k, \hat{y}(k) is the estimated signal amplitude at time k, y(k) is the actual signal amplitude at time k, e(k) = y(k) - \hat{y}(k) is the error at time k and \alpha is the reduction factor. After convergence the signal becomes

y(k) = W_o^{T} X(k)    (4.5)

where W_o is the weight vector after final convergence. The amplitude and phase of the N-th harmonic are given by

A_N = \sqrt{W_o^2(2N-1) + W_o^2(2N)}    (4.6)

\varphi_N = \tan^{-1}\left[ W_o(2N-1) / W_o(2N) \right]    (4.7)
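A minimal numerical sketch of equations (4.2)-(4.7), assuming a signal sampled at N_s points per fundamental cycle; the reduction-factor value, the test signal and the function name are assumptions:

```python
import numpy as np

def adaline_harmonics(y, Ns, N, alpha=0.4):
    """Track the Fourier coefficients with the Widrow-Hoff rule (eq. 4.4) and
    return amplitude and phase per harmonic (eqs. 4.6-4.7). alpha is assumed."""
    w = np.zeros(2 * N)
    for k, yk in enumerate(y):
        angles = 2 * np.pi * np.arange(1, N + 1) * k / Ns
        x = np.empty(2 * N)
        x[0::2] = np.sin(angles)            # sin terms -> odd-indexed weights (1-based)
        x[1::2] = np.cos(angles)            # cos terms -> even-indexed weights
        e = yk - w @ x                      # e(k) = y(k) - y^(k)
        w += alpha * e * x / (x @ x)        # eq. (4.4)
    amp = np.hypot(w[0::2], w[1::2])        # eq. (4.6)
    phase = np.arctan2(w[0::2], w[1::2])    # eq. (4.7) convention
    return amp, phase

# Example: fundamental plus a 3rd harmonic, Ns = 64 samples per cycle (assumed).
Ns, N = 64, 5
k = np.arange(10 * Ns)                      # ten cycles of data
y = 10 * np.sin(2 * np.pi * k / Ns + 0.3) + 4 * np.sin(2 * np.pi * 3 * k / Ns)
amp, phase = adaline_harmonics(y, Ns, N)
print(np.round(amp, 2))                     # amp[0] ~ 10, amp[2] ~ 4 after convergence
```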

V. RESULTS AND DISCUSSION The code for the Adaline algorithm was developed and verified using MATLAB. For the implementation, an equivalent source was written in C and modified to be compatible with Code Composer Studio (CCS). The output values in memory can be seen using the watch window, as shown in Figure 5.1.

Figure 5.1 Output values in the memory

In this experiment a personal computer is used as the non-linear load. The supply voltage and the current drawn by the personal computer are shown in Figure 5.2. This waveform was captured using a Power Quality Analyzer (PQA).

Figure 5.2 Waveform captured using Power Quality Analyzer

The harmonic coefficients measured by the Power Quality Analyzer and the harmonic coefficients estimated using the Adaline algorithm are compared in Table 5.1. Using the Adaline algorithm the harmonics can be measured within a single cycle.

Table 5.1 Comparison of harmonic orders

Harmonic order | Value obtained from PQA | Output after 10 epochs using Adaline algorithm | Error %
1   | 68.4  | 67.61 | 0.71
2   | 18.1  | 18.19 | 0.51
3   | 57.8  | 57.24 | 0.96
4   | 11.6  | 11.44 | 1.36
5   | 39.7  | 39.28 | 1.05
6   | 4.7   | 4.356 | 7.30
7   | 19.5  | 19.68 | 0.94
8   | 2.6   | 2.79  | 7.48
9   | 5.4   | 5.37  | 0.52
10  | 4.6   | 4.53  | 1.45
11  | 4.2   | 4.39  | 4.59
12  | 4.1   | 3.90  | 4.66
13  | 4.3   | 4.45  | 3.58
14  | 2.7   | 2.91  | 7.94
15  | 1.5   | 1.35  | 9.87
16  | 4.0   | 4.23  | 5.95
17  | 2.0   | 2.08  | 4.15
18  | 3.1   | 3.11  | 0.42
19  | 2.3   | 2.19  | 4.47
20  | 1.5   | 1.54  | 2.91
21  | 1.5   | 1.44  | 3.99
22  | 1.5   | 1.55  | 3.62
23  | 0.5   | 0.61  | 23.40
24  | 1.5   | 1.61  | 7.43
25  | 1.0   | 1.14  | 14.53
26  | 1.4   | 1.33  | 4.29
27  | 1.0   | 1.08  | 8.03
28  | 1.2   | 1.33  | 11.20
29  | 1.0   | 1.06  | 6.77
30  | 0     | 0     | --
31  | 0     | 0     | --
49  | 0     | 0     | --
THD | 113.2% | 112.8% | 0.35%


VI. CONCLUSION

The Adaline algorithm has been studied and all the necessary parameters were formulated. The code generation technique for the Spartan-3E FPGA using VHDL was studied, and the expected results are shown above. The code generation technique for the Adaline algorithm using VHDL will be evaluated, and the design will be implemented on the Spartan-3E FPGA.

REFERENCES

[1]. Chongming Qiao, Keyue M. Smedley (2001) ‘A Comprehensive Analysis and Design of a Single Phase Active Power Filter with Unified Constant-frequency Integration Control’ IEEE Applied Power Electronics Conference, New York. [2].Dash P.K,.Swain D.P,.Liew A.C,Saifur Rahman,(1996),”An Adaptive Linear Combiner for On-Line Tracking of Power System Harmonics” IEEE Transactions on Power Electronics,Vol.11, No. 4 [3]. http://www.mathworks.com/products/tic2000/

[4]. Jong-Gyu Hwang, Yong-Jin Park, Gyu-Ha Choi (2004) ‘Indirect current control of active filter for harmonic elimination with novel observer-based noise reduction scheme’ Springer-Verlag, Journal of Electrical Engineering, Vol. 87, pp. 261-266. [5].Narade Pecharanin, Mototaka SONE, Hideo MITSUI, (1994) ‘An Application of Neural Network for Harmonic Detection in Active Filter’ IEEE transaction on Power Systems pp.3756-3760. [6]. Pichai Jintakosonwit, Hirofumi Akagi, Hideaki Fujita (2002) ‘Implementation and Performance of Automatic Gain Adjustment in a Shunt Active Filter for Harmonic Damping Throughout a Power Distribution System’ IEEE Transactions On Power Electronics, Vol. 17, No. [7]. Pichai Jintakosonwit, Hirofumi Akagi, Hideaki Fujita (2003) ‘Implementation and Performance of Cooperative Control of Shunt Active Filters for Harmonic Damping Throughout a Power Distribution System’ IEEE Transactions on Industry Applications, Vol. 39, NO [8]. Shatshat El R., M. Kazerani, M.M. A. Salama (2004) ‘On-Line Tracking and Mitigation of Power System Harmonics Using ADALINE-Based Active Power Filter System’, Proceedings of IEEE Power Electronics Specialists Conference, Japan, pp.2119-2124. [9]. Shouling Hc and Xuping Xu (2007) “Hardware simulation of an Adaptive Control Algorithm” proceedings of the 18th IASTED International conference, Modelling and Simulation, May30- June 1, 2007. [10]. Sivanandam S N, Sumathi S, Deepa S N, (2006), Introduction to Neural Networks Using Matlab 6.0’, First edition, Tata McGraw Hill Publishing Company Limited, New Delhi, pp 184-626. [11].Souvik Chattopadhyay and V. Ramanarayanan (2004) ‘Digital Implementation of a Line Current Shaping Algorithm for Three Phase High Power Factor Boost Rectifier Without Input Voltage Sensing’ IEEE Transactions On Power Electronics, Vol. 19, No.


Abstract - Artificial Neural Networks (ANN) are inherently parallel architectures which can be implemented in software and hardware. One important implementation issue is the size of the neural network and its weight adaptation, which makes hardware implementation complex and software learning slower. In practice, a back-propagation neural network is used for weight learning and an evolutionary algorithm for network optimization. In this paper a modified genetic algorithm with more fondness for mutation is introduced to evolve the NN weights, and CoDi-1 encoding is used to evolve its structure. A single-layered back-propagation neural network is designed and trained using the conventional method initially; then the proposed mutation-based modified genetic algorithm is applied to evolve the weight matrix. This algorithm facilitates the hardware implementation of the ANN.

Keywords: mutation, evolution

I. INTRODUCTION

Artificial neural networks have recently emerged as a successful tool in fields such as classification and prediction. An ANN is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The functional model of an ANN neuron consists of three sections that correspond to the simplified model of the biological neuron shown in Fig 1. The three sections are the weighted input connections, a summation function and a non-linear threshold function that generates the unit output. In general each neuron receives an input vector

X=(X1, X2… Xn) modulated by a weighted vector W= (W1, W2… Wn). The total input is expressed as

NET = \sum_{i=1}^{n} X_i W_i    ..........(1)
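A tiny sketch of the weighted-sum model in equation (1) followed by a hard-threshold activation; the threshold value and the example numbers are assumptions:

```python
import numpy as np

def neuron_output(x, w, threshold=0.0):
    """Weighted sum NET = sum(x_i * w_i), equation (1), followed by a
    hard-threshold activation (threshold value assumed)."""
    net = np.dot(x, w)
    return 1.0 if net > threshold else 0.0

print(neuron_output(np.array([1.0, 0.5, -1.0]), np.array([0.2, 0.4, 0.1])))  # NET = 0.3 -> 1.0
```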

The design of ANN has two distinct steps[1]; 1) Choosing a proper network architecture and 2) Adjusting the parameters of a network so as to minimize a certain fit criterion.

If the complexity of the problem is unknown, the network architecture is set arbitrarily or by trial and error [3]. Networks that are too small cannot learn the problem well, while too large a network leads to over-fitting and poor generalization performance. In general a large network also requires more computation than a smaller one.

FIG1. Functional model of an Artificial Neuron

II. NEED FOR EVOLUTIONARY ALGORITHMS

Most applications like pattern recognition and classification use feed-forward ANNs and the back-propagation training algorithm. It is often difficult to predict the optimal neural network size for a particular application. Therefore, algorithms that can find an appropriate network architecture are highly important [1]. One such important class is the evolutionary algorithms. Evolutionary algorithms refer to a class of algorithms based on probabilistic adaptation inspired by the principles of natural evolution. They are broadly classified into three main forms: evolution strategies, genetic algorithms, and evolutionary programming. Unlike gradient-based training methods, viz. back-propagation, GAs rely on a probabilistic search technique. Though their search space is bigger, they can ensure that better solutions are being generated over generations [4]. The typical approach, called the 'non-invasive' technique, uses a back-propagation neural network for weight learning and an evolutionary algorithm for network optimisation.


A modified genetic algorithm for evolution of neural network in designing an evolutionary neuro-hardware

N.Mohankumar B.Bhuvan M.Nirmala Devi Dr.S.Arumugam M.Tech-Microelectronics & VLSI Design Lecturer School of Engineering Bannari Amman Institutions Department of ECE Department of ECE Amrita Vishwa Vidyapeetham Tamil Nadu NIT Calicut, Kerala NIT Calicut, Kerala Coimbatore [email protected] Tamil Nadu


Back-propagation is a method for calculating the gradient of the error with respect to the weights and requires differentiability. Therefore back-propagation cannot handle discontinuous optimality criteria or discontinuous node transfer functions. When near-global minima are well hidden among local minima, back-propagation can end up bouncing between local minima without much overall improvement, which leads to very slow training [3]. The back-propagation neural network also has some influence over the evolutionary algorithm, which causes local optimisation. So it is necessary to use a new method employing the evolutionary algorithm [4].

III.INVASIVE METHOD

The proposed technique, named the modified invasive technique, carries out both weight adaptation and network evolution using a GA. More importantly, GAs relying on the crossover operator do not perform very well in searching for optimal network topologies, so more preference is given to mutation [2]. Hence a modified Genetic Algorithm (GA) for neural networks with more fondness given to the mutation technique, named Mutation-based Genetic Neural Network (MGNN), is proposed to evolve the network structure and adapt its weights at the same time [5]. The applications of genetic algorithms in ANN design and training are mostly concentrated on finding suitable network topologies and then training the network. GAs can quickly locate areas of high-quality solutions when the search space is infinite, highly dimensional and multimodal.

A. Evolution of Connection Weights
Training a given network topology to recognize its purpose generally means determining an optimal set of connection weights. This is formulated as the minimization of some network error function over the training data set by iteratively adjusting the weights [4]. The mean square error between the target and actual output, averaged over all output nodes, serves as a good estimate of the fitness of the network configuration corresponding to the current input.

B. Evolution of Architecture
A neural network's performance depends on its structure. The representation and the search operators used in GAs are the two most important issues in the evolution of architectures. An ANN structure is not unique for a given problem, and there may exist different ways to define a structure corresponding to the problem. Hence, deciding on the size of the network is also an important issue [1] [4].

Fig 2. ANN architecture

Too small a network will prohibit it from learning the desired input-to-output mapping; too big a one will fail to match inputs properly to previously seen ones and lose generalization ability. The structure is reduced by following 'CoDi-1' encoding.

IV.FITNESS COMPUTATION

Proper balance has to be maintained between the ANN's network complexity and its generalization capability. Here the fitness function (Q_fit) considers three important criteria [5]: classification accuracy (Q_acc), training error as a percentage of normalized mean-squared error (Q_nmse), and network complexity (Q_comp). They are defined as follows:

Q_{acc} = 100 \times \left(1 - \frac{Correct}{Total}\right)    (2)

Q_{nmse} = \frac{100}{N P} \sum_{j=1}^{P} \sum_{i=1}^{N} (T_{ij} - O_{ij})^2    (3)

Q_{comp} = \frac{C}{C_{tot}}    (4)

Q_{fit} = \alpha \, Q_{acc} + \beta \, Q_{nmse} + \gamma \, Q_{comp}    (5)

where N is the total number of input patterns, P is the total number of training patterns, T is the target and O is the network output. The value of C_tot is based on the size of the network's input (in), output (out) and the user-defined maximum number of hidden nodes (hid):

C_{tot} = in \times hid + hid \times out

The user-defined constants \alpha, \beta and \gamma are set to small values ranging between 0 and 1. They are used to control the strength of influence of their respective factors in the overall fitness measure. In the implemented ANN parity function, accuracy is favoured over training error and complexity, with \alpha = 1, \beta = 0.70 and \gamma = 0.30.
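A small sketch that evaluates equations (2)-(5) for a candidate network; the argument names and array shapes are assumptions, and the weighting constants default to the values quoted above for the parity experiments:

```python
import numpy as np

def fitness(correct, total, targets, outputs, used_conn, in_n, hid_n, out_n,
            a=1.0, b=0.70, g=0.30):
    """Fitness per eqs. (2)-(5): classification-accuracy term, normalized
    mean-squared-error term and network-complexity term, weighted by the
    user-defined constants. targets/outputs are (patterns x outputs) arrays."""
    q_acc = 100.0 * (1.0 - correct / total)                        # eq. (2)
    P, N = targets.shape
    q_nmse = 100.0 / (N * P) * np.sum((targets - outputs) ** 2)    # eq. (3)
    c_tot = in_n * hid_n + hid_n * out_n                           # maximum connections
    q_comp = used_conn / c_tot                                     # eq. (4)
    return a * q_acc + b * q_nmse + g * q_comp                     # eq. (5)
```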

V. ALGORITHM

In the proposed algorithm, a population of chromosomes is created initially. The chromosomes are then evaluated by a defined fitness function. After that, two chromosomes are selected for the genetic operations based on their fitness. The genetic operations, namely crossover and mutation, are then performed (with more preference given to mutation). The produced offspring replace their parents in the initial population. This GA process repeats until a user-defined goal is reached. In this paper, the standard GA is modified and a different method [2][5] of generating offspring is introduced to improve its performance.

A. Initial Population
First the weight matrix size is defined; it depends on the number of hidden nodes. Then a set of population members is generated by assigning random numbers:

P = {p1, p2, p3, ..., p_pop-size}

Here pop-size denotes the population size. Each member in this set denotes a weight matrix having a particular order corresponding to the number of connections in the network from one layer to another.

B. Evaluation


Each weight matrix in the population will be evaluated by the defined fitness function Qfit.

C. Selection
The weight matrix having the best fitness value is selected, based on the modified GA approach [5].

D. Genetic Operations
Genetic operations, namely mutation and crossover, are used to generate new offspring; finally the offspring with the maximum fitness is considered.
1) Crossover: Two weight matrices p1 and p2 are taken from P. Four new offspring are formed by the crossover mechanism [5] based on the modified scheme. Pmax and Pmin are two matrices formed by the maximum and minimum range of the elements in the population, with w in [0, 1]. Then max(p1, p2) and min(p1, p2) denote the vectors with each element obtained by taking the maximum and minimum, respectively, of the corresponding elements of p1 and p2.
2) Mutation: The offspring OSc is taken and mutation is performed by selecting an element with a certain probability and modifying its value randomly based on the error at the output [2]. From the above four offspring, the one with the largest fitness value is used as the offspring of the crossover operation [5].

E. Stopping criterion
Depending on the training performance and validation performance, the generation of offspring will stop only if the convergence rate is too low or the network output reaches the specified goal [1].
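A compact sketch of the evolutionary loop described in this section, with mutation favoured over crossover; the probabilities, the Gaussian mutation scale and the replace-the-weakest policy are assumptions, not the paper's exact operators:

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(pop, fitness_fn, p_cross=0.2, p_mut=0.8, sigma=0.1, generations=100):
    """pop is a list of weight matrices (numpy arrays). Each generation: evaluate,
    take the two fittest matrices as parents, produce one offspring with mutation
    favoured over crossover, and let it replace the weakest member if it is fitter."""
    for _ in range(generations):
        fit = np.array([fitness_fn(w) for w in pop])
        p1, p2 = np.argsort(fit)[-2:]                  # two fittest parents
        child = pop[p1].copy()
        if rng.random() < p_cross:                     # occasional uniform crossover
            mask = rng.random(child.shape) < 0.5
            child[mask] = pop[p2][mask]
        if rng.random() < p_mut:                       # mutation is given preference
            pick = rng.random(child.shape) < 0.3       # per-element selection prob. (assumed)
            child[pick] += rng.normal(0.0, sigma, size=child.shape)[pick]
        worst = int(np.argmin(fit))
        if fitness_fn(child) > fit[worst]:             # assumes higher fitness is better
            pop[worst] = child                         # offspring replaces weakest member
    return pop
```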

VI. EXPERIMENTS AND RESULTS

4-bit, 5-bit and 6-bit odd parity functions are used to examine the applicability and efficiency of the proposed algorithm. First a feed-forward neural network is designed for the parity function. The structure is pruned using 'CoDi-1' encoding, then the weight matrices for each layer are evolved using the invasive technique [1] [2]. When the number of hidden nodes is increased, the network gives better performance for the same weight matrix with relatively fewer iterations. The fitness of each matrix is evaluated and a rank is assigned based on the fitness value. The results are shown in Fig. 3; the matrices with the best fitness values are given good ranks. The sizes of the input and output units are problem-specific, while the maximum number of hidden units is a user-defined parameter.

Fig 3. a) 4-Bit parity  b) 5-Bit parity  c) 6-Bit parity

TABLE I Simulation Results of the Proposed GA


The performance of the GNN against the standard GA was measured using the following dependent variables:
• Percentage of wrong classification (Class Error);
• Number of connection weights (Connections);
• Number of epochs (generations).

TABLE II Simulation results of the Proposed & standard GA

Fig. 4 shows the comparison between the standard and the proposed technique using the performance of a 4-bit odd parity ANN. TABLE II summarises the 4-bit, 5-bit and 6-bit parity function simulation results.

Fig 4. Performance comparison using 4-bit Parity

VII. CONCLUSION

Using the 4-bit, 5-bit and 6-bit odd parity neural networks, it has been shown that the modified GA performs more efficiently than the standard GA. The matrix with the best fitness has the least class error, achieves the target in the minimum number of epochs and is independent of the weight matrix size, i.e. the number of connections. The above training results also illustrate that fitness-based selection is better than the usual rank-based selection in terms of size, performance and time of computation. By pruning the neural network structure using this algorithm, hardware implementation becomes easy and efficient. The NN can be modelled into a reconfigurable device and tested.

VIII. REFERENCES

[1]X.Yao,“Evolving artificial neural networks,” Proc. IEEE, vol. 87, no.9, pp. 1423–1447, Sep. 1999. [2]Paulito.P.Palmes, Taichi Hayasaka, “Mutation based Genetic Neural Network”, IEEE Trans on Neural Networks, Vol.16, no.3, pp587-600, 2005. [3]J. D. Schaffer, D. Whitley, and L. J. Eshelman, “Combinations of GA and neural networks: A survey” in Proc. Combinations of Genetic Algorithms and Neural Networks, pp1–37, 1992. [4] V. Maniezzo, “Genetic evolution of the topology and weight distribution of neural networks,” IEEE Trans. Neural Networks, vol. 5, pp. 39–53,1992. [5]F. Leung, H. Lam, S. Ling, and P. Tam, “Tuning the structure and parameter of a neural network using an improved genetic algorithm”,IEEE Trans. on Neural Network., vol. 14, no. 1, pp. 79–88, Jan. 2003.

TABLE I. Simulation results of the proposed GA (per bit size: Fitness / No. of epochs / No. of bit errors)

Rank | 4-bit         | 5-bit         | 6-bit
1*   | 73 / 30 / 1   | 64 / 40 / 2   | 53 / 46 / 3
10*  | 25 / 58 / 4   | 31 / 70 / 5   | 20 / 86 / 10
12   | 4.2 / ts / 6  | 5.6 / ts / 6  | 3 / ts / 15

* values in plot (Fig. 3); ts = training stopped

TABLE II. Simulation results of the proposed and standard GA (Epochs / Fitness)

No. of bits | Rank | Hidden nodes | Proposed    | Standard
4           | 1    | 5            | 30 / 72.54  | 36 / 63.26
4           | 10   | 5            | 58 / 25.3   | 60 / 16.51
5           | 1    | 6            | 38 / 66.25  | 25 / 51.1
5           | 10   | 6            | 66 / 32     | 65 / 28.1
6           | 1    | 7            | 46 / 54.81  | 37 / 43
6           | 10   | 7            | 80 / 23.41  | 76 / 11.46


Abstract - Local positioning signifies finding the position of a fixed or moving object in a closed or indoor environment. This paper deals with the FPGA implementation of a Time-of-Arrival estimator using the distorted template architecture. The WLAN standard IEEE 802.11a frame format is used as the basis for building the localization system. This paper serves as a reference for any future work done on localization scheme implementation, since no hardware model exists at present. Index Terms - FPGA, Time-of-Arrival Estimator, Distorted Template, Local Positioning, IEEE 802.11a.

I. INTRODUCTION

Since the advent of GPS, many new fascinating applications have been developed which have served technology, research and mankind to a very large extent. Local positioning, contrary to GPS, is used in the indoor environment, but serves the same purpose. A wide variety of technologies are available to deploy local positioning systems, such as optical, ultrasound and radio frequency [1]. RF-based local positioning is addressed in this paper. TOA-based localization is chosen for two main reasons: (a) it provides inherent security, because it cannot be easily manipulated, and (b) it avoids large-scale empirical measurements. Hardware implementation is essential for the practical realization of such systems, which makes VLSI implementation necessary. Verilog HDL has been used for the RTL description of the system and MATLAB has been used to validate it. The paper is organized as follows: section II gives the architecture chosen for the Time-of-Arrival (TOA) estimator, section III explains the significance of the WLAN standard chosen, i.e., IEEE 802.11a, section IV deals with the FPGA implementation of the TOA estimator, section V shows the simulation results obtained using ModelSim, and section VI deals with the conclusion and further work.

II. DISTORTED TEMPLATE BASED TIME-OF-ARRIVAL ESTIMATOR

Conventionally, TOA estimation is performed by a simple correlation technique. However, this technique has been shown in [2] to be less accurate than the distorted template based TOA estimator. The block diagram of the distorted template based TOA estimator is shown in Fig 1, and the hardware implementation has been carried out using the same architecture. Basically, the distorted template architecture requires an integrate-and-dump operation at the symbol rate [3]. Using this algorithm, fewer training symbols are necessary for synchronization. It can be observed from Fig 1 that the TOA estimation consists of a channel impulse response estimator (maximum likelihood channel estimation is chosen), candidate impulse responses, a convolutor and a cross-correlator.

Fig 1: Block Diagram of Distorted Template based TOA Estimator [2]

The final output is the set of cross-correlated values; the position of the magnitude peak indicates the time offset.

III. IEEE 802.11a FRAME FORMAT

Utilizing the WLAN standards already available helps make the system compatible with present-day applications in the market, thereby making the system cost effective. To serve this purpose the IEEE 802.11a WLAN standard is used. This is the most popular and widely used standard for WLAN applications in the present-day environment, and this trend is likely to extend into the future. The frame format of IEEE 802.11a is shown in Fig 2. The highlighted part of Fig 2 shows the part of the frame format used for channel estimation; this part consists of the Long Training Symbols (LTS). Although the IEEE 802.11a frame format consists of 64 points, the implementation is presently carried out for the first 12 LTS points.

Fig 2: IEEE 802.11a OFDM frame format [4]

IV. FPGA IMPLEMENTATION

Where hardware implementation is concerned, the internal architecture chosen for the TOA estimator has to consume fewer

Design and FPGA Implementation of Distorted Template Based Time-Of-Arrival Estimator for Local Positioning Application

Sanjana T S.1, Mr. Selva Kumar R.2, Mr. Cyril Prasanna Raj P.3 VLSI System Design Centre

M S Ramaiah School of Advanced Studies, Bangalore-560054, INDIA. [email protected], [email protected], [email protected]


resources on the FPGA, operate at a higher speed and consume less power. Each block shown in Fig 1 has been separately modeled and verified for its functionality and performance.

a) Maximum Likelihood Channel Estimator: It has been observed through computation that the maximum likelihood technique, though computationally intensive, gives more accurate results than other methods such as least squares and minimum mean square estimation. With an appropriate algorithm this technique can be realized with little hardware. The maximum likelihood channel estimator [3] is governed by equation 1: the received signal has to be multiplied by the pseudoinverse to obtain the channel estimates. The block diagram of the channel impulse response estimator is shown in Fig 3. The pseudoinverse part of equation 1 is stored in the ROMs. The multiplication operation is split into two parts, which are combined before the shift-and-add operation. Thus the matrix multiplication can be done using a minimum number of multipliers.

Fig 3: Maximum Likelihood Channel Impulse Response Estimator [5]
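A least-squares style software sketch of the pseudoinverse-based channel estimation described above; the training-matrix construction, the channel length L and all variable names are assumptions used only to illustrate the "multiply the received samples by a stored pseudoinverse" idea:

```python
import numpy as np

def ml_channel_estimate(lts, rx, L=4):
    """Estimate an L-tap channel from a known training sequence by multiplying
    the received samples with a precomputed pseudoinverse (the matrix that the
    hardware keeps in ROM). Lengths and L are illustrative assumptions."""
    n = len(rx)
    S = np.zeros((n, L), dtype=complex)      # convolution matrix: rx ~ S @ h + noise
    for tap in range(L):
        S[tap:, tap] = lts[:n - tap]
    pinv = np.linalg.pinv(S)                 # stored in ROM in the FPGA design
    return pinv @ rx                         # channel impulse response estimate

# Usage sketch with a random training sequence and a known 3-tap channel.
rng = np.random.default_rng(1)
lts = rng.choice([-1, 1], size=12).astype(complex)       # 12 LTS points, as in the paper
h_true = np.array([1.0, 0.5 - 0.2j, 0.25], dtype=complex)
rx = np.convolve(lts, h_true)[:12]
print(np.round(ml_channel_estimate(lts, rx, L=4), 3))    # ~ [1, 0.5-0.2j, 0.25, 0]
```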

b) Candidate Impulse Response: Since the start of the channel estimates is unknown, this block is necessary. It performs three major operations: circular shift, maximum selection and selection of two samples on either side of the maximum. The circular shift operation is performed 9 times. Since the values are in complex number format, to evaluate the maximum value it is necessary to compute real² + imaginary² for each sample. The maximum is selected and two signals from either side of the maximum are chosen, making the number of outputs from this block 5. Thus hypotheses of different candidate impulse responses of the same length are formed, such that each of them includes the maximum estimated path.

c) RAM: This block is used as a memory buffer; its main function is to store the inputs fed into the channel estimation unit and to supply them to the correlator when the correlator block is enabled.

d) Convolution and Correlation [5]: The candidates obtained are convolved with the clean LTS, and the output thus obtained is called the distorted template. The same LTS used for channel estimation is used to perform the convolution. The output of the convolutor is correlated with the received signal to obtain the time offset estimate. If the outputs of the candidate impulse response block are represented as 1, 2, 3, 4 and 5, convolution and correlation are performed using 3 signals at a time, namely the signal groups 1, 2, 3; 2, 3, 4; and 3, 4, 5 respectively.
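A software sketch of the distorted-template search: each candidate impulse response is convolved with the clean LTS and the result is cross-correlated with the received frame, the strongest correlation peak giving the time offset. The function and variable names are assumptions; this is not the RTL implementation:

```python
import numpy as np

def toa_offset(rx_frame, clean_lts, candidate_irs):
    """For each hypothesised candidate impulse response, build a distorted
    template (candidate convolved with the clean LTS), cross-correlate it with
    the received frame, and return the lag of the strongest correlation peak,
    which is taken as the time-of-arrival offset."""
    best_peak, best_offset = -np.inf, None
    for h in candidate_irs:
        template = np.convolve(clean_lts, h)                 # distorted template
        corr = np.correlate(rx_frame, template, mode="full")
        peak = int(np.argmax(np.abs(corr)))
        if np.abs(corr[peak]) > best_peak:
            best_peak = np.abs(corr[peak])
            best_offset = peak - (len(template) - 1)         # lag of the peak
    return best_offset
```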

V. SIMULATION RESULTS The model has been simulated using MATLAB, which is further used to validate the HDL results. The RTL code has been written in Verilog HDL using Xilinx ISE 9.1i, and simulation has been performed using ModelSim XE 6.1e. Fig 4 shows the output obtained from the channel impulse response estimator. Fig 5 shows the output obtained from the candidate impulse response block. Fig 6 shows the output obtained from maximum selection after correlation. Fig 7 shows the output obtained from the final Time-of-Arrival estimator; thus 3 outputs along with their positions can be observed in Fig 7.

Fig 4: Simulation result of Channel Impulse Response Estimator

Fig 5-Simulation result of Candidate Impulse Response


Table 1: Synthesis Report Summary

The Minimum period obtained from synthesis report is 7.359ns (Maximum Frequency: 135.892MHz). The Minimum input arrival time before clock is 5.433ns and the Maximum output required time after clock is 2.509ns.

Fig 6: Simulation result of Maximum Selection after correlation

Fig 7: Simulation result of Time-of-Arrival Estimator

The top-level schematic of the final TOA estimator is shown in Fig 8. This RTL schematic can be compared with Fig 1 to give a brief idea of the structure of the Verilog coding.

Fig 8: Top level RTL schematic

The design is synthesized for a Virtex-5 board, with target device xc5vlx-3ff1136; the summary of the synthesis report obtained is shown in Table 1.

VI. CONCLUSION

The architecture chosen for TOA estimation is based on the distorted template. This architecture is chosen since it provides better accuracy than traditional correlation schemes. The method is also generic, as it is not limited to the IEEE 802.11a preamble training but can be adapted to any WLAN standard by changing the contents of the ROMs in the channel estimation block. It is also to be noted that the distorted template scheme does not give the desired results if the channel estimation is improper. The implementation can further be extended to the 64-point IEEE 802.11a format. ASIC implementation is the next step that can be done if an IC has to be developed.

REFERENCES

[1] Martin Stuart Wilcox, “Techniques for Predicting the Performance of Time-of-Flight Based Local Positioning Systems”, PhD thesis, University College London, Sept.2005.

[2] Harish Reddy, M Girish Chandra and P Balamuralidhar, “An Improved Time of Arrival Estimation for WLAN based Local Positioning”, 2nd International Conference on Communication Systems Software and Middleware, COMSWARE, pp. 1-5, Jan. 2007.

[3] Liuqing Yang and Giannakis G.B., “Timing ultra-wideband signals with dirty templates”, IEEE Transactions on Communications, Vol. 53, No. 11, pp. 1952 - 1963, Nov. 2005.

[4] Marc Engels, “Wireless OFDM Systems, How to make them work?”, Kluwer Academic Publishers, ISBN-1-4020-7116-7, 2002.

[5] M.J. Canet, I.J. Wassell, J. Valls, V. Almenar, “Performance Evaluation of Fine Time Synchronizers for WLANs”, 13th European Signal Processing Conference, EUSIPCO, Sep.2005

Table 1: Synthesis Report Summary

Resource                        | Used  | Available | Utilization
Number of Slice Registers       | 22830 | 69120     | 33%
Number of Slice LUTs            | 17002 | 69120     | 24%
Number of fully used Bit Slices | 5360  | 34472     | 15%
Number of bonded IOBs           | 164   | 640       | 25%
Number of Block RAM/FIFO        | 2     | 148       | 1%
Number of BUFG/BUFGCTRLs        | 2     | 32        | 6%


Abstract - This paper presents a practical design procedure for microstrip patch antennas on low, medium and high dielectric constant substrates, with single, double and four patches in series and parallel. The design process starts with the theoretical design of the antenna. Finally, the results of the implementation of the designs are presented using the SONNET software and compared to obtain the best possible design. Key words: Microstrip patch antenna, substrate, radiation.

I INTRODUCTION

A microstrip patch antenna is a narrowband, wide-beam antenna fabricated by etching the antenna element pattern in a metal trace bonded to an insulating substrate [1]. Because such antennas have a very low profile, are mechanically rugged and can be made conformal, they are often mounted on the exterior of aircraft and spacecraft, or are incorporated into mobile radio communication devices. Microstrip antennas have several advantages compared to conventional microwave antennas; therefore many applications cover the broad frequency range from 100 MHz to 100 GHz. Some of the principal advantages compared to conventional microwave antennas are:
• Light weight, low volume and thin profile configurations, which can be made conformal.
• Low fabrication cost.
• Linear, circular and dual polarization antennas can be made easily.
• Feed lines and matching networks can be fabricated simultaneously with the antenna.
However, microstrip antennas also have limitations compared to conventional microwave antennas:
• Narrow bandwidth and lower gain.
• Most microstrip antennas radiate into half space.
• Polarization purity is difficult to achieve.
• Lower power handling capability.

The general layout of a parallel coupled microstrip patch antenna is shown in Figure 1.

Fig. 1 Schematic diagram of microstrip patch antenna

There are many substrates with various dielectric constants that are used in wireless applications. Those with high dielectric constants are more suitable for lower frequency applications in order to help minimize the size. Alumina laminates are some of the most widely used materials in the implementation of microwave circuits. Alumina laminate is most widely used for frequencies up to 20GHz. The Alumina laminate has several advantages over the less expensive FR4 substrate [2]. While the FR4 becomes very unstable at high frequencies above 1 GHz, the Alumina laminate has very stable characteristics even beyond 10 GHz. Furthermore, the high dielectric constant of the ceramic-filled Alumina reduces the size of the micro strip circuit significantly compared [3] to one that is designed using FR4.

II RADIATION MECHANISM

Microstrip antennas are essentially suitably shaped discontinuities that are designed to radiate. The discontinuities represent abrupt changes in the microstrip line geometry [4]. Discontinuities alter the electric and magnetic field distributions; this results in energy storage and sometimes radiation at the discontinuity. As long as the physical dimensions and the relative dielectric constant of the line remain constant, virtually no radiation occurs. However, the discontinuity introduced by the rapid change in line width at the junction between the feed line and the patch radiates. The other end of the patch, where the metallization abruptly ends, also radiates. When the field on a microstrip line encounters an abrupt change in width at the input to the patch, the electric fields spread out, creating fringing fields at this edge, as indicated.

III MICROSTRIP LINES A microstrip line consists of a single ground plane and a thin strip conductor on a low-loss dielectric substrate above the ground plane. Due to the absence of a top ground plane and of dielectric above the strip, the electric field lines remain partially in the air and partially in the lower dielectric substrate. This makes the mode of propagation not pure TEM but what is called quasi-TEM [5]. Due to the open structure and the presence of any discontinuity, the microstrip line radiates electromagnetic energy. The use of thin, high-permittivity dielectric materials reduces the radiation loss of the open structure, as the fields are then mostly confined inside the dielectric.

Design and simulation of Microstrip Patch Antenna for Various Substrate

*T. Jayanthy, **A.S.A. Nisha, ***Mohemed Ismail, ****Beulah Jackson. *Professor and HOD, Department of Applied Electronics; **Research Student; ***UG Student

Sathyabama University, Chennai 119. ****Assistant Professor, Department of ECE, Panimalar Engineering College

Email id : [email protected], [email protected]


Losses in microstrip lines: Two types of losses exist:
(1) Dielectric loss in the substrate: Typical dielectric substrate material creates a very small power loss at microwave frequencies. The calculation of dielectric loss in a filled transmission line is easily carried out provided exact expressions for the wave mechanisms are available, but for microstrip this involves extensive mathematical series and numerical methods.
(2) Conductor loss: This is by far the most significant loss effect over a wide frequency range and is created by the high current density in the edge regions of the thin conducting strip. Surface roughness and strip thickness also have some bearing on the loss mechanism. The total attenuation constant can be expressed as \alpha = \alpha_d + \alpha_c, where \alpha_d and \alpha_c are the dielectric and ohmic (conductor) attenuation constants.

QUASI-TEM MODE OF PROPAGATION
Electromagnetic waves in free space propagate in the transverse electromagnetic (TEM) mode: the electric and magnetic fields are mutually perpendicular and in quadrature with the direction of propagation, i.e. along the transmission line. Coaxial and parallel-wire transmission lines employ the TEM mode of propagation. In this mode the electromagnetic field lines are contained entirely within the dielectric between the lines. But the microstrip structure involves an abrupt dielectric interface between the substrate and the air above it. Any transmission line system which is filled with a uniform dielectric can support a single well defined mode of propagation, at least over a specific range of frequencies (TEM for coaxial lines, TE or TM for waveguides). Transmission lines which do not have such a uniform dielectric filling cannot support a single mode of propagation; microstrip falls in this category [9]. Here the bulk of the energy is transmitted along the microstrip with a field distribution which quite closely resembles TEM and is usually referred to as quasi-TEM.

The microstrip design consists of finding the values of width (w) and length (l) corresponding to the characteristic impedance (Zo) defined at the design stage of the network. A substrate of permittivity (Er) and thickness (h) is chosen. The effective microstrip permittivity (Eeff) is unique to a fixed dielectric transmission line system and provides a useful link between the various wavelengths, impedances and velocities [6]. The microstrip in general will have a finite strip thickness t, which influences the field distribution for moderate power applications. The thickness of the conducting strip is quite significant when considering conductor losses. For microstrip with t/h ≤ 0.005, 2 ≤ Er ≤ 10 and w/h ≥ 0.1, the effects of the thickness are negligible, but at smaller values of w/h or greater values of t/h the significance increases.

IV DESIGN PROCESS OF ANTENNA
Throughout the design process, an air gap has been used to build the antenna structures. This is chosen because certain dielectric substrates reduce the efficiency of the antenna, and it also makes the antenna very easy to construct. Based on antenna knowledge, concentration has been put on the linearly polarized transmitted signal, because the bandwidth of a linearly polarized antenna is greater than that of a circularly polarized antenna. Linear polarization is preferred over circular polarization because of the convenience of a single feed rather than a double feed. Moreover, the construction of a linearly polarized rectangular patch antenna [7], [8] is simpler than the other polarization configurations.

DESIGN CALCULATION FORMULAE
For an operating frequency f_r, the thickness of the dielectric medium (and of the grounded alumina material) is chosen as

h \le \frac{0.3\, c}{2\pi f_r \sqrt{\varepsilon_r}}

The width of the metallic patch is

W = \frac{c}{2 f_r} \left[\frac{\varepsilon_r + 1}{2}\right]^{-\frac{1}{2}}

The length of the metallic patch, L, is

L = \frac{c}{2 f_r \sqrt{\varepsilon_{reff}}} - 2\,\Delta l

where

\Delta l = 0.412\, h\, \frac{(\varepsilon_{reff} + 0.3)\,(W/h + 0.264)}{(\varepsilon_{reff} - 0.258)\,(W/h + 0.8)}

\varepsilon_{reff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2}\left(1 + 12\,\frac{h}{W}\right)^{-\frac{1}{2}}
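A small calculator that evaluates the design equations above; the substrate height used in the example call is an assumption, while the dielectric constant and frequency correspond to one of the simulated cases:

```python
import numpy as np

def patch_dimensions(f_r, eps_r, h):
    """Rectangular patch design: width W, effective permittivity, length
    extension dL and length L. f_r in Hz, h in metres."""
    c = 3e8
    W = c / (2 * f_r) * np.sqrt(2.0 / (eps_r + 1.0))
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / W) ** -0.5
    dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)) / \
         ((eps_eff - 0.258) * (W / h + 0.8))
    L = c / (2 * f_r * np.sqrt(eps_eff)) - 2 * dL
    return W, eps_eff, dL, L

# Example: epsilon_r = 2.2 (low-dielectric case) at 6.5 GHz, h = 1.6 mm (assumed height).
print([round(v, 4) for v in patch_dimensions(6.5e9, 2.2, 1.6e-3)])
```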

V IMPLEMENTATION OF THE PROJECT

Fig 1 Single patch antenna


Fig 2 Two rectangular patches in series

Fig 3 Two rectangular patches in parallel

Fig 4 Four rectangular patches in series

Fi g 5 Four rectangular patches in parallel

VI PERFORMANCE ANALYSIS

Fig 6 Single patch with low dielectric 2.2 for 6.5GHz

Fig 7 Single patch with Medium dielectric 6.0 for 9.5GHz

Fig 8 Single patch with High dielectric 12.9 for 6.5GHz


Fig 9 Double patch with low dielectric 2.2 for 9.5GHz

Fig 10 Double patch with Medium dielectric 6.0 for 9.5GHz

Fig 11 Double patch with High dielectric 12.9 for 6.5GHz

Fig 12 Four patch with low dielectric 2.2 for 9.5GHz

Fig 13 Four patch with Medium dielectric 6.0 for 9.5GHz

Fig 14 Four patch with High dielectric 12.9 for 6.5GHz

VII RESULT

Thus the microstrip patch antenna was designed and simulated with various substrates for single, double and four patches in series and parallel, to observe the differences in performance and hence in the responses. The performance analysis shows that as the dielectric constant increases, the magnitude of the response increases; this increased magnitude correspondingly allows the antenna size to be decreased. Further, increasing the number of patches in series and parallel enhances the performance of the antenna.


VIII. CONCLUSION This paper has concentrated on an antenna design. A method for the rigorous calculation of the antenna has also been developed. The measured responses show good agreement with the theoretical predictions. The main quality of the proposed antenna is that it allows an effective design while maintaining all the advantages of microstrip antennas in terms of size, weight and ease of manufacturing. The compactness of the circuit size makes the design quite attractive for further developments and applications in modern radio systems, especially in the field of Software Defined Radio receivers. It has been shown that this new class of antennas holds promise for wireless and mobile communications applications.

REFERENCES

[1] E. Hammerstad, F. A. Bekkadal, Microstrip Handbook, ELAB Report, STF 44 A74169, University of Trondheim, Norway, 1975 [2] Dimitris T.Notis, Phaedra C. Liakou and Dimitris P. Chrissoulidis, Dual Polarized Microstrip Patch Antenna, reduced in size by the use of Peripheral slits, IEEE paper. [3] A. Derneryd, Linearly Polarized microstrip Antennas, IEEE Trans. Antenna and Propagat. AP-24, pp. 846-851, 1976. [4] M. Amman, Design of Microstrip Patch Antenna for the 2.4 Ghz Band, Applied Microwave and Wireless, pp. 24-34, November /December 1997 [5] K. L. Wong, Design of Nonplanar Microstrip Antennas and Transmission Lines, John Wiley & Sons, New York, 1999 [6] W. L. Stutzman , G. A. Thiele, Antenna Theory and Design , John Wiley & Sons,2nd Edition ,New York, 1998 [7] Bryant, T G and J A Weiss, Parameter of microstrip transmission lines and of coupled pairs of microstrip lines, IEEE Transactions on MTT – 16, No. 12, pp., 1021 – 1027, December 1968. [8] K. L. Wong, Compact and Broadband Microstrip Antennas, Wiley, New York, 2002 [9] G.S. Row , S. H. Yeh, and K. L. Wong, “ Compact Dual Polarized Microstrip antennas”, Microwave & Optical Technology Letters, 27(4), pp. 284-287, November 2000. [10] E. J. Denlinger, “Losses in microstrip lines,” IEEE Trans. Microwave Theory Tech., vol. MTT-28, pp.513-522, June 1980


Motion Estimation of The Vehicle Detection and Tracking System

Mr. A. Yogesh, PG Scholar, and Mrs. C. Kezi Selva Vijila, Assistant Professor, Electronics and Communication Engineering, Karunya University

Abstract—In this paper we deal with increasing congestion on freeways and the problems associated with existing detectors. Existing commercial image-processing systems work well in free-flowing traffic, but they have difficulties with congestion, shadows and lighting transitions. These problems stem from vehicles partially occluding one another and from the fact that vehicles appear differently under various lighting conditions. We propose a feature-based tracking system for detecting vehicles under these challenging conditions. This paper describes the issues associated with feature-based tracking, presents the real-time implementation of a prototype system, and reports the performance of the system on a large data set. Index Terms -- Vehicle Tracking, Video Image Processing.

I.INTRODUCTION

In recent years, traffic congestion has become a significant problem. Early solutions attempted to lay more pavement to avoid congestion, but adding more lanes is becoming less and less feasible. Contemporary solutions emphasize better information and control in order to use the existing infrastructure more efficiently. The hunt for better traffic information, and thus an increasing reliance on traffic surveillance, has resulted in a need for better vehicle detection such as wide-area detectors, while the high costs and safety risks associated with lane closures have directed the search towards non-invasive detectors mounted beyond the edge of the pavement. One promising approach is vehicle tracking via video image processing, which can yield traditional traffic parameters such as flow and velocity, as well as new parameters such as lane changes and vehicle trajectories. Vehicle detection [1]–[10] is an important problem in many related applications, such as self-guided vehicles, driver assistance systems, intelligent parking systems and the measurement of traffic parameters, and it is made difficult by the variations in vehicle colors and sizes. One of the most common approaches to vehicle detection is to use vision-based techniques to analyze orientations and shapes. Developing a robust and effective vision-based vehicle detection system is very challenging. To address the above

problems, different approaches using different features and learning algorithms for locating vehicles have been investigated. Background subtraction [2]-[5] is used to extract motion features for detecting moving vehicles from video sequences. However, this kind of motion feature is not available in still images. For dealing with static images, Wu et al. [6] used the wavelet transform to extract texture features for locating possible vehicle candidates on roads; each vehicle candidate is then verified using a principal component analysis (PCA) classifier. Sun et al. [7] used Gabor filters to extract different textures and then verified each vehicle candidate using a support vector machine (SVM) classifier. In addition to texture, symmetry is another important feature used for vehicle detection. In [8], Broggi et al. described a detection system that searches for areas with high vertical symmetry as vehicle candidates. However, this cue is prone to false detections caused by symmetrical doors or other objects. Furthermore, in [9], Bertozzi et al. used corner features to build four templates of vehicles for vehicle detection and verification. In [10], Tzomakas and Seelen found that the shadow area underneath a vehicle is a good cue for detecting vehicles. In [11], Ratan et al. developed a scheme that detects vehicle wheels as features to find possible vehicle positions and then used a method called Diverse Density to verify each vehicle candidate. Stereo vision methods and 3-D vehicle models are used to detect vehicles and obstacles in [12]-[13]. The major drawback of the above methods is the need for a time-consuming search that scans all pixels of the whole image. As for the color feature, although color is an important perceptual descriptor for describing objects, few color-based works have addressed vehicle detection, since vehicles vary greatly in color. A color transform that projects all road pixels onto a color plane, so that vehicles can be identified from the road background, is explained in [14]. Similarly, in [15], Guo et al. used several color balls to model road colors in color space; vehicle pixels can then be identified if they are classified as non-road regions. However, since these color models are not compact and general enough to model vehicle colors, many false detections were produced, degrading the accuracy of vehicle detection. In this paper, we propose a feature-based tracking algorithm.


II.FEATURE BASED VEHICLE TRACKING STRATEGIES

An alternative approach to tracking objects as a whole is to track sub-features such as distinguishable points or lines on the object. The advantage of this approach is that, even in the presence of partial occlusion, some of the features of the moving object remain visible. Furthermore, the same algorithm can be used for tracking in daylight, twilight or night-time conditions; it is self-regulating because it selects the most salient features under the given day or night conditions.

III. FEATURE BASED TRACKING ALGORITHM

This section presents our vehicle tracking system, which includes camera calibration, feature detection, feature tracking and feature grouping modules. The camera calibration is conducted once, off-line, for a given location; the other modules then run continuously online in real time.

• E.g., window corners, bumper edges, etc. during the day and tail lights at night.

• To avoid confusion, "trajectory" will be used when referring to entire vehicles and "track" will be used when referring to vehicle features.

A. Off-Line Camera Definition

Before running the tracking and grouping system, the user specifies camera-specific parameters off-line. These parameters include:

• Line correspondences for a projective mapping, or homography, as explained in Figure 1,

• A detection region near the image bottom and an exit region near the image top, and

• Multiple fiducial points for camera stabilization.

Since most road surfaces are flat, the grouper exploits the assumption that vehicle motion is parallel to the road plane. To describe the road plane, the user simply specifies four or more line or point correspondences between the video image of the road (i.e., the image plane) and a separate 'world' road plane, as shown in Figure 1. In other words, the user must know the relative distance in world coordinates between four points visible in the image plane. Ideally, this step involves a field survey; however, it is possible to approximate the calculations using a video tape recorder, known lane widths and one or more vehicles traveling at a constant velocity. The vehicle velocity can be used to measure relative distance along the road at different times, and the lane widths yield the relative distance between two points on the edge of the road, coincident with the vehicle's position. Based on this off-line step, our system computes a projective transform, or homography, H, between the image coordinates (x,y) and world coordinates (X,Y).

This transformation is necessary for two reasons. First, features are tracked in world coordinates to exploit known physical constraints on vehicle motion. Second, the transformation is used to calculate distance-based measures such as position, velocity and density. Once the homography has been computed, the user can specify the detection region, exit region and fiducial points in the image plane.
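The paper does not spell out how H is computed numerically; a common choice, sketched below purely as an illustration, is the direct linear transform (DLT) over four or more image/world point correspondences. The function names and the use of NumPy are assumptions, not part of the original system.

```python
import numpy as np

def fit_homography(img_pts, world_pts):
    """Estimate a 3x3 homography H mapping image (x, y) to world (X, Y)
    from >= 4 point correspondences via the direct linear transform."""
    A = []
    for (x, y), (X, Y) in zip(img_pts, world_pts):
        A.append([-x, -y, -1, 0, 0, 0, X * x, X * y, X])
        A.append([0, 0, 0, -x, -y, -1, Y * x, Y * y, Y])
    # The homography is the null vector of A (smallest singular vector).
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def to_world(H, x, y):
    """Apply the homography to one image point."""
    X, Y, w = H @ np.array([x, y, 1.0])
    return X / w, Y / w
```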

Fig.1 A projective transform, H, or homography is used to map from image coordinates, (x,y), to world coordinates, (X,Y).

B. On-Line Tracking and Grouping

A block diagram for our vehicle tracking and grouping system is shown in Figure 2. First, the raw camera video is stabilized by tracking manually chosen fiducial points to sub-pixel accuracy and subtracting their motion from the entire image. Second, the stabilized video is sent to a detection module, which locates corner features in a detection zone at the bottom of the image. In our detection module, "corner" features are defined as regions in the gray-level intensity image where brightness varies in more than one direction. This detection is operationalized by looking for points in the image, I, where the rank of the windowed second-moment matrix, ∇I⋅∇I^T, is two. Some example corners detected by the system are shown in the figure. Next, these corner features are tracked over time in the tracking module. The tracking module uses Kalman filtering to predict a given corner's location and velocity in the next frame, (X, Y, Ẋ, Ẏ), using world coordinates. Normalized correlation is used to search a small region of the image around the estimate for the corner location. If the corner is found, the state of the Kalman filter is updated; otherwise, the feature track is dropped. The temporal progression of several corner features in the image plane can then be observed. Vehicle corner features will eventually reach a user-defined exit region that crosses the entire road near the top of the image (or multiple exit regions if there is an off-ramp). Once corner features reach the exit region, they are grouped into vehicle hypotheses by the grouping module.
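A minimal sketch of this rank test (not the authors' implementation): accumulate the gradient outer products over a square window with an integral image and keep pixels whose second-moment matrix has two large eigenvalues. The window size and threshold below are illustrative.

```python
import numpy as np

def box_sum(a, win):
    """Sum of `a` over a (2*win+1) x (2*win+1) window around each pixel,
    computed with an edge-padded integral image."""
    pad = np.pad(a, win, mode="edge")
    ii = np.zeros((pad.shape[0] + 1, pad.shape[1] + 1))
    ii[1:, 1:] = pad.cumsum(0).cumsum(1)
    s = 2 * win + 1
    return ii[s:, s:] - ii[:-s, s:] - ii[s:, :-s] + ii[:-s, :-s]

def corner_response(gray, win=4):
    """Smaller eigenvalue of the windowed second-moment matrix at each
    pixel; it is large only where the matrix has rank two (a corner)."""
    Iy, Ix = np.gradient(gray.astype(float))
    Sxx = box_sum(Ix * Ix, win)
    Syy = box_sum(Iy * Iy, win)
    Sxy = box_sum(Ix * Iy, win)
    trace, det = Sxx + Syy, Sxx * Syy - Sxy * Sxy
    disc = np.sqrt(np.maximum(trace ** 2 / 4.0 - det, 0.0))
    return trace / 2.0 - disc          # minimum eigenvalue

def detect_corners(gray, thresh=500.0):
    ys, xs = np.nonzero(corner_response(gray) > thresh)
    return list(zip(xs.tolist(), ys.tolist()))
```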


Fig.2 Block diagram of the vehicle tracking system. In the future, we plan to add a vehicle classification module, as indicated by the dashed lines.

The grouper uses a common motion constraint to collect features into a vehicle: corner features that are seen moving rigidly together probably belong to the same object. In other words, features from the same vehicle will follow similar tracks, and two such features will be offset by the same spatial translation in every frame. Two features from different vehicles, on the other hand, will have distinctly different tracks and their spatial offset will change from frame to frame. A slight acceleration or lane drift is sufficient to differentiate features between most vehicles; note that both lateral and longitudinal motion are used to segment vehicles. (For normalized correlation, a 9x9 template of each corner is extracted when the corner is first detected.) Thus, in order to fool the grouper, two vehicles would have to have identical motions during the entire time they were being tracked. Typically, the tracking region is on the order of 100 m along the road. In congested traffic, vehicles are constantly changing their velocity to adjust to nearby traffic and remain in the field of view for a long period of time, giving the grouper the information it needs to perform the segmentation. In free-flowing traffic, vehicles are more likely to maintain constant spatial headways, or spacings, over the short period of observation, making the common motion constraint less effective. Fortunately, under free-flow conditions drivers take larger spacings (in excess of 30 m), so a spatial proximity cue is added to aid the grouping/segmentation process. The grouper considers corner features in pairs. Initially, points A and B that are less than a pre-specified distance apart will be hypothesized to belong to the same vehicle. By monitoring the distance between the points, this hypothesis can be dismissed as soon as the points are found to move relative to each other. The distance, d, is measured in world coordinates by multiplying the image distance with a depth scaling factor computed from the homography. Because features must share a common motion to be grouped into a vehicle, one feature track from each group is selected as being representative of the vehicle trajectory.
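A minimal sketch of that common-motion test, written here as an offline check over completed tracks rather than the paper's online, frame-by-frame hypothesis pruning; the distance thresholds are illustrative and the union-find grouping is an implementation convenience, not the authors' data structure.

```python
import numpy as np

def group_tracks(tracks, max_dist=6.0, max_offset_var=0.5):
    """Group feature tracks that move rigidly together.

    tracks: list of (T, 2) arrays of world coordinates, all sampled on the
    same T frames.  Two tracks are joined when they stay within `max_dist`
    metres of each other and the variation of their separation over the
    whole observation stays below `max_offset_var` metres.
    """
    n = len(tracks)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            offset = tracks[i] - tracks[j]              # per-frame offset
            dist = np.linalg.norm(offset, axis=1)
            if dist.max() < max_dist and np.ptp(dist) < max_offset_var:
                parent[find(i)] = find(j)               # same vehicle

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```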

In particular, the grouper selects the feature point closest to the camera because it is likely to be near the ground plane and thus less likely to suffer from distortions due to the viewing angle. Finally, traffic parameters such as flow, average speed and density are computed from the vehicle trajectories.

C. Background Segmentation

Background segmentation is used [8] to reduce false positives caused by textures in the static background (like windows and brick walls). Background segmentation using a background model is an effective way to take advantage of a static scene. Processing with a background model has the advantage of not being susceptible to textures that do not move, but has the disadvantage of not always working when the foreground object is similar in intensity to the background. The background model chosen for this system is a median background. The median is chosen because it is not affected by outliers: if an outlier occurs, it becomes either the top or the bottom value of the range found over the median frames. The procedure is as follows. A set of frames is collected, and the median value of each pixel is chosen, thus creating a background model. After the model is found, the current frame is subtracted from the background model; if the absolute value of the difference is greater than a threshold, the pixel is marked as foreground. Background models themselves have inherent problems when attempting to detect wheels within the foreground found. Background subtraction will not find objects with intensities similar to the background (note the black on the inside portion of the vehicles in Figure 3). Shadows are a continual and difficult problem (note the white section underneath the blobs in Figure 3). Also, it will detect any moving object, not just vehicles. Some way needs to be devised to find the wheels within the outlines marked by the background segmentation. Combining background segmentation with the data-dependent wheel detector takes advantage of the strengths of both algorithms.
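A compact sketch of the median background model and thresholded subtraction just described (grayscale frames as NumPy arrays; the threshold of 30 intensity levels is illustrative):

```python
import numpy as np

def median_background(frames):
    """Per-pixel median over a stack of grayscale frames (H, W)."""
    return np.median(np.stack(frames, axis=0), axis=0)

def foreground_mask(frame, background, thresh=30):
    """Mark pixels whose absolute difference from the background
    model exceeds the threshold as foreground."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh
```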

Fig. 3. Example foregrounds from various sequences


RESULTS AND DISCUSSION

Fig.4. Initialization background with no moving objects and with moving objects

Fig.5. Coarse location results by the projection of the difference image

Fig.6. Refined location results by the projection of the edge map

Tracking moving vehicles by association-graph matching, we show the moving-vehicle trajectories by the centroid of each region. Figures 4, 5 and 6 show the tracking results. From the experiments, the proposed method can track a single vehicle and can also track multiple moving vehicles in a video image sequence. Table 1 gives the detection rates for moving-object regions, and Table 2 shows the maximum and minimum processing times for the different algorithm parts of our system. We define two metrics for characterizing the Detection Rate (DR) and the False Alarm Rate (FAR) [9][10] of the system. These rates, used to quantify the output of our system, are based on: TP (true positive): detected regions that correspond to moving objects; FP (false positive): detected regions that do not correspond to a moving object; FN (false negative): moving objects not detected. These scalars are combined to define:

DR = TP / (TP + FN)
FAR = FP / (TP + FP)

Table 1. Quantitative analysis of the detection and tracking system
TP      FP     FN     DR       FAR
1000    5      0      100%     0.5%

Table 2. Evaluation of the different parts of our system
Step                        Time (ms/frame)    Average (ms/frame)
Background update           87.1~96.9          89.3
Locating vehicle region     27.1~83.8          47.9
Vehicle region tracking     34.2~97.4          62.1
Whole process               157.0~285.0        163.7
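For illustration, the two metrics can be computed directly from the counts in Table 1 (a trivial sketch, not part of the original system):

```python
def detection_rates(tp, fp, fn):
    """Detection rate and false alarm rate as defined above."""
    dr = tp / (tp + fn)
    far = fp / (tp + fp)
    return dr, far

# Values from Table 1: TP = 1000, FP = 5, FN = 0.
dr, far = detection_rates(1000, 5, 0)
print(f"DR = {dr:.1%}, FAR = {far:.1%}")   # DR = 100.0%, FAR = 0.5%
```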

IV. CONCLUSION

In this work, we have presented an approach for real-time, region-based segmentation, detection and tracking of moving vehicles using adaptive background update and extraction, together with association-graph matching of vehicle regions on image sequences, with a focus on video-based intelligent transportation monitoring systems. Recent evaluations of commercial VIPS found that existing systems have problems with congestion, occlusion, lighting transitions between night/day and day/night, camera vibration due to wind, and long shadows linking vehicles together. We have presented a vehicle detection and tracking system that is designed to operate under these challenging conditions. Instead of tracking entire vehicles, vehicle features are tracked, which makes the system less sensitive to the problem of partial occlusion. The same algorithm is used for tracking in daylight, twilight and night-time conditions; it is self-regulating because it selects the most salient features for the given conditions. Common motion over entire feature tracks is used to group features from individual vehicles and to reduce the probability that long shadows will link vehicles together. Finally, camera motion during high wind is accounted for by tracking a small number of fiducial points. The resulting vehicle trajectories can be used to provide traditional traffic parameters as well as new metrics such as lane changes. The trajectories can be used as input to more sophisticated, automated surveillance applications, e.g., incident detection based on acceleration/deceleration and lane-change maneuvers. The vehicle tracker is well suited both for permanent surveillance installations and for short-term traffic studies such as examining vehicle movements in weaving sections. The vehicle tracking system can also extract vehicle signatures to match observations between detector stations and quantify conditions over extended links. The results show that the presented method improves both computational efficiency and location accuracy. In the future, we will address the problem of occlusion on curved roads and the problem of shadows in sunshine.

REFERENCES

[1] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection: A review," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 5, pp. 694–711, May 2006.
[2] V. Kastinaki et al., "A survey of video processing techniques for traffic applications," Image, Vis., Comput., vol. 21, no. 4, pp. 359–381, Apr. 2003.
[3] R. Cucchiara, P. Mello, and M. Piccardi, "Image analysis and rule-based reasoning for a traffic monitoring," IEEE Trans. Intell. Transport. Syst., vol. 3, no. 1, pp. 37–47, Mar. 2002.


[4] S. Gupte et al., "Detection and classification of vehicles," IEEE Trans. Intell. Transport. Syst., vol. 1, no. 2, pp. 119–130, Jun. 2000.
[5] G. L. Foresti, V. Murino, and C. Regazzoni, "Vehicle recognition and tracking from road image sequences," IEEE Trans. Veh. Technol., vol. 48, no. 1, pp. 301–318, Jan. 1999.
[6] J. Wu, X. Zhang, and J. Zhou, "Vehicle detection in static road images with PCA and wavelet-based classifier," in Proc. IEEE Intelligent Transportation Systems Conf., Oakland, CA, Aug. 25–29, 2001, pp. 740–744.
[7] Z. Sun, G. Bebis, and R. Miller, "On-road vehicle detection using Gabor filters and support vector machines," presented at the IEEE Int. Conf. Digital Signal Processing, Santorini, Greece, Jul. 2002.
[8] A. Broggi, P. Cerri, and P. C. Antonello, "Multi-resolution vehicle detection using artificial vision," in Proc. IEEE Intelligent Vehicles Symp., Jun. 2004, pp. 310–314.
[9] M. Bertozzi, A. Broggi, and S. Castelluccio, "A real-time oriented system for vehicle detection," J. Syst. Arch., pp. 317–325, 1997.
[10] C. Tzomakas and W. Seelen, "Vehicle detection in traffic scenes using shadow," Tech. Rep. 98-06, Inst. für Neuroinformatik, Ruhr-Universität, Germany, 1998.
[11] A. L. Ratan, W. E. L. Grimson, and W. M. Wells, "Object detection and localization by dynamic template warping," Int. J. Comput. Vis., vol. 36, no. 2, pp. 131–148, 2000.
[12] A. Bensrhair et al., "Stereo vision-based feature extraction for vehicle detection," in Proc. IEEE Intelligent Vehicles Symp., Jun. 2002, vol. 2, pp. 465–470.
[13] T. Aizawa et al., "Road surface estimation against vehicles' existence for stereo-based vehicle detection," in Proc. IEEE 5th Int. Conf. Intelligent Transportation Systems, Sep. 2002, pp. 43–48.
[14] J. C. Rojas and J. D. Crisman, "Vehicle detection in color images," in Proc. IEEE Conf. Intelligent Transportation System, Nov. 9–11, 1997, pp. 403–408.
[15] D. Guo et al., "Color modeling by spherical influence field in sensing driving environment," in Proc. IEEE Intelligent Vehicles Symp., Oct. 3–5, 2000, pp. 249–254.


Architecture for ICT (10,9,6,2,3,1) Processor

Mrs. D. S. Shylu, M.Tech., Sr. Lecturer, Karunya University, Coimbatore ([email protected]) and Miss. V. C. Tintumol, 2nd ME (VLSI) Student, Karunya University, Coimbatore ([email protected])

Abstract—The Integer Cosine Transform (ICT) presents a performance close to the Discrete Cosine Transform (DCT) with a reduced computational complexity. The ICT kernel is integer-based, so computation only requires adding and shifting operations. This paper presents a parallel-pipelined architecture of an ICT(10,9,6,2,3,1) processor for image encoding. The main characteristics of the ICT architecture are high-throughput parallel processing and high efficiency in all its computational elements. The arithmetic units are distributed and are made up of adders/subtractors operating at half the frequency of the input data rate. In this transform, the truncation and rounding errors are only introduced at the final normalization stage. The normalization coefficient word length has been established using the requirements of IEEE standard 1180–1990 as a reference. Index Terms—Integer cosine transform, Discrete Cosine transform, image compression, parallel processing, VLSI

I.INTRODUCTION

The Discrete Cosine Transform (DCT) is widely considered to provide the best performance for transform coding and image compression. The DCT has become an international standard for sequential codecs such as JPEG, MPEG and H.261. However, DCT matrix elements contain real numbers represented by a finite number of bits, which inevitably introduce truncation and rounding errors during compaction. Thus many applications that use this transform can be classified under the heading of "lossy" encoding schemes. This implies that the reconstructed image is always only an approximation of the original image. VLSI implementation of the DCT using floating-point arithmetic is highly complex and requires multiplications. Different multiplication-free algorithms, which are approximations to the DCT, have been proposed in order to reduce implementation complexity.

In some cases, they can be used for lossless compression applications since the round-off error can be completely eliminated. In these algorithms, coefficients are scaled or approximated so that the floating-point multiplication can be implemented efficiently by binary shifts and additions. The Integer Cosine Transform (ICT) is generated by applying the concept of dyadic symmetry and presents a similar performance to, and compatibility with, the DCT. The ICT basis components are integers, so they do not require floating-point multiplications; these are substituted by fixed-point addition and shifting operations, which have a more efficient hardware implementation. This paper describes the architecture of a 1-D ICT processor chip for image coding. In this architecture the arithmetic units are based on highly efficient adders/subtractors operating at half the frequency of the data input rate. The output coefficients can be selected with or without normalization. In the latter case, the normalization coefficient's word length must be 18 bits, of which only 13 bits are necessary if the specifications of the IEEE standard 1180–1990 are adhered to. The paper is organized as follows. A decomposition of the ICT to obtain a signal flow chart, which leads to an efficient hardware implementation, is presented in Section II. The generation of the order-8 ICT and its application to a real input sequence are explained in Sections III and IV. In Section V, a pipeline structure of the 1-D eighth-order transform is proposed, based on three processing blocks operating in parallel, with adders/subtractors combined with wired-shift operations as the only arithmetic elements.

II. DECOMPOSITION OF THE ICT

The ICT was derived from the DCT using the concept of dyadic symmetry, which is defined as follows.

A vector of 2^m elements [a0, a1, ..., a_(2^m−1)] is said to have the i-th dyadic symmetry if and only if a_j = s · a_(j⊕i), where ⊕ is the exclusive-OR operation, j lies in the range [0, 2^m − 1], i lies in the range [1, 2^m − 1], s = 1 when the symmetry is even, and s = −1 when the symmetry is odd.
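A small sketch (illustrative only) that checks this definition directly on a vector of length 2^m:

```python
def has_dyadic_symmetry(vec, i, even=True):
    """Check the i-th dyadic symmetry of a vector of 2**m elements:
    a[j] == s * a[j XOR i] for all j, with s = +1 (even) or -1 (odd)."""
    s = 1 if even else -1
    return all(vec[j] == s * vec[j ^ i] for j in range(len(vec)))

# Example: [a, b, b, a] has even 3rd dyadic symmetry (a[j] == a[j^3]).
print(has_dyadic_symmetry([5, 7, 7, 5], i=3, even=True))      # True
print(has_dyadic_symmetry([5, 7, -7, -5], i=3, even=False))   # True
```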


Let T be the matrix that represents the order-N DCT. The (m,n)-th element of this matrix is defined as

T_mn = (1/N)^(1/2) [k_m cos(m(n + 1/2)π/N)],   m, n = 0, 1, ..., N−1   (1)

where

k_m = 1, if m ≠ 0 or N;
k_m = (1/2)^(1/2), if m = 0 or N.   (2)
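The sketch below builds T directly from (1)–(2) for N = 8, keeping the document's scale factor; since the scale only affects normalization, the check verifies that the rows are mutually orthogonal rather than orthonormal.

```python
import numpy as np

def dct_matrix(N=8):
    """Order-N DCT kernel T as defined in (1)-(2)."""
    T = np.zeros((N, N))
    for m in range(N):
        k_m = 0.5 ** 0.5 if m in (0, N) else 1.0
        for n in range(N):
            T[m, n] = (1.0 / N) ** 0.5 * k_m * np.cos(m * (n + 0.5) * np.pi / N)
    return T

T = dct_matrix(8)
# Rows of T are mutually orthogonal: T @ T.T is diagonal.
print(np.allclose(T @ T.T, np.diag(np.diag(T @ T.T))))   # True
```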

III.GENERATION OF ORDER-8 ICTs

The steps to convert the order-8 DCT kernel into the order-8 ICT kernel are as follows.

Step 1-Substitute value for N in the DCT transform matrix

Equation (1) shows the order-N DCT kernel. Substituting N = 8 in equation (1) gives the order-8 DCT kernel, which can be expressed as

T=KJ (3)

Where K is the normalization diagonal matrix and J an orthogonal matrix made up of the basis components of DCT.

By substituting N=8 in the above equation, we obtain an 8x8 matrix

[T] = [k0 j0, k1 j1, k2 j2, k3 j3, k4 j4, k5 j5, k6 j6, k7 j7]^T

where k_i j_i is the i-th basis vector and k_i is a scaling constant such that k_i · j_i = 1.

As T10 = −T17 = −T32 = T35 = −T51 = T56 = −T73 = T74, we may represent the magnitudes of J10, J17, J32, J35, J51, J56, J73, J74 by a single variable, say 'a'. Similarly, all eight basis vectors can be expressed in terms of the variables a, b, c, d, e and f, which are constants, and g, which equals 1. Hence the orthogonal matrix J can be expressed in terms of the variables a, b, c, d, e, f and g.

Step 2 -Find out the conditions under which Ji and Jj are orthogonal

The dyadic symmetry present in J reveals that, to ensure orthogonality, the constants a, b, c and d must satisfy only the following condition:

ab = ac + bd + cd   (4)

Step 3-Set up boundary conditions and generate new transforms

Equation (1) implies that, for the DCT,

a ≥ b ≥ c ≥ d and e ≥ f   (5)

To make the basis vectors of the new transforms resemble those of the DCT, the inequality expression (5) has to be satisfied. Furthermore, to eliminate the truncation error due to non-exact representation of the basis components a, b, c, d, e and f, expression (6) has to be satisfied, i.e.,

a, b, c, d, e and f are integers   (6)

Those T matrices that satisfy (4), (5) and (6) are referred to as order-8 integer cosine transforms (ICTs), denoted ICT(a, b, c, d, e, f).
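As a quick check (illustrative code, not from the paper), conditions (4)–(6) can be verified for the ICT(10,9,6,2,3,1) used in this work:

```python
def is_valid_ict(a, b, c, d, e, f):
    """Check conditions (4)-(6) for an order-8 ICT(a, b, c, d, e, f):
    orthogonality a*b == a*c + b*d + c*d, the DCT-like ordering
    a >= b >= c >= d and e >= f, and integer basis components."""
    ints = all(isinstance(v, int) for v in (a, b, c, d, e, f))
    ordering = a >= b >= c >= d and e >= f
    orthogonal = a * b == a * c + b * d + c * d
    return ints and ordering and orthogonal

print(is_valid_ict(10, 9, 6, 2, 3, 1))   # True: 10*9 == 10*6 + 9*2 + 6*2
```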

IV. APPLYING 1-D ICT FOR A REAL INPUT SEQUENCE

The 1-D ICT for a real input sequence x(n) is defined as

X = Tx = KJx = KY   (7)

where X and x are dimension-8 column matrices, and K is the diagonal normalization matrix.

Reordering the input sequence and the transform coefficients according to the rules

x’(n)=x(n) n∈[0,3] (8)

x’(7-n)=x(n+4)

X'(m) = X(Br8[m]), m ∈ [0,3]   (9)

X’(m+4)=X(2m+1)


where Br8[m] represents the bit-reverse operation of length 8, the 1-D ICT can be expressed as

X’= TR x’ = KR JR x’= KR Y’ (10)

The reordered basis components of the ICT can be expressed as

JR = [ J4e   0  ] [ I4   I4 ]   (11)
     [  0   J4o ] [ I4  -I4 ]

I4 being the dimension 4 identity matrix, and

J4e = [ g   g   g   g ]
      [ g  -g  -g   g ]   (12)
      [ e   f  -f  -e ]
      [ f  -e   e  -f ]

J4o = [ a   b   c   d ]
      [ b  -d  -a  -c ]   (13)
      [ c  -a   d   b ]
      [ d  -c   b  -a ]

Applying the decomposition rules defined in (8) and (9) to the J4e matrix results in

J4e = R4 [ J2e   0  ] [ I2   I2 ]   (14)
         [  0   J2o ] [ I2  -I2 ]

where R4 is the reordering matrix of length 4, I2 is the dimension-2 identity matrix, and

J2e = [ g   g ]   (15)
      [ g  -g ]

J2o = [ e   f ]   (16)
      [ f  -e ]

ICT(10,9,6,2,3,1) is obtained by substituting a = 10, b = 9, c = 6, d = 2, e = 3, f = 1 in the transform matrix; the J matrix of ICT(10,9,6,2,3,1) follows accordingly.

The signal flow graph of ICT(10,9,6,2,3,1), whose mathematical model is described above, is shown in Fig. 1.

Fig.1.Signal flow graph of ICT(10,9,6,2,3,1)

As can be seen in Fig. 1, the first computing stage operates on the input data ordered according to rule (8); additions and subtractions of data pairs formed with the sequences x'(n) and x'(n+4), (n = 0, 1, 2, 3), are executed. In the second computing stage, the transformations J4e and J4o are carried out, their nuclei being the matrices defined earlier. The transformation J4e is applied to the first half of the intermediate data sequence (a0, a1, a2, a3), giving as a result the even coefficients (Y0, Y4, Y2, Y6) of the ICT. Similarly, J4o is applied to the other half of the intermediate data sequence (a7, a6, a5, a4), giving as a result the odd coefficients (Y1, Y3, Y5, Y7) of the ICT. In the third computing stage, the coefficients Yi are normalized and the transformed sequence of coefficients X(m) appears reordered according to rule (9).
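An illustrative software model of these three computing stages (assuming the reordering rules (8)-(9) and the nuclei (12)-(13) as given; the final normalization by K and all pipelining details are omitted):

```python
import numpy as np

# Basis constants of ICT(10, 9, 6, 2, 3, 1); g = 1.
A, B, C, D, E, F, G = 10, 9, 6, 2, 3, 1, 1

J4E = np.array([[G,  G,  G,  G],     # even nucleus, eq. (12)
                [G, -G, -G,  G],
                [E,  F, -F, -E],
                [F, -E,  E, -F]])

J4O = np.array([[A,  B,  C,  D],     # odd nucleus, eq. (13)
                [B, -D, -A, -C],
                [C, -A,  D,  B],
                [D, -C,  B, -A]])

def ict8(x):
    """Unnormalized 1-D ICT(10,9,6,2,3,1) of an 8-sample block."""
    x = np.asarray(x, dtype=int)
    xp = np.concatenate([x[:4], x[7:3:-1]])      # input reordering, rule (8)
    a_even = xp[:4] + xp[4:]                     # first-stage additions
    a_odd = xp[:4] - xp[4:]                      # first-stage subtractions
    even = J4E @ a_even                          # (Y0, Y4, Y2, Y6)
    odd = J4O @ a_odd                            # (Y1, Y3, Y5, Y7)
    Y = np.empty(8, dtype=int)
    Y[[0, 4, 2, 6]] = even                       # natural-order output, rule (9)
    Y[[1, 3, 5, 7]] = odd
    return Y

print(ict8([1, 2, 3, 4, 5, 6, 7, 8]))   # DC term Y[0] == 36
```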


V. ONE-DIMENSIONAL J(10,9,6,2,3,1) ARCHITECTURE

The computations shown in the above signal flow graph (Fig. 1) can be realized using processing blocks, i.e., an individual processing block for each computing stage. The 1-D J(10,9,6,2,3,1) multiplication-free processor architecture is shown in Fig. 2. This architecture has been designed to implement the transformation JR according to the computing diagram of Fig. 1. The 1-D J processor consists of three processing blocks: the input processing block for the processing of the input sequence; the even processing block, which processes half of the intermediate data sequence to produce the even coefficients of the ICT, i.e., computes the transformation J4e; and the odd processing block, which processes the other half of the intermediate data sequence to produce the odd coefficients of the ICT, i.e., computes the transformation J4o. These three processing blocks have a parallel architecture, allowing the operation frequency to be reduced to fs/2, where fs is the input data sampling frequency. The final output mixer arranges, in natural form, the coefficient sequence of the ICT at a frequency of fs. The control of the processor is very simple and is carried out using four signals: Clk1, the external clock at frequency fs; Clk2, the internal clock at frequency fs/2; and the multiplexer selection signals S1 at frequency fs/4 and S2 at frequency fs/8. The arithmetic multiplications have been reduced to add and shift operations. The adders/subtractors in the processor are based on the binary carry-lookahead adder.

Fig.2.Architecture of 1-D J(10,9,6,2,3,1)

A. INPUT PROCESSING BLOCK

The architecture of the input processing block is shown in Fig. 3. The input processing block performs the operation of the first computing stage, with the input sequence data introduced in natural form at frequency fs. The input processing block consists of a shift register, two 4:1 multiplexers, two registers, an adder module denoted AE1 and a subtractor module denoted AE2. The input data are stored in a shift register SR1, from where the two 4:1 multiplexers select the data to be processed by AE1 and AE2 in parallel at a sampling frequency of fs/2. The input data sequence is entered into the shift register at a sampling frequency fs. The output from the shift register is selected with the help of the two 4:1 multiplexers. The outputs from the two multiplexers are then given to two registers, REG1 and REG2, which are driven by Clk2. The outputs from the registers are finally given to the adder and subtractor modules, which perform the addition and subtraction of the selected signals accordingly. The adder AE1 and subtractor AE2 are driven by Clk2. The outputs of AE1 and AE2 provide the input for the even and odd processing blocks. Simulation results for the input processing block are shown in Fig. 4.

Fig.3.Architecture of input processing block

Fig.4.simulation results for input processing block

B. J4e PROCESSING BLOCK

The J4e processing block has been designed to calculate the even coefficients of the 1-D J transform. From the decomposition procedure established in (14), (15) and (16), applied to (12), we obtain the even coefficients of the J4e computation. From (7) it is clear that Y = Jx. Reordering the input sequence gives Y' = JR x'. From (11), it can be seen that JR is divided into two computations, namely J4e and J4o, whose nuclei are as


shown in (17) and (18). Applying the decomposition rule to the J4e matrix, we get

[Y0]   [1  1  0  0] [1  0  1  0] [a0]
[Y4] = [1 -1  0  0] [0  1  0  1] [a1]   (17)
[Y2]   [0  0  3  1] [1  0 -1  0] [a2]
[Y6]   [0  0  1 -3] [0  1  0 -1] [a3]

The above matrix can be rewritten by introducing the intermediate data (b0, b1, b2, b3), as shown in (18):

[Y0]   [1  1  0  0] [b0]
[Y4] = [1 -1  0  0] [b1]   (18)
[Y2]   [0  0  3  1] [b2]
[Y6]   [0  0  1 -3] [b3]

Operating on (18), we get

[Y0]   [1  1] [b0]        [Y2]   [3  1] [b2]
[Y4] = [1 -1] [b1]  and   [Y6] = [1 -3] [b3]
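A direct software rendering of (18) (illustrative only), with the multiplications by 3 written as the shift-and-add used by the hardware:

```python
def even_coefficients(b0, b1, b2, b3):
    """Even ICT coefficients from the intermediate data of (18); the
    multiplications by 3 are realized as a shift-and-add, (b << 1) + b."""
    y0 = b0 + b1
    y4 = b0 - b1
    y2 = ((b2 << 1) + b2) + b3      # 3*b2 + b3
    y6 = b2 - ((b3 << 1) + b3)      # b2 - 3*b3
    return y0, y4, y2, y6
```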

Fig. 5 shows the signal flow graph obtained from (17) and (18). Fig. 7 shows the proposed architecture for the J4e computation. This architecture has four shift registers, four 4:1 multiplexers, three arithmetic units, and an output mixer to reorder the even coefficients (Y0, Y2, Y4, Y6). Fig. 6 shows the timing diagram of the J4e process, specified in number of cycles of Clk2. For the sake of clarity, only the valid data contained in the shift registers and the output data of AE3 and AE4 are shown in this diagram. The white cells show the valid data corresponding to the i-th transform, and the white cells with a two-lined box indicate the shift-register data processed by the arithmetic units. The light-gray cells contain data of previous or posterior transforms, and the empty dark-gray cells indicate non-valid data. The process begins by storing the input data (a3, a2, a1, a0) in SRA1. AE3 and AE4 generate the intermediate data (b3, b2, b1, b0), where b0 and b1 are stored in register SRB1, whereas b2 and b3 are stored in SRB2. The 3X multiplier, implemented by adding and shifting, generates the data 3b3 and 3b2, which are stored in register SRB3. After that, the even coefficients of the ICT are generated from the data stored in SRB1, SRB2 and SRB3: Y0 and Y2 in AE3, and Y4 and Y6 in AE4. The output mixer finally reorders the even coefficient sequence of the ICT. Simulation results for the J4e processor are shown in Fig. 8.

Fig.5.signal flow graph of J4e

Fig.6.Timing diagram of J4e

Fig.7.Architecture of J4e

Fig.8.Simulation results of J4e

C. J4o PROCESSING BLOCK


The J4o processing block has been designed to calculate the odd coefficients of the 1-D transform. The implementation of this processor can be simplified through the decomposition of the matrix, so that the odd coefficients of the 1-D transform can be computed simply in terms of add and shift operations. Fig. 9 shows the signal flow graph. It has three computing stages with intermediate data d, e, f and g. Fig. 10 illustrates its architecture, made up of five shift registers, ten 4:1 multiplexers and five arithmetic units operating in parallel.

Fig.9.Signal flow graph of J4o

Fig.10.Architecture of J4o

VI.CONCLUSION

This paper presents an architecture of an ICT processor for image encoding. The 2-D ICT architecture is made up of two 1-D ICT processors and a transpose buffer used as intermediate memory. The pipelined adders/subtractors operate at half the frequency of the input data rate. The characteristics of this architecture are high throughput and parallel processing.



Row Column Decomposition Algorithm for 2D Discrete Cosine Transform

Caroline Priya M and Mrs. D. S. Shylu, Lecturer, Karunya University

Abstract-This paper presents an architecture for the 2-D Discrete Cosine Transform (DCT) based on the fast row/column decomposition algorithm, together with a new schedule for the 2-D DCT computation. The transpose memory can be simplified by using shift registers for the data transposition between the two 1-D DCT units. A special shift-register cell is designed with a MOS circuit; its shift operation is based on a capacitor energy-transferring methodology.

I.INTRODUCTION

Video coding systems have widely used the discrete cosine transform (DCT) to remove data redundancy. Many fast DCT algorithms have been presented to reduce the computational complexity, and VLSI architectures have been designed for dedicated DCT processors. The row/column decomposition approach is popular due to its regularity and simplicity, but it needs a transpose memory for 2-D DCT processing. Either flip-flop cells or an embedded RAM can be used for the data transposition. If flip-flops are used to perform the data transposition, the chip complexity becomes high, since one flip-flop cell requires many transistors in a typical CMOS cell library. When using an embedded RAM, a memory compiler has to be employed to generate the expected RAM size; although the layout density is high, the VLSI implementation becomes more complex.

With a particular access scheduling for the 2-D DCT, a simple shift-register array can be used for the data transposition.

II. THE 2-D DCT ALGORITHM

For a given 2-D spatial data sequence X_ij, i, j = 0, 1, ..., N−1, the 2-D DCT data sequence Y_pq, p, q = 0, 1, ..., N−1, is defined by

Y_pq = (2/N) E_p E_q Σ_i Σ_j X_ij cos((2i+1)pπ/2N) cos((2j+1)qπ/2N),  i, j = 0, 1, ..., N−1   (1)

The forward and inverse transforms are merely mappings from the spatial domain to the transform domain and vice versa. The DCT is a separable transform and, as such, the row-column decomposition can be used to evaluate (1).
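The separability can be exercised directly in a few lines of Python (an illustrative model only, using a standard orthonormal 1-D DCT matrix; this normalization is equivalent to the (2/N)E_pE_q scale factor of (1)):

```python
import numpy as np

def dct_1d_matrix(N=8):
    """Orthonormal 1-D DCT-II matrix."""
    n = np.arange(N)
    C = np.cos(np.pi * np.outer(n, n + 0.5) / N) * np.sqrt(2.0 / N)
    C[0] /= np.sqrt(2.0)
    return C

def dct_2d(block):
    """Row-column decomposition: 1-D DCT on rows, then on columns."""
    C = dct_1d_matrix(block.shape[0])
    return C @ block @ C.T

block = np.arange(64, dtype=float).reshape(8, 8)
Y = dct_2d(block)
print(round(Y[0, 0], 3))   # DC coefficient = 8 * mean of the block = 252.0
```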

Denoting cos((2i+1)pπ/2N) by c_pi and neglecting the scale factor (2/N)E_pE_q, the column transform can be expressed as

Y_pq = Σ_{j=0}^{N−1} Z_pj c_qj,   p, q = 0, 1, ..., N−1   (2)

and the row transform can be expressed as


Z_pj = Σ_{i=0}^{N−1} X_ij c_pi,   p, j = 0, 1, 2, ..., N−1   (3)

In order to compute an N x N-point DCT (where N is even), N row transforms and N column transforms need to be performed. However, by exploiting the symmetries of the cosine function, the number of multiplications can be reduced from N·N to N·N/2. In this case, each row transform given by (3) can be written as a matrix-vector multiplication via

Z_pj = Σ_{i=0}^{N/2−1} [X_ij + (−1)^p X_(N−1−i)j] c_pi   (4)
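Under the assumption that the coefficient matrix satisfies c_(p,N−1−i) = (−1)^p c_(p,i), which holds for the DCT cosines, the folding of (4) can be checked against the full sum of (3); the loop-based sketch below is an illustration, not a hardware model.

```python
import numpy as np

def row_transform(X, C):
    """Row transform of (4): fold X[i] with X[N-1-i] so that each output
    coefficient needs only N/2 multiplications.
    X: (N, N) input block, C: (N, N) cosine coefficient matrix c[p][i]."""
    N = X.shape[0]
    Z = np.zeros_like(X, dtype=float)
    for p in range(N):
        sign = (-1) ** p
        for j in range(N):
            folded = X[: N // 2, j] + sign * X[N - 1 : N // 2 - 1 : -1, j]
            Z[p, j] = folded @ C[p, : N // 2]
    return Z

# Consistency with the full sum of (3), using c_pi = cos(p*(i+0.5)*pi/N):
N = 8
C = np.cos(np.pi * np.outer(np.arange(N), np.arange(N) + 0.5) / N)
X = np.random.rand(N, N)
assert np.allclose(row_transform(X, C), C @ X)
```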

Using matrix notation for N = 8, (4) can be written in the form of equations (5) and (6), which describe the computation of the even and odd coefficients of the row transform for N = 8, respectively. The second 1-D DCT, i.e. the column transform described by (2), can also be computed using matrix-vector multiplications similar to those described by (4). Hence both the row and the column transform can be performed using the same architecture. The architecture for computing the row transform, for N = 8, is depicted in Fig. 1. It is based on step 1 of the systolic array implementation proposed by Chang and Wang. It consists of N/2 adder/subtractor cells for summing and subtracting the inputs to the 1-D DCT block as required by (4). The pair of inputs X_ij and X_(N−1−i)j enters the (i+1)-th adder/subtractor cell. In the proposed architecture, all the pairs of input data enter the adder/subtractor cells at the same time. Fig. 1 shows that the architecture also consists of N VIPs, where half are used for the added pairs as described by (5) and the other half for the subtracted pairs as described by (6). Each VIP consists of N/2 multiplier/accumulator cells. Each cell stores one coefficient c_pi in a register and evaluates one specific term of the summation in (4). The multiplications of the terms c_pi with the corresponding data are performed simultaneously, and the resulting products are then added together in parallel.

III. PROPOSED 2-D DCT ARCHITECTURE

Based on the fast row/column algorithm, one can utilize a time-sharing method to perform the 2-D DCT with one 1-D DCT core for a cost-effective design. The 2-D DCT architecture consists of the 1-D DCT core and the shift-register array. For the N-th block processing, the first-row pixels f00–f07 are sequentially loaded into R0–R7 during the 0–7th cycles, as in Fig. 3. R0–R7 are selected into the computation kernel by 2:1 multiplexers for the first-row coefficient transformation during the 8–15th cycles. The resulting coefficient is sequentially sent to the shift register, one per cycle.

Fig. 1(a). Architecture of the 1-D DCT block for N = 8. Fig. 1(b). Basic cell.


Meanwhile, the second-row pixels f10–f17 are loaded into R8–R15. In the 16–23rd cycles, R8–R15 are selected into the computation kernel for computing the second-row coefficients. At the same time, the third-row pixels are loaded into R0–R7. Repeating this schedule, one block of pixels can be transformed into 1-D coefficients row by row during 64 cycles. The pair of register banks R0–R7 and R8–R15 is chosen by multiplexers controlled by the Clk_Enable signal values 0 and 1, respectively. The addition or subtraction of two pixels is performed first for the even or odd coefficient computation, which can be implemented using two's-complement control with an XOR gate. The weights 1–4 are cosine coefficients, which can be easily implemented using a finite state machine. The computational order is regular, from coefficient F0 through F1, ..., F7. The same computation schedule is employed again for the next block transformation. The timing schedule and the VLSI architecture for the DCT computations are illustrated in Fig. 2 and Fig. 3.

Fig. 2. Timing schedule for DCT computation

IV. SHIFT REGISTER CELL AND CONTROL TIMING

The accessing schedule of the shift register at the 71st cycle is shown in Fig. 4. The shift-register array is designed with a serial-in/parallel-out structure. The first 1-D DCT results, m[00], m[10], ..., m[70], are loaded into R0–R7 in parallel for the 2-D DCT computation at the 71st cycle. Due to the one-stage pipeline delay and the output latch, the first 2-D DCT coefficient F[00] is obtained at the 74th cycle. Then the 2-D DCT coefficients F[10], F[20], ... are sequentially output during the 75–81st cycles. For the next column processing, one clock is sent to the shift-register array, and the output of the shift-register array becomes m[01], m[11], ..., m[71]. These 1-D DCT coefficients are loaded into R8–R15 in parallel at the 79th cycle. One can obtain the second-column 2-D DCT coefficients during the 82–89th cycles. Repeating this computation schedule, the last-column


1-D DCT coefficients m[07], m[17], ..., m[77] are loaded into R8–R15 at the 116th cycle, and the 2-D coefficients F[70]–F[77] are obtained sequentially. For the next block processing, the new pixels are sequentially written into R0–R7 from the 117th to the 125th cycle. For a cost-effective design, a special shift-register cell is designed with a MOS circuit to reduce the memory size, as in Fig. 5. The shift operation is based on a capacitor energy-transferring methodology.
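Functionally, the serial-in/parallel-out array behaves like the small behavioural model below (an illustration only; the exact cycle counts, register names and MOS-level details of the real design are not modelled):

```python
from collections import deque

class TransposeBuffer:
    """Behavioural sketch of a serial-in/parallel-out transpose buffer for
    8x8 blocks: 1-D DCT coefficients enter row by row, one per cycle, and
    are read out column by column, eight at a time."""

    def __init__(self, n=8):
        self.n = n
        self.cells = deque([0] * (n * n), maxlen=n * n)

    def shift_in(self, value):
        """One clock: push a new coefficient, discard the oldest."""
        self.cells.appendleft(value)

    def parallel_out(self, column):
        """Read the requested column of the stored block in parallel.
        Valid once a full block (n*n shifts) has been loaded."""
        buf = list(self.cells)               # buf[0] is the newest sample
        # The sample of row r, column c was pushed (n*n-1 - (r*n+c)) shifts
        # ago, so it now sits at that index of the deque.
        return [buf[self.n * self.n - 1 - (r * self.n + column)]
                for r in range(self.n)]

# Load an 8x8 block row by row, then read out column 0 (i.e. transposed).
tb = TransposeBuffer()
for r in range(8):
    for c in range(8):
        tb.shift_in(10 * r + c)
print(tb.parallel_out(0))   # [0, 10, 20, 30, 40, 50, 60, 70]
```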

Two clock phases, φ1 and φ2, are used to control the nMOS switches. In the first half cycle, φ1 is high and φ2 is low, so Q1 is on and Q2 is off. The data bit D1 is stored on capacitor c1 through Q1, where the input data are Din = ..., D4, D3, D2, D1. In the next half cycle, the φ1 and φ2 states are the inverse of the previous half cycle: Q1 turns off and Q2 turns on, and the c1 data shifts to capacitor c2. The inverter is used to restore the logic level at the end of the shift cell. In the second cycle, Q1 turns on and the data bit D2 is loaded onto c1 in the first half cycle. Meanwhile, capacitor c2 still keeps

Fig.4. Shift-register cell and its control timing

the D1 data, since Q2 is off. The D1 data on capacitor c2 passes through the inverter and transfers to c3, since Q3 is on. In the next half cycle, the φ2 clock becomes high;

Fig. 3. Proposed 2-D DCT architecture with one 1-D DCT core.

D2 and D1 are shifted to capacitors c2 and c4, respectively. Repeating this, the shift function is performed with the energy-transferring technique. The ratio of channel width to length of Q1, Q2 and the inverter can be adjusted to set the c1 and c2 capacitances.


Fig.5.Serial-in/parallel-out shift register array

Fig.6.Serial-in/parallel-out shift register output.

The capacitor c1 is dominated by the Q1 source capacitance and the Q2 drain capacitance. To satisfy c1 ≫ c2, one can increase the c1 capacitance with a large width-to-length ratio for Q1, and use a uniform ratio for Q2 and the inverter to minimize the memory size.

Fig 7.Loading of pixels in register R0..R7

Fig 8. Waveform for Shift register cell.

The shift-register cell can be implemented with two nMOS transistors and one inverter, so a one-bit cell uses only four transistors. The circuit complexity of this transpose memory is much less than that of a conventional SRAM or flip-flop implementation. Moreover, no extra controller, such as READ/WRITE access control or an address decoder, is needed. The shift register is


modeled as a function block for full-system simulations. First, the preprocessing and computational core is realized as in Fig. 3. Then the 2-D DCT core is integrated with one 1-D DCT core and the shift-register array and verified with logic simulations.

V. CONCLUSION

The 2-D DCT processor is realized with a particular schedule consisting of a 1-D DCT core and a shift-register array. The shift-register array performs the data transposition with a serial-in/parallel-out structure based on a capacitor energy-transferring technique. The shift-register-based transposition reduces the control overhead, since the address generator and decoder for memory access can be removed. Compared with transposition-based DCT chips, the memory size and the full 2-D DCT complexity can be reduced. This paper presents a cost-effective DCT architecture for video coding applications.



VLSI Architecture for Progressive Image Encoder

E. Resmi, PG Scholar, Karunya University, Coimbatore and K. Rahimunnisa, Sr. Lecturer, Karunya University, Coimbatore

Abstract This paper presents a VLSI architecture for progressive image coding based on a new algorithm called Tag Setting In Hierarchical Trees. This algorithm is based on Set-Partitioning In Hierarchical Trees (SPIHT) but has the advantage of requiring less memory than SPIHT. VHDL code for the encoder core has been developed. Index Terms:- Image compression; VLSI; Progressive coding

I. INTRODUCTION

Progressive image transmission (PIT) is an elegant method for making effective use of communication bandwidth. Unlike conventional sequential transmission, an approximate image is transmitted first, which is then progressively improved over a number of transmission passes. PIT allows the user to quickly recognize an image and is essential for databases with large images and for image transmission over low-bandwidth connections. Newer coding techniques, such as the JPEG2000 [1] and MPEG4 [2] standards, support the progressive transmission feature. PIT via wavelet coding using the Embedded Zerotree Wavelet (EZW) algorithm was first presented by Shapiro [3] in 1993. The embedded zerotree wavelet algorithm is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. Using an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby allowing a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can cease decoding at any point in the bit stream and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated stream. In addition to producing a fully embedded bit stream, EZW consistently produces compression results that are competitive with virtually all known compression algorithms. Said and Pearlman presented a faster and more efficient codec in 1996 [4], called Set-Partitioning in Hierarchical Trees (SPIHT), based on the principles of the EZW method. The SPIHT algorithm is a generalization of the EZW algorithm. It uses a partitioning of the trees, called

spatial orientation trees, in a manner that tends to keep insignificant coefficients together in large subsets. SPIHT-based algorithms are not best suited for hardware implementation due to their memory requirement. This paper presents a new algorithm for progressive image transmission based on Tag Setting In Hierarchical Trees, which keeps the low bit-rate quality of the SPIHT algorithm and has three improved features. To reduce the amount of memory used, tag flags are introduced to store the significance information instead of the coordinate lists of SPIHT. The flags are two-dimensional binary tag arrays: the Tag of Significant Pixels (TSP), the Tag of Insignificant Pixels (TIP) and the Tag of Significant Trees (TST). Compared with SPIHT coding, the algorithm needs only about 26 Kbytes of memory to store the tag arrays for a 256×256 gray-scale image. Both the sorting pass and the refinement pass of SPIHT coding are merged into one pass in order to simplify the hardware control and save unnecessary memory. The algorithm uses the Depth-First-Search (DFS) traversal order to encode the bit-stream rather than the Breadth-First-Search (BFS) order of SPIHT coding; given the hierarchical pyramid nature of the spatial orientation tree, DFS leads to a better architecture than the BFS method. The VLSI image compressor, called the PIE (Progressive Image Encoder) core, has been synthesized from VHDL code. The PIE is designed to handle 256×256 gray-scale images. The remaining sections of this paper are organized as follows. Section 2 gives the background of progressive image transmission. Section 3 addresses the proposed algorithm for progressive image encoding. Section 4 presents the VLSI architecture of the proposed PIE core. Finally, the conclusion is given in Section 5.

II. PROGRESSIVE IMAGE TRANSMISSION

Progressive image transmission requires the application of a multi-resolution decomposition to the target image. The multi-resolution decomposition provides a multi-resolution representation of the image. Let p_i,j be a two-dimensional image, where i and j are the indices of the pixel coordinates. The multi-resolution decomposition of the image p_i,j is written as

c = Ω(p)   (1)

where Ω(·) is the multi-resolution decomposition transform. The two-dimensional coefficient array c has the same dimensions as the image p, and each element c_i,j is


the transformation coefficient of p at coordinate (i,j). In progressive image transmission, the receiver updates the received reconstruction coefficients cr according to the coded message until an approximate or exact set of coefficients has been received. The decoder can then obtain a reconstructed image by applying the inverse transformation

pr = Ω^(−1)(cr)   (2)

where pr is the reconstructed image and cr are the progressively received coefficients. The distortion of the reconstructed image pr from the original image p can be measured using the Mean Squared Error (MSE), that is

D_MSE(p − pr) = D_MSE(c − cr)   (3)
             = (1/MN) Σ_{i,j} (c(i,j) − cr(i,j))^2   (4)

where MN is the total number of image pixels. In a progressive image transmission process, the transmitter rearranges the details within the image in decreasing order of importance. From Equation (3), it is clear that if the exact value of the transform coefficient cr(i,j) is sent to the decoder, then the MSE decreases by |c_i,j|^2 / MN [4]. This means that the coefficients with larger magnitude should be transmitted first because they carry more information content.

III. NEW ALGORITHM FOR PROGRESSIVE IMAGE ENCODING

The new algorithm is based on the SPIHT algorithm. The essence of SPIHT coding is to identify which coefficients are significant, sort the selected coefficients in each sorting pass, and transmit the ordered refinement bits. A function S_n(T) is used to indicate the significance of a set of coordinates T, i.e.,

S_n(T) = 1 if max_{(i,j)∈T} |c_i,j| ≥ 2^n, and 0 otherwise.

In our opinion, the proposed encoder has three essential advantages: (1) less memory required, (2) an improved refinement pass, and (3) an efficient depth-first search (DFS). Let TSP, TIP and TST be two-dimensional binary arrays whose entries are either '0' or '1'. The overall coding algorithm includes six steps as follows.

(1) Initialization: output n = ⌊log2( max_{(i,j)} |c_i,j| )⌋; set all entries of the TSP, TIP and TST arrays to '0'.
(2) Refinement output: for each entry (i,j), if TSP(i,j) = 1 then output the n-th most significant bit of |c_i,j|.
(3) TIP testing: for each entry (i,j) with TIP(i,j) = 1: if S_n(c_i,j) = 1 then output '1' and the sign of c_i,j, and set TIP(i,j) := 0 and TSP(i,j) := 1; otherwise output '0'.
(4) TST update: for each entry (k,l) ∈ O(i,j), if TST(k,l) = 0 and S_n(c_k,l) = 1 then set TST(i,j) := 1.
(5) Spatial orientation tree encoding: for each entry (i,j), visited in DFS order, with TSP(i,j) = 0 and TIP(i,j) = 0: if S_n(c_i,j) = 1 then output '1', the sign of c_i,j and the value of TST, and set TSP(i,j) := 1; otherwise output '0' and the value of TST, and set TIP(i,j) := 1.
(6) Quantization-step update: decrease n by 1 and go to Step 2.

In Step 1, the algorithm calculates the initial threshold and sets the three tag flags TSP, TIP and TST to '0'. In Step 2, an entry marked with TSP = 1, evaluated in the previous Step 5, is significant. An entry with TIP = 1, tested as insignificant in the previous Step 5, may become significant in Step 3 because of the smaller threshold; thus, the algorithm performs TIP testing to update the TIP value in Step 3. In Step 4, it updates the TST value of each coefficient except the leaf nodes and prepares to perform tree encoding in the next step. If a node has TST = 0, its descendants are all insignificant; in other words, the tree rooted at that node is a zerotree. In Step 5 the algorithm visits those nodes using the depth-first search (DFS) method and outputs a '0', keeping the bit rate as low as SPIHT coding does. At last, it decreases the quantization step n by 1 and goes back to Step 2 iteratively. The proposed algorithm does the same work as SPIHT coding but uses different data structures. For instance, in the refinement-output and TIP-testing steps, the algorithm uses the tag flags TSP and TIP to indicate whether a node is significant or not, and encodes the image stream by inspecting these tags, whereas SPIHT coding uses the coordinate lists LSP and LIP to store the coordinates of the nodes. When comparing both methods, the information stored in TSP (respectively TIP) is the same as in LSP (respectively LIP). Moreover, in the spatial-orientation-tree encoding step of TSIHT coding, if a node has TST = 1 the coder proceeds to search its descendants using DFS without producing any output, whereas in the sorting pass of SPIHT each node in the LIS list of type A may change to type B and be encoded again. Thus, in the general case, the proposed algorithm achieves a bit rate as low as, or lower than, SPIHT.
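To illustrate how the tag arrays replace SPIHT's coordinate lists, the following Python sketch models Steps 2 and 3 of one coding iteration. The function names and the flat bit list are our own illustrative choices, not taken from the paper, and the scan order is a plain raster scan rather than the hardware's exact traversal.

```python
def significance(coeff, n):
    """S_n test from the text: 1 if |coefficient| >= 2**n, else 0."""
    return 1 if abs(coeff) >= (1 << n) else 0

def refinement_and_tip_pass(coeffs, TSP, TIP, n, out):
    """Sketch of Steps 2-3 of one TSIHT iteration. coeffs, TSP and TIP are
    2-D lists; TSP/TIP play the role of SPIHT's LSP/LIP lists; `out` is a
    list collecting the output bits."""
    rows, cols = len(coeffs), len(coeffs[0])
    # Step 2: refinement output for pixels already marked significant
    for i in range(rows):
        for j in range(cols):
            if TSP[i][j]:
                out.append((abs(coeffs[i][j]) >> n) & 1)
    # Step 3: TIP testing - a previously insignificant pixel may become
    # significant at the new, smaller threshold
    for i in range(rows):
        for j in range(cols):
            if TIP[i][j]:
                if significance(coeffs[i][j], n):
                    out.append(1)
                    out.append(0 if coeffs[i][j] >= 0 else 1)  # sign bit
                    TIP[i][j], TSP[i][j] = 0, 1
                else:
                    out.append(0)
```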

IV. VLSI ARCHITECTURE Progressive Image Encoder reads the wavelet coefficients from external memory using a 16-bit input signal, and it reads the tag flags of TSP, TIP and TST from external tag memory using 8-bit input signals. For


reading coefficients or tags from memory, the encoder first generates the address of the data and then reads the data through the input signals. The PIE outputs the encoded bit-stream on the signal bit_out when sync is asserted. Figure 1 shows the overall architecture of the encoder. It has six blocks in addition to the external coefficient and tag memory.

Fig. 1. Progressive Image Encoder hardware architecture

The Clock Divider generates three clock signals with different frequencies to synchronize the internal circuits. The Threshold Generator calculates the initial value of n and updates it at every iteration. The Tag Access Unit controls the access to the three tags TSP, TIP and TST. The Address Generator generates the addresses of locations in the coefficient and tag memories. The Bit-Stream Generator outputs the encoded bit-stream of the encoder. The Controller is the master of all blocks. We discuss each block in the following sections.
1) Clock Divider (CD): The TAU needs one clock cycle for a read operation and two clock cycles for a write. Besides, the AG needs at most three clock cycles to output the encoded bit stream, consisting of the value '1', the sign of the coefficient and the TST value, when it finds that a coefficient is significant. Thus, the encoder needs three different clock frequencies in the hardware circuit. In this work, the Clock Divider generates the three clocks using divide-by-2 and divide-by-8 circuits.
2) Threshold Generator (TG): The TG is used to generate the initial threshold, n = ⌊log2( max|c_i,j| )⌋, from all the coefficients and to update the value of n at every iteration of TSIHT coding. The hardware architecture of the TG is illustrated in Figure 2. The TG first reads all the coefficients and performs a bitwise OR operation to find the maximum coefficient, which is stored in a buffer. After finding the maximum coefficient, a Leading Zero Detector is used to find the position of the most significant bit (MSB), giving the initial value of n. Then a count-down counter decreases n by 1 and outputs the value to the other circuits at every iteration.

Fig. 2. Threshold Generator
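As a software illustration of this OR-and-leading-zero scheme (the function name and the flat coefficient list are assumptions made for the sketch):

```python
def initial_threshold(coeffs):
    """Bit-level model of the Threshold Generator: OR all coefficient
    magnitudes together, then locate the most significant '1' bit (the
    leading-zero detector) to obtain n = floor(log2(max|c|))."""
    acc = 0
    for c in coeffs:
        acc |= abs(int(c))       # bitwise OR accumulates the magnitude bits
    n = acc.bit_length() - 1     # position of the MSB, i.e. floor(log2)
    return max(n, 0)
```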

3) Tag Access Unit (TAU): To store the three two-dimensional tag arrays, two 256×256-bit and one 128×128-bit RAM blocks are needed, controlled by the Tag Access Unit. In this work, each tag memory is 8 bits wide; however, each tag flag is a one-bit datum. To access each bit of the 8-bit-wide memory with the 16-bit address signal Addr[15:0], the TAU uses an architecture similar to that shown in Figure 3. When the TAU reads one bit from the tag memory, it first issues the 13-bit address Addr[15:3] to read one byte of data, and then uses the lowest 3 address bits, Addr[2:0], to select the one-bit tag within that byte. When the TAU writes one bit of the tag memory, it first reads the byte as in a read operation and then writes the byte back with that one-bit tag replaced. Thus, the TAU needs one clock cycle to read each bit and two clock cycles to write one.
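A minimal software model of this byte-wide, bit-addressed access (class and method names are ours) might look as follows:

```python
class TagMemory:
    """Byte-wide tag RAM accessed one bit at a time, mirroring the TAU:
    Addr[15:3] selects the byte, Addr[2:0] selects the bit."""

    def __init__(self, size_bytes=8192):        # 256*256 bits = 8 KB
        self.mem = bytearray(size_bytes)

    def read_bit(self, addr):                   # one "clock cycle"
        byte = self.mem[addr >> 3]
        return (byte >> (addr & 0x7)) & 1

    def write_bit(self, addr, bit):             # read-modify-write: two cycles
        byte = self.mem[addr >> 3]
        mask = 1 << (addr & 0x7)
        self.mem[addr >> 3] = (byte | mask) if bit else (byte & ~mask)
```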

Fig. 3. TSP memory access in Tag Access Unit

4) Address Generator (AG): In order to access a coefficient or a tag in external memory, the Address Generator (AG) provides a mapping from the (row, col) coordinate to the linear memory address. In other words, the AG generates a 16-bit address signal in which Addr[15:8] is the row address and Addr[7:0] is the column address, so that the coefficient or tag corresponding to each coordinate pair can be located in memory. To adapt to the different data structures of the external memory content, the AG behaves as a mapping function from the current address to the next address, depending on five selection cases, F1 to F5, as


follows.

Fig. 4 Address Generator

F1: Wavelet coefficient address generation. When the encoder performs TSIHT coding in the TST update step, the AG generates the wavelet coefficient addresses in a bottom-up direction. While updating TST, the AG first searches the most peripheral nodes, starting at the start mark, and moves toward the inner nodes of every scanning line. Let c_col and c_row be the current column and row addresses, and n_col and n_row the next column and row addresses, and assume tmp_size is the coordinate boundary at each level. The flowchart of the F1 address generation is illustrated in Figure 4. Note that, as shown in Figure 4, the next address is obtained from the current address depending on different boundary conditions.
F2: Ancestor address generation. In the TST update step, for each entry (k,l) ∈ O(i,j), if a descendant coefficient (k,l) with TST = 0 is found to be significant, the TST value of the parent (i,j) is set to 1. To locate the ancestor address from a descendant coefficient, a bitwise shift operation on the descendant coordinate is used. For instance, Figure 5 illustrates the ancestor-descendant relations labeled with row and column addresses. The ancestor address can be obtained by right-shifting each descendant coordinate by one bit.
F3: Descendant address generation. In the spatial orientation tree encoding step, the algorithm uses the DFS method to traverse all the nodes of the spatial orientation tree. It first visits the root node and then follows each branch to its immediate descendants, down to the leaves. Similarly to F2, a descendant address may be obtained by left-shifting each coordinate by one bit and adding the necessary offsets.
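As a rough software model of the F2/F3 (and, below, F5) address relations, the following sketch works on (row, col) coordinates instead of packed 16-bit addresses; the function names are ours:

```python
def ancestor(row, col):
    """F2: the parent coordinate in the spatial orientation tree is obtained
    by right-shifting each coordinate by one bit."""
    return row >> 1, col >> 1

def descendants(row, col):
    """F3/F5: the four children share the high-order address bits; only the
    last bit of the (row, col) pair varies as (0,0), (0,1), (1,0), (1,1)."""
    r, c = row << 1, col << 1
    return [(r, c), (r, c + 1), (r + 1, c), (r + 1, c + 1)]
```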

Fig. 5 The flowchart of the F1 address generator

F4: General linear counter. Within the first three steps of the algorithm, the AG behaves as a general two-dimensional counter. When the AG works in mode F4, the addresses of the scanning lines are generated row by row and column by column sequentially.

Fig. 6 Ancestor-descendant relations of node coordinate

F5: Neighbor address generation. The addresses of the four neighbor nodes originating from the same ancestor have the property that their row and column addresses are identical except for the last bit, and the last bits of their (row, col) address pairs vary with the sequence (0,0), (0,1), (1,0), (1,1). When the AG works in mode F5, the neighbor addresses of each node are generated using this principle. Since each iteration of the TSIHT coding algorithm ends at F5, after the encoder finishes working in mode F5 an iteration flag signal It_flag is produced to notify the other control units. When the encoder performs the TSIHT coding algorithm, the AG generates the addresses of the coefficients with coordinate pair (row, col) using one of the above function units to access the coefficient or tag memory. Only one function unit is allowed to read input data and execute its task at a time. At the front of each function unit, a latch is added to reduce the power consumption, as shown in Figure 3. Besides, before entering one function from another, it may also be necessary to clear the previous state. All these function units are controlled by the AG_controller. Let C1 and C2 be the clear states, and s0, s1, …, s11 be the control


state set of the AG_controller. The finite state machine of the AG_controller is shown in Fig. 7.

Fig. 7. Finite state machine of AG_controller

5) Bit-stream Generator (BG): In the encoder, the Bit-stream Generator (BG), shown in Figure 8, generates the encoded bit stream bit by bit. Its primary component, the Significance Test Unit, checks whether a coefficient is significant or not. According to the TSIHT algorithm, the BG outputs values that depend on the threshold, the TST signal, and the magnitude and sign of the coefficient. The output signals of the BG are the bit_out bit stream and the synchronous sync signal. Note that the bit stream appearing on bit_out is meaningful only when sync is asserted.

Fig 8. Bit stream generator.

V. CONCLUSION

The new algorithm significantly reduces the memory usage by using tag flags. The problem of large memory usage in SPIHT can thus be overcome while maintaining the efficiency of SPIHT. A prototype of a 256×256 gray-scale image PIE core for progressive image transmission has been designed. The VHDL design was simulated using Modelsim and synthesized using Xilinx tools. Figures 8 and 9 show the simulation results of the threshold generator module and the address generator controller.

Fig. 8 Threshold generator output waveform
Fig. 9 AG_controller output waveform


REFERENCES

[1] C. Christopoulos, A. Skodras, and T. Ebrahimi, "The JPEG2000 still image coding system: An overview," IEEE Transactions on Consumer Electronics, vol. 46, pp. 1103–1127, Nov. 2000.
[2] T. Sikora, "The MPEG-4 video standard verification model," IEEE Transactions on Circuits and Systems for Video Technology, vol. 7, no. 1, pp. 19–31, Feb. 1997.
[3] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Transactions on Signal Processing, vol. 41, pp. 3445–3462, Dec. 1993.
[4] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243–250, June 1996.
[5] D. Mukherjee and S. K. Mitra, "Vector SPIHT for embedded wavelet video and image coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 3, pp. 231–246, Mar. 2003.
[6] Z. Wang and A. C. Bovik, "Embedded foveation image coding," IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1397–1410, Oct. 2001.
[7] T. Kim, S. Choi, R. E. V. Dyck, and N. K. Bose, "Classified zerotree wavelet image coding and adaptive packetization for low-bit-rate transport," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 9, pp. 1022–1034, Sept. 2001.
[8] W. A. Pearlman, A. Islam, N. Nagaraj, and A. Said, "Efficient, low complexity image coding with a set-partitioning embedded block coder," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 11, pp. 1219–1228, Nov. 2004.
[9] A. Munteanu, J. Cornelis, G. V. der Auwera, and P. Cristea, "Wavelet image compression - the quadtree coding approach," IEEE Transactions on Information Technology in Biomedicine, vol. 3, no. 3, pp. 176–185, Sept. 1999.
[10] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674–693, July 1989.
[11] S.-F. Hsiao, Y.-C. Tai, and K.-H. Chang, "VLSI design of an efficient embedded zerotree wavelet coder with function of digital watermarking," IEEE Transactions on Consumer Electronics, vol. 46, no. 7, pp. 628–636, Aug. 2000.
[12] B. Vanhoof, M. Peon, G. Lafruit, J. Bormans, M. Engels, and I. Bolsens, "A scalable architecture for MPEG-4 embedded zero tree coding," Custom Integrated Circuits Conference, pp. 65–68, 1999.
[13] Z. Liu and L. J. Karam, "An efficient embedded zerotree wavelet image codec based on intraband partitioning," IEEE International Conference on Image Processing, vol. 3, pp. 162–165, Sept. 2000.


Reed Solomon Encoder and Decoder using Concurrent Error Detection Schemes

Rani Deepika.B.J, 2nd Year ME (VLSI Design), Karunya University, Coimbatore.

Email: [email protected] Rahimunnisa.K, Sr. Lecturer, Karunya University, Coimbatore.

Email: [email protected]

Abstract—Reed–Solomon (RS) codes are widely used to identify and correct errors in transmission and storage systems. When RS codes are used in highly reliable systems, the designer should also take into account the occurrence of faults in the encoder and decoder subsystems. In this paper, self-checking RS encoder and decoder architectures are presented. The presented architecture exploits some properties of the arithmetic operations on the GF(2^m) Galois field related to the parity of the binary representation of the field elements. For the RS decoder, these properties allow implementing Concurrent Error Detection (CED) schemes useful for a wide range of different decoding algorithms with no intervention on the decoder architecture.

Index Terms: Concurrent Error Detection, Error Correction Coding, Galois Field, Reed Solomon Codes.

I. INTRODUCTION

Reed-Solomon codes are block-based Error Correcting Codes with a wide range of applications in digital communications and storage. Reed-Solomon codes are used to correct errors in many systems, including storage devices, wireless or mobile communications, satellite communications, digital television, and high-speed modems.

A. Error Correction Codes

Highly reliable data transmission and storage systems frequently use Error Correction Codes (ECC) to protect data. By adding a certain degree of redundancy, these codes are able to detect and correct errors in the coded information. Error-control coding techniques detect and possibly correct errors that occur when messages are transmitted in a digital communication system. To accomplish this, the encoder transmits not only the information symbols but also extra redundant symbols. The decoder interprets what it receives, using the redundant symbols to detect and possibly correct whatever errors occurred during

transmission. You might use error-control coding if your transmission channel is very noisy or if your data is very sensitive to noise.

B. Block Coding

Depending on the nature of the data or noise, you might choose a specific type of error-control coding. Block coding is a special case of error-control coding. Block-coding techniques map a fixed number of message symbols to a fixed number of code symbols. A block coder treats each block of data independently and it is memoryless. The Reed Solomon codes are based on this concept. Reed-Solomon codes are block codes. This means that a fixed block of input data is processed into a fixed block of output data.

The Reed-Solomon encoder takes a block of digital data and adds extra "redundant" bits. Errors occur during transmission or storage for a number of reasons. The Reed-Solomon decoder processes each block and attempts to correct errors and recover the original data. The number and type of errors that can be corrected depends on the characteristics of the Reed-Solomon code. The typical system is shown in Fig.1

Fig. 1. Typical System

In the design of highly reliable electronic systems, both the Reed-Solomon (RS) encoder and decoder should be self-checking in order to avoid faults in these blocks compromising the reliability of the whole system. In fact, a fault in the encoder can produce an incorrect codeword, while a fault in the decoder can give a wrong data word even if no errors occur in the codeword transmission. Therefore, great attention must be paid to detecting and recovering from faults in the encoding and decoding circuitry.


C. Properties of Reed Solomon Codes

Nowadays, the most widely used Error Correcting Codes are the RS codes, which are based on the properties of finite field arithmetic. In particular, finite fields with 2^m elements are suitable for digital implementations due to the isomorphism between addition, performed modulo 2, and the XOR operation between the bits representing the elements of the field. The use of the XOR operation in addition and multiplication allows parity-check-based strategies to be used to check for faults in the RS encoder, while the implicit redundancy in the codeword is used both to correct erroneous data and to detect faults inside the decoder block.

II. REED-SOLOMON CODES

Reed-Solomon codes provide very powerful error correction capabilities, have high channel efficiency and are very versatile. They are a “block code” coding technique requiring the addition of redundant parity symbols to the data to enable error correction. The data is partitioned into blocks and each block is processed as a single unit by both the encoder and decoder. The number of parity check symbols per block is determined by the amount of error correction required. These additional check symbols must contain enough information to locate the position and determine the value of the erroneous information symbols.

A. Finite Field Arithmetic

The finite fields used in digital implementations are of the form GF(2^m), where m is the number of bits of a symbol to be coded. An element a(x) ∈ GF(2^m) is a polynomial with coefficients a_i ∈ {0,1} and can be seen as an m-bit symbol a = a_{m-1} … a_1 a_0. The addition of two elements a(x), b(x) ∈ GF(2^m) is the modulo-2 sum of the coefficients a_i and b_i, i.e., the bitwise XOR of the two symbols a and b. The multiplication of two elements a(x), b(x) ∈ GF(2^m) requires the multiplication of the two polynomials followed by reduction modulo i(x), where i(x) is an irreducible polynomial of degree m. Multiplication can be implemented as an AND-XOR network.
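As a plain-software illustration of these operations (the particular degree m = 8 and the irreducible polynomial x^8 + x^4 + x^3 + x^2 + 1 are example assumptions, not taken from the paper):

```python
def gf_add(a, b):
    """Addition in GF(2^m): coefficient-wise sum modulo 2 = bitwise XOR."""
    return a ^ b

def gf_mul(a, b, irreducible=0b100011101, m=8):
    """Multiplication in GF(2^m): shift-and-add polynomial product with
    reduction modulo an irreducible polynomial of degree m (here m = 8 and
    x^8 + x^4 + x^3 + x^2 + 1, a common Reed-Solomon choice)."""
    result = 0
    while b:
        if b & 1:                 # AND stage: include `a` if this bit of b is set
            result ^= a           # XOR stage: addition in GF(2)
        b >>= 1
        a <<= 1
        if a & (1 << m):          # reduce as soon as the degree reaches m
            a ^= irreducible
    return result
```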

The RS(n,k) code is defined by representing the data word symbols as elements of the field GF(2^m); the overall data word is treated as a polynomial d(x) of degree k - 1 with coefficients in

GF(2m). The RS codeword is then generated by using the generator polynomial g(x). All valid codewords are exactly divisible by g(x).

The general form g(x) is

g(x) = (x + α^i)(x + α^(i+1)) … (x + α^(i+2t-1))

where 2t = n - k and α is a primitive element of the field, i.e., for every nonzero β ∈ GF(2^m) there exists an integer i such that β = α^i.

The codewords of a separable RS(n, k) code correspond to polynomials c(x) of degree n - 1 that can be generated by using the following formulas:

c(x) = d(x) · x^(n-k) + p(x)

p(x) = d(x) · x^(n-k) mod g(x)

where p(x) is a polynomial of degree less than n - k representing the parity symbols. In practice, the encoder takes k data symbols and adds 2t parity symbols, obtaining an n-symbol codeword. The 2t parity symbols allow the correction of up to t erroneous symbols in a codeword.
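A software sketch of this systematic encoding rule, computing p(x) = d(x) · x^(n-k) mod g(x) with the usual LFSR-style division, is shown below. The function and argument names are ours; gen holds the coefficients g_0 … g_{2t-1} (the monic leading term of g(x) is omitted), data is fed most-significant symbol first, and gf_mul is a GF(2^m) multiplier such as the one sketched earlier.

```python
def rs_encode(data, gen, gf_mul):
    """Return the n-k parity symbols of a systematic RS codeword,
    modeling the LFSR-based polynomial division over GF(2^m)."""
    parity = [0] * len(gen)                      # one register per slice
    for d in data:
        feedback = d ^ parity[-1]                # GF(2^m) addition is XOR
        for i in range(len(parity) - 1, 0, -1):
            parity[i] = parity[i - 1] ^ gf_mul(feedback, gen[i])
        parity[0] = gf_mul(feedback, gen[0])
    return parity[::-1]                          # highest-degree parity first
```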

Defining the Hamming distance of two polynomials a(x) and b(x) of degree n as the number of coefficients of the same degree that differ, i.e., H(a(x), b(x)) = #{i ≤ n | a_i ≠ b_i}, and the Hamming weight W(a(x)) as the number of non-zero coefficients of a(x), i.e., W(a(x)) = #{i ≤ n | a_i ≠ 0}, it is easy to prove that H(a(x), b(x)) = W(a(x) - b(x)). In an RS(n, k) code the minimum Hamming distance between two codewords is n - k + 1. After the transmission of the coded data on a noisy channel, the decoder receives as input a polynomial r(x) = c(x) + e(x), where e(x) is the error polynomial. The RS decoder identifies the position and magnitude of up to t errors and is able to correct them. In other words, the decoder is able to identify the polynomial e(x) if the Hamming weight W(e(x)) is not greater than t. The decoding algorithm provides as output the codeword that is the only codeword having a Hamming distance not greater than t from the received polynomial r(x).

B. Proposed Implementations


In this section, the motivations of the design methodology used for the proposed design are described.

In the literature, a radiation-tolerant RS encoder hardened against space radiation effects through circuit and layout techniques has been presented, and single- and multiple-parity-bit schemes have been proposed to check the correctness of addition and multiplication in the polynomial basis representation of finite fields. These techniques can be extended to detect faults occurring in the RS encoder, achieving the self-checking property for the RS encoder implementation. Moreover, a method to obtain CED circuits for finite field multipliers and inverters has been proposed.

Since both the RS encoder and decoder are based on GF(2^m) addition, multiplication, and inversion, their self-checking design can be obtained by using CED designs of these basic arithmetic operations. Moreover, a self-checking algorithm for solving the key equation has been introduced. By exploiting the proposed algorithm and substituting the elementary operations with the corresponding CED implementations in the other parts of the decoding algorithm, a self-checking decoder can be implemented. This approach can be used for the encoder, which uses only addition and constant multiplication, as illustrated in the following subsection, but it is unusable for the decoder, as described later in this paper; a specific technique is explained in the subsequent section.

III. REED-SOLOMON ENCODER

The Reed-Solomon Encoder is used in many Forward

Error Correction (FEC) applications and in systems where data are transmitted and subject to errors before reception, for example, communications systems, disk drives, and so on.

A. Characteristics of the Reed-Solomon Encoder

In order to design a self-checking RS encoder using such multipliers, each fault inside these blocks should be correctly detected. This detection is not ensured for the entire set of stuck-at faults, because no details on the logical netlist implementing the multipliers are given. In fact, the estimated probability of undetected faults is different from zero. To overcome this limitation and obtain total fault coverage for single stuck-at faults, the previously proposed solution is used. First of all, the characteristics of the arithmetic operations in

GF(2m) used in the RS encoder are analyzed with respect to the parity of the binary representation of the operands. The following two operations are considered:

• Parity of the addition in GF(2m);

• Parity of the constant multiplication in GF(2m).

Defining the parity P(a(x)) of a symbol as the XOR of the coefficients ai, and taking into account that in GF(2^m) the addition operation is realized by the XOR of the bits having the same index, the following property can be easily demonstrated:

P( a(x) + b(x) ) = P( a(x) ) ⊕ P( b(x) )

Taking into account that in the RS encoder the polynomial used to encode the data is constant, the polynomial multiplication is implemented as multiplication by the constants g_i, where the g_i are the coefficients of the generator polynomial g(x). Each constant multiplier is implemented using a suitable network of XOR gates. The parity P(c(x)) of the result can be evaluated as

P(c(x)) = ⊕_{i ∈ A} a_i,

where A is the set of input bits that are evaluated an odd number of times. For the input bits evaluated an even number of times, additional outputs are added.
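A small sketch of these parity relations (the helper names and the precomputed set A are assumptions made for illustration):

```python
def parity(x):
    """P(a(x)): XOR of all coefficient bits of a field element."""
    p = 0
    while x:
        p ^= x & 1
        x >>= 1
    return p

def predicted_parity_const_mul(a, odd_input_set):
    """Parity prediction for a constant GF(2^m) multiplier: the parity of the
    product is the XOR of the input bits that feed an odd number of outputs
    (the set A in the text, assumed precomputed from the XOR network)."""
    p = 0
    for i in odd_input_set:
        p ^= (a >> i) & 1
    return p

# Sanity check of the addition property: P(a + b) = P(a) XOR P(b)
assert parity(0b1011 ^ 0b0110) == parity(0b1011) ^ parity(0b0110)
```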

B. Self-Checking RS Encoder

The implementation of RS encoders is usually based on an LFSR, which implements the polynomial division over the finite field. In Fig. 2, the implementation of an RS encoder is shown. The additions and multiplications are performed on GF(2^m), and the g_i are the coefficients of the generator polynomial g(x).

The RS encoder architecture is composed of slice blocks, each containing a constant multiplier, an adder, and a register.

Fig. 2. RS Encoder.


The number of slices to design for an RS(n, k) code is n - k. The self-checking implementation requires the insertion of some parity prediction blocks and a parity checker.

Fig.3 Self-Checking slice

The correctness of each slice is checked by using the architecture shown in Fig. 3. The input and output signals of the slice are as follows.

• Ain is the registered output of the previous slice.

• Pin is the registered parity of the previous slice.

• Fin is the feed-back of the LFSR.

• PFin is the parity of the feed-back input.

• Aout is the result of the multiplication and addition operation.

• Pout is the predicted parity of the result.

The parity prediction block is implemented accordingly. It must be noticed that some constraints on the implementation of the constant multiplier must be added in order to avoid interference between different outputs when a fault occurs.

These interferences are due to the sharing of intermediate results between different outputs and can therefore be avoided by using networks with a fan-out equal to one. The parity checker block checks whether the parity of the inputs is even or odd.

These considerations guarantee the self-checking property of the checker. It can be noticed that, due to the LFSR-based structure of the RS encoder, there are no control state machines to be protected against faults.

Fig 4. Simulation Results

Therefore, the structure of the RS encoder is designed with the help of the slice shown in Fig. 3, and Fig. 4 shows the simulation results.

IV. REED-SOLOMON DECODER

The self-checking decoder can be designed by applying the CED concept. The decoder is mainly used for finding the position and magnitude of up to t errors and is able to correct them.

A. Characteristics of the RS Decoder

The design of a self-checking decoder starting from the CED implementation of the arithmetic blocks and using the self-checking algorithm for solving the key equation presents the following drawbacks.

1) The internal structure of the decoder must be modified, substituting the elementary operations with the corresponding CED ones. Therefore, the decoder performance in terms of maximum operating frequency, area occupation, and power consumption can be very different with respect to the non-self-checking implementation.

2) The self-checking implementation is strongly dependent on the chosen decoder architecture.

3) A good knowledge of finite field arithmetic is essential for the implementation of the GF(2^m) arithmetic blocks.

In the solution presented in this paper, differently from the previously discussed approaches, the implementation of the self-checking RS decoder is based on the use of a standard RS decoder completed by adding suitable hardware blocks to check its functionality. In this way, the proposed method can be directly used for a wide range of


different decoder algorithms enabling the use of important design concepts such as reusability. The proposed technique starts from the following two main properties of the fault-free decoder.

Property 1: The decoder output is always a codeword.

Property 2: The Hamming weight of the error polynomial is not greater than t.

If a fault occurs inside the decoder, the previously outlined observations are able to detect its occurrence. When the fault is activated, i.e., the output differs from the correct one due to the presence of the fault, one of the following two cases occurs.

• In the first case the decoder gives as output a non-codeword, and this case can be detected by Property 1. This is the most probable case, because the decoder computes the error polynomial and obtains the output codeword by calculating ĉ(x) = r(x) + e(x), where r(x) is the received polynomial.

• If the output of the faulty decoder is a wrong codeword, the detection of this fault is easily performed by evaluating the Hamming weight of the error polynomial e(x). The error polynomial can be provided by the decoder as an additional output or can be evaluated by comparing the received polynomial with the provided output.

If one of the two properties is not respected, a fault inside the decoder is detected, while if both observations are satisfied we can conclude that no fault has been activated inside the decoder.

This approach is completely independent of the assumed fault set and is based only on the assumption that the fault-free decoder always provides a codeword as output. This assumption is valid for a wide range of decoder architectures. For decoders that are able to perform miscorrection detection for some received polynomials with more than t errors, suitable modifications of the proposed method could be made.

B. concurrent Error detection for the RS Decoder

In Fig. 5, the CED implementation of the RS decoder is shown. Its main blocks are as follows.

• RS decoder, i.e., the block to be checked.

• An optional error polynomial recovery block. This block is needed if the RS decoder does not provide at the output the error polynomial coefficients.

• Hamming weight counter, that checks the number of nonzero coefficients of the error polynomial.

• Codeword checker, that checks if the output data of the RS decoder form a correct codeword.

• Error detection block, which takes as inputs the outputs of the Hamming weight counter and of the codeword checker and provides an error detection signal if a fault in the RS decoder has been detected.

The RS decoder can be considered as a black box performing an algorithm for the error

Fig.5 CED Scheme of RS Decoder


detection and correction of the input data (the coefficients of the received data forming the polynomial r(x)).

The error polynomial recovery block is composed of a shift register of length L (the latency of the decoder) and a GF(2^m) adder having as operands the coefficients of ĉ(x) and r(x).

The Hamming weight counter is composed by the following:

1) a comparator indicating (at each clock cycle) if the e(x) coefficients are zero;

2) a counter that takes into account the number of nonzero coefficients;

3) a comparator between the counter output and t that is the maximum allowed number of nonzero elements.

The codeword checker block checks whether the reconstructed ĉ(x) is a codeword, i.e., whether it is exactly divisible by the generator polynomial g(x). Two types of this block are proposed.

The error detection block takes as inputs the outputs of the Hamming weight counter and the outputs of the codeword checker. The additional blocks used to detect faults inside the decoder are themselves susceptible to faults. The codeword checker and the error polynomial recovery blocks use only registers and GF(2^m) addition and constant multiplication; therefore, the same considerations as for the RS encoder can be applied to obtain the self-checking property of these blocks. For the counters and the comparators used in the Hamming weight counter and error detection blocks, many efficient techniques are available in the literature.
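For illustration, the two checks performed by these blocks can be modeled in software as follows; the function names and the gf_poly_mod helper are assumptions, and the real design performs these checks with the hardware blocks listed above.

```python
def ced_check(received, decoded, gen, t, gf_poly_mod):
    """Decoder-side CED test under the two fault-free properties:
    (1) the decoder output must be a codeword, i.e. exactly divisible by
        g(x) (gf_poly_mod is an assumed helper returning the remainder);
    (2) the Hamming weight of the error polynomial must not exceed t."""
    # error polynomial: symbol-wise GF(2^m) difference, i.e. XOR
    error = [r ^ c for r, c in zip(received, decoded)]
    weight_ok = sum(1 for e in error if e != 0) <= t                 # property 2
    codeword_ok = all(s == 0 for s in gf_poly_mod(decoded, gen))     # property 1
    fault_detected = not (weight_ok and codeword_ok)
    return fault_detected
```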

V. CONCLUSION

In this paper, self-checking architectures for an RS encoder and decoder have been described. For the self-checking RS decoder, two main properties of the fault-free decoder have been identified and used to detect faults inside the decoder. This method can be used for a wide range of algorithms implementing the decoder function.

VI. REFERENCES

[1] Altera Corp., San Jose, CA, "Altera Reed-Solomon compiler user guide 3.3.3," 2006.
[2] Xilinx, San Jose, CA, "Xilinx LogiCORE Reed-Solomon decoder v5.1," 2006.
[3] S. B. Sarmadi and M. A. Hasan, "Concurrent error detection of polynomial basis multiplication over extension fields using a multiple-bit parity scheme," in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., 2005, pp. 102–110.
[4] G. C. Cardarilli, S. Pontarelli, M. Re, and A. Salsano, "Design of a self checking Reed Solomon encoder," in Proc. 11th IEEE Int. On-Line Test. Symp. (IOLTS'05), 2005, pp. 201–202.
[5] G. C. Cardarilli, S. Pontarelli, M. Re, and A. Salsano, "A self checking Reed Solomon encoder: Design and analysis," in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., 2005, pp. 111–119.
[6] A. R. Masoleh and M. A. Hasan, "Low complexity bit parallel architectures for polynomial basis multiplication over GF(2m)," IEEE Trans. Comput., vol. 53, no. 8, pp. 945–959, Aug. 2004.
[7] J. Gambles, L. Miles, J. Has, W. Smith, and S. Whitaker, "An ultra-low-power, radiation-tolerant Reed Solomon encoder for space applications," in Proc. IEEE Custom Integr. Circuits Conf., 2003, pp. 631–634.
[8] A. R. Masoleh and M. A. Hasan, "Error detection in polynomial basis multipliers over binary extension fields," in Lecture Notes in Computer Science. New York: Springer-Verlag, 2003, vol. 2523, pp. 515–528.
[9] Y.-C. Chuang and C.-W. Wu, "On-line error detection schemes for a systolic finite-field inverter," in Proc. 7th Asian Test Symp., 1998, pp. 301–305.
[10] I. M. Boyarinov, "Self-checking algorithm of solving the key equation," in Proc. IEEE Int. Symp. Inf. Theory, 1998, p. 292.
[11] M. Gossel, S. Fenn, and D. Taylor, "On-line error detection for finite field multipliers," in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., 1997, pp. 307–311.
[12] C. Bolchini, F. Salice, and D. Sciuto, "A novel methodology for designing TSC networks based on the parity bit code," in Proc. Eur. Design Test Conf., 1997, pp. 440–444.
[13] D. Nikolos, "Design techniques for testable embedded error checkers," Computer, vol. 23, no. 7, pp. 84–88, Jul. 1990.
[14] P. K. Lala, Fault Tolerant and Fault Testable Hardware Design. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[15] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley.
[16] A. Sülflow and R. Drechsler, "Modeling a fully scalable Reed-Solomon encoder/decoder over GF(p^m) in SystemC," Institute of Computer Science, University of Bremen, 28359 Bremen, Germany.
[17] D. H. Lee and J. T. Kim, "Efficient recursive cell architecture for the Reed-Solomon decoder," Journal of the Korean Physical Society, vol. 40, no. 1, pp. 82–86, January 2002.
[18] K. C. C. Wai and S. J. Yang, "Field Programmable Gate Array implementation of Reed-Solomon code, RS(255,239)," Xelic Inc., Pittsford, New York 14534; R.I.T., Rochester, New York 14623.


Design of High Speed Architectures for MAP Turbo Decoders 1Lakshmi .S.Kumar, 2D.Jackuline Moni

1II M.E, Applied Electronics ,2Associate Professor, 1, 2Karunya University, Coimbatore

1Email id:[email protected]

Abstract—The maximum a posteriori probability (MAP) algorithm has been widely used in Turbo decoding for its outstanding performance. However, it is very challenging to design high-speed MAP decoders because of their inherent recursive computations. This paper presents two novel high-speed recursion architectures for MAP-based Turbo decoders. Algorithmic transformation, approximation, and architectural optimization are incorporated in the proposed designs to reduce the critical path.

Index Terms—Error correction codes, high-speed design, maximum a posteriori probability (MAP) decoder, Turbo code, VLSI.

I. INTRODUCTION

Turbo codes [1], invented in 1993, have attracted tremendous attention in both academia and industry for their outstanding performance; rich applications can be found in wireless and satellite communications [2], [3]. Practical Turbo decoders usually employ serial decoding architectures [4] for area efficiency. Thus, the throughput of a Turbo decoder is highly limited by the clock speed and the maximum number of iterations to be performed. To facilitate iterative decoding, Turbo decoders require soft-input soft-output decoding algorithms, among which the maximum a posteriori probability (MAP) algorithm [5] is widely adopted for its excellent performance.

Due to the recursive computations inherent with the MAP algorithm, the conventional pipelining technique is not applicable for raising the effective processing speed unless one MAP decoder is used to process more than one Turbo code blocks

or sub-blocks. Among the various high-speed recursion architectures in [6]–[10], the designs presented in [7] and [10] are the most attractive. In [7], an offset-add-compare-select (OACS) architecture [8] is adopted to replace the traditional add-compare-select-offset (ACSO) architecture. In addition, the lookup table (LUT) is simplified to only a 1-bit output, and the computation of the absolute value is avoided through the introduction of the reverse difference of two competing path metrics. An approximately 17% speedup over the traditional Radix-2 ACSO architecture was reported. With a one-step look-ahead operation, a Radix-4 ACSO architecture can be derived. Practical Radix-4 architectures such as those presented in [9] and [10] always involve approximations in order to achieve higher effective speedups. For instance, the following approximation is adopted in [10]:

max*( max*(A,B), max*(C,D) ) ≈ max*( max(A,B), max(C,D) )    (1)

where

max*(A,B) = max(A,B) + log( 1 + e^(-|A-B|) ).    (2)
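For reference, the max* operator of Eq. (2) and the Radix-4 approximation of Eq. (1) can be written as the following sketch (function names are ours):

```python
import math

def max_star(a, b):
    """Jacobian logarithm used in Log-MAP: max*(a,b) = max(a,b) + log(1 + e^-|a-b|)."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_star4_approx(a, b, c, d):
    """Radix-4 approximation of Eq. (1): the inner max* terms are replaced
    by plain max operations to shorten the critical path."""
    return max_star(max(a, b), max(c, d))
```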

II. TURBO CODES

Turbo codes were presented in 1993, and since then these codes have received a lot of interest from the research community as they offer better performance than other codes at very low signal-to-noise ratios. Turbo codes achieve near-Shannon-limit error correction performance with relatively simple component codes. A BER of 10^-5


is reported for a signal-to-noise ratio of 0.7 dB. Turbo coding is a forward error correction (FEC) scheme. Turbo codes consist of a concatenation of two convolutional codes. Turbo codes give better performance at low SNRs.

The turbo encoder transmits the encoded bits which form inputs to the turbo decoder. The turbo decoder decodes the information iteratively. Turbo codes can be concatenated in series, parallel or in a hybrid manner. Concatenated codes can be classified as parallel concatenated convolution codes (PCCC) or serial concatenated convolutional codes (SCCC). In PCCC two encoders operate on the same information bits. In SCCC, one encoder encodes the output of another encoder. The hybrid concatenation scheme consists of the combination of both parallel and serial concatenated convolutional codes. The turbo decoder has two decoders that perform iterative decoding.

The general structure of the turbo encoder architecture consists of two Recursive Systematic Convolutional (RSC) encoders, Encoder 1 and Encoder 2. The constituent codes are RSCs because they combine the properties of non-systematic and systematic codes. In the encoder architecture displayed in Figure 1, the two RSCs are identical. The N-bit data block is first encoded by Encoder 1. The same data block is also interleaved and encoded by Encoder 2. The main purpose of the interleaver is to randomize burst error patterns so that they can be correctly decoded. It also helps to increase the minimum distance of the turbo code.

Fig 1.A Turbo encoder

The turbo decoder carries out an iterative decoding process. The maximum a posteriori (MAP) algorithm is used in the turbo decoder; three variants are used in practice, namely MAP, Max-Log-MAP and Log-MAP. The MAP algorithm is a forward-backward recursion algorithm that minimizes the probability of bit error, but it has high computational complexity and numerical instability. The solution to these problems is to operate in the log domain. One advantage of operating in the log domain is that multiplication becomes addition. Addition, however, is not straightforward: in the log domain it becomes a maximization plus a correction term. The Max-Log-MAP algorithm approximates this addition solely as maximization.

Fig 2.A Turbo decoder

Applications of turbo codes

• Mobile radio
• Digital video
• Long-haul terrestrial wireless
• Satellite communications
• Deep space communication

III. ADVANCED HIGH-SPEED RADIX-2 RECURSION ARCHITECTURE FOR MAP

DECODERS

For convenience in the later discussion, a brief introduction to the MAP-based Turbo decoder structure is given at the beginning of this section. The MAP algorithm is generally implemented in the log domain


and is thus called the Log-MAP algorithm. MAP-based Turbo decoders normally adopt a sliding window approach [11] in order to reduce the computation latency and the memory for storing state metrics. As explained in [4], three recursive computation units, the α, β, and pre-β units, are needed for a Log-MAP decoder. This paper is focused on the design of high-speed recursive computation units, as they form the bottleneck in high-speed circuit design. It is known from the Log-MAP algorithm that all three recursion units have similar architectures, so we will focus our discussion on the design of the α units. The traditional design for the α computation is illustrated in Fig. 3, where the ABS block is used to compute the absolute value of the input and the LUT block implements the nonlinear function log(1 + e^(-x)), where x > 0. For simplicity, only one branch (i.e., one state) is drawn. The overflow approach [14] is assumed for the normalization of state metrics, as used in conventional Viterbi decoders.

Fig 3. Traditional recursion architecture: Arch-O.

It can be seen that the computation of the recursive loop consists of three multibit additions, the computation of absolute value and a random logic to implement the LUT. As there is only one delay element in each recursive loop, the traditional retiming technique cannot be used to reduce the critical path.

Here, an advanced Radix-2 recursion architecture, shown in Fig. 4, is proposed. First, a difference metric is introduced for each competing pair of state metrics (e.g., α_0 and α_1 in Fig. 4) so that the front-end addition and the subtraction operations can be performed simultaneously in order to reduce the computation delay of the loop. Second, a generalized LUT (see GLUT in Fig. 4) is employed that efficiently avoids the computation of the absolute value without introducing another subtraction operation. Third, the final addition is moved to the input side, as in the OACS architecture, and a one-stage carry-save structure is used to convert a three-number addition into a two-number addition. Finally, an intelligent approximation is made in order to further reduce the critical path.

Fig 4. Advanced Radix-2 fast recursion arch

The following equations are assumed for the considered recursive computation shown in Fig. 4:

α_0[k + 1] = max*( α_0[k] + γ_0[k], α_1[k] + γ_3[k] )

α_2[k + 1] = max*( α_0[k] + γ_3[k], α_1[k] + γ_0[k] )    (3)


where max* function is defined in (2).

In addition, we split each state metric into two terms as follows:

α_0[k] = α_0A[k] + α_0B[k]

α_1[k] = α_1A[k] + α_1B[k]

α_2[k] = α_2A[k] + α_2B[k]    (4)

Similarly, the corresponding difference metric is also split into the following two terms:

δ_01[k] = δ_01A[k] + δ_01B[k]

δ_01A[k] = α_0A[k] - α_1A[k]

δ_01B[k] = α_0B[k] - α_1B[k]    (5)

In this way, the original add-and-compare operation is converted into an addition of three numbers, i.e.,

(α_0 + γ_0) - (α_1 + γ_3) = (γ_0 - γ_3) + δ_01A + δ_01B    (6)

where (γ_0 - γ_3) is computed by the branch metric unit (BMU); the time index [k] is omitted for simplicity. In addition, the difference between the two outputs of the two GLUTs, i.e., δ_01B in the figure, can be neglected. If one competing path metric (e.g., p0 = α_0 + γ_0) is significantly larger than the other (e.g., p1 = α_1 + γ_3), the GLUT outputs will not change the decision anyway because of their small magnitudes. On the other hand, if the two competing path metrics are so close that adding or removing the small output of one GLUT may change the decision (e.g., from p0 > p1 to p1 > p0), picking either survivor should not make a big difference.

At the input side, a small circuitry shown in Fig. 5 is employed to convert an addition of three

numbers into an addition of two numbers, where FA and HA represent a full-adder and a half-adder, respectively, XOR stands for an exclusive-OR gate, and d0 and d1 correspond to the 2-bit output of the GLUT. The state metrics and branch metrics are represented with 9 and 6 bits, respectively, in this example. Sign extension is applied only to the branch metrics. It should be noted that an extra addition operation might be required to integrate each state metric before storing it into the memory. The GLUT structure is shown in Fig. 6, where the computation of the absolute value is eliminated by feeding the sign bit S into two logic blocks, Ls2 and ELUT; the Ls2 block detects whether the absolute value of the input is less than 2.0, and the ELUT block is a small LUT with 3-bit inputs and 2-bit outputs. It can be derived that Z = S'·(b7 + ··· + b1 + b0)' + S·(b7 ··· b1 b0), i.e., Z is asserted when all of the inspected bits equal the sign bit S. It was reported in [13] that using two output values for the LUT caused a performance loss of only 0.03 dB compared to the floating-point simulation for a four-state Turbo code. The approximation is described as follows:

If |x| < 2, f(x) = 3/8; else f(x) = 0    (7)

where x and f(x) stand for the input and the output of the LUT, respectively. In this approach, we only need to check if the absolute value of the input is less than 2, which can be performed by the Ls2 block in Fig. 6.
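A one-line software model of this two-level approximation (the function name is ours):

```python
def glut(x):
    """Two-level LUT approximation of Eq. (7) for the correction term
    log(1 + e^-|x|): 3/8 when |x| < 2, otherwise 0."""
    return 0.375 if abs(x) < 2.0 else 0.0
```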

Fig 5. Carry–save structure in the front end of Arch-A.

A drawback of this method is that its performance would be significantly degraded if only two bits are kept for the fractional part of the state metrics,


which is generally the case. In our design, both the inputs and outputs of the LUT are quantized in four levels.

Fig. 6. Structure of GLUT used in Arch-A.

The inputs to ELUT are treated as a 3-bit signed binary number. The outputs of ELUT are ANDed with the output of Ls2 block. This means, if the absolute value of the input is greater than 2.0, the output from the GLUT is 0. Otherwise, the output from ELUT will be the final output. The ELUT can be implemented with combinational logic for high-speed applications. The computation latency is smaller than the latency of Ls2 block. Therefore, the overall latency of the GLUT is almost the same as the previously discussed simplified method whose total delay consists of one 2:1 multiplexer gate delay and the computation delay of logic block Ls2. After all the previous optimization, the critical

path of the recursive architecture is reduced to two multibit additions, one 2:1 MUX operation, and 1-bit addition operation, which saves nearly two multibit adder delay compared to the traditional ACSO architecture.

IV. RESULTS

The output of the advanced Radix-2 recursion architecture is shown in Figure 7. It shows how an uncoded sequence is decoded using the Turbo decoder. The design is synthesized using the Xilinx synthesizer and simulated using Modelsim. The critical path of the recursive architecture is reduced to two multibit additions, one 2:1 MUX

operation, and 1-bit addition operation, which saves nearly two multibit adder delay compared to the traditional ACSO architecture.

Fig 7. Output of Radix-2 architecture

V.CONCLUSION AND FUTURE WORKS

In this paper, we proposed a Radix-2 recursion architecture for MAP decoders which reduces the critical path to two multibit additions, one 2:1 MUX operation, and a 1-bit addition, saving nearly two multibit adder delays. Our future work is to implement an improved Radix-4 architecture for MAP decoders and to carry out a performance comparison of both architectures.

REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error correcting coding and decoding: Turbo codes," in Proc. ICC, 1993, pp. 1064–1070.
[2] "Technical Specification Group Radio Access Network, Multiplexing and Channel Coding (TS 25.212 Version 3.0.0)," 3rd Generation Partnership Project (3GPP) [Online]. Available: http://www.3gpp.org
[3] 3rd Generation Partnership Project 2 (3GPP2) [Online]. Available: http://www.3gpp2.org
[4] H. Suzuki, Z. Wang, and K. K. Parhi, "A K = 3, 2 Mbps low power Turbo decoder for 3rd generation W-CDMA systems," in Proc. IEEE Custom Integr. Circuits Conf. (CICC), 2000, pp. 39–42.


[5] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 284–287, Mar. 1974.
[6] S.-J. Lee, N. Shanbhag, and A. Singer, "A 285-MHz pipelined MAP decoder in 0.18 μm CMOS," IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1718–1725, Aug. 2005.
[7] P. Urard et al., "A generic 350 Mb/s Turbo codec based on a 16-state Turbo decoder," in IEEE ISSCC Dig. Tech. Papers, 2004, pp. 424–433.
[8] E. Boutillon, W. Gross, and P. Gulak, "VLSI architectures for the MAP algorithm," IEEE Trans. Commun., vol. 51, no. 2, pp. 175–185, Feb. 2003.
[9] T. Miyauchi, K. Yamamoto, and T. Yokokawa, "High-performance programmable SISO decoder VLSI implementation for decoding Turbo codes," in Proc. IEEE Global Telecommun. Conf., 2001, pp. 305–309.
[10] M. Bickerstaff, L. Davis, C. Thomas, D. Garret, and C. Nicol, "A 24 Mb/s radix-4 LogMAP Turbo decoder for 3GPP-HSDPA mobile wireless," in IEEE ISSCC Dig. Tech. Papers, 2003, pp. 150–151.
[11] A. J. Viterbi, "An intuitive justification of the MAP decoder for convolutional codes," IEEE J. Sel. Areas Commun., vol. 16, pp. 260–264, Feb. 1998.
[12] T. C. Denk and K. K. Parhi, "Exhaustive scheduling and retiming of digital signal processing systems," IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process., vol. 45, no. 7, pp. 821–838, Jul. 1998.
[13] W. Gross and P. G. Gulak, "Simplified MAP algorithm suitable for implementation of turbo decoders," Electron. Lett., vol. 34, no. 16, pp. 1577–1578, Aug. 1998.
[14] Y. Wu, B. D. Woerner, and T. K. Blankenship, "Data width requirement in SISO decoding with module normalization," IEEE Trans. Commun., vol. 49, no. 11, pp. 1861–1868, Nov. 2001.


Technology Mapping Using Ant Colony Optimization

Jackuline Moni, S. Arumugam, M. Sajan Deepak
1,3 Department of ECE, Karunya University
2 Chief Executive, Bannari Amman Educational Trust

Abstract: The ant colony optimization [2] meta-heuristic is adopted from the natural foraging behavior of real ants and has been used to find good solutions to a wide spectrum of combinatorial optimization problems. Ant colonies [2][3] are capable of finding the shortest path between nest and food. In the ACO [2] algorithm, ants construct solutions with the help of local decisions. This approach is used here for optimizing wire length and minimizing area [4]. In terms of performance, the disadvantages of other optimization algorithms [4], such as long run times, are reduced, and the ACO algorithm quickly converges to an optimum. It is also used in other applications such as the traveling salesman problem and the quadratic assignment problem. Field programmable gate arrays [1] are becoming increasingly important implementation platforms for digital circuits. One of the necessary requirements to effectively utilize the fixed resources of field programmable gate arrays [1] is an efficient placement and routing mechanism; here we use the ant colony optimization algorithm [4] for the placement and routing problem.

Keywords: FPGA, ACO, Probabilistic rule.

I INTRODUCTION

Natural evolution has yielded biological systems in which complex collective behavior emerges from the local interaction of simple components. One example where this phenomenon can be observed is the foraging behavior of ant colonies. Ant colonies [2][3][5] are capable of finding shortest paths between their nest and food sources. This complex behavior of the colony is possible because the ants communicate indirectly by disposing traces of

pheromone [2] as they walk along a chosen path. Following ants most likely prefer those paths possessing the strongest pheromone information, thereby refreshing or further increasing the respective amounts of pheromone. Since ants on short paths are quicker, pheromone traces [5][6] on these paths are reinforced very frequently. On the other hand, pheromone information is permanently reduced by evaporation [3], which diminishes the influence of formerly chosen unfavorable paths. This combination focuses the search process on short, favorable paths. In ACO [2][3], a set of artificial ants searches for good solutions to the optimization problem under consideration. Each ant constructs a solution by making a sequence of local decisions. Its decisions are guided by pheromone information and some additional heuristic information. After a number of ants have constructed solutions, the best ants are allowed to update the pheromone information along their path through the decision graph. Evaporation is accomplished by globally reducing the pheromone information by a certain percentage. This process is repeated iteratively until a stopping criterion is met. ACO [2] has shown good performance on several combinatorial optimization problems [5][7], including scheduling, vehicle routing, constraint satisfaction, and the quadratic assignment problem [5]. In this paper, we adapt an ACO algorithm to field programmable gate arrays (FPGAs). FPGAs [1] are used for a wide range of applications, e.g. network communication, video communication and processing, and cryptographic applications. We show that ACO can also be implemented on FPGAs [1], leading to significant speedups in runtime compared to software implementations on sequential machines. The standard ACO algorithm is not very well suited to implementation on the resources provided by current commercial FPGA architectures. Instead we suggest using the Population-based ACO, in which pheromone


information is replaced by a small set (population) of good solutions discovered during the preceding iterations. Accordingly, the combination of pheromone updates and evaporation is replaced by inserting a new good solution into the population, which replaces the oldest solution in the population.

II METHODOLOGY

The objective is to find the minimal length connecting two components. For example we can consider a component at ‘i’ which is the source and it has to be connected to the component at ‘j’,

here the distance between two components can be clearly given by the basic equation[1][2],

Where xi and yi denotes the coordinates of the component at ‘i’ or can be generally defined as a graph (N, E) where N denotes the node and E denotes the components. In ant system ants built the solution by moving over the problem graph .During the iteration of an ant system each ants K, K=1,2…..m builds a tour in which a probabilistic transition rule[2][3][4] is applied. Iterations are indexed by( 1 to tmax) where tmax is the maximum number of iteration .And the act of choosing the next node is give by a probabilistic transition rule[2], The transition rule is the probability for ant K to go from city I to city j while building its t’th tour is called random

proportional transition rule. And the rule is given by

p_ij^k(t) = [τ_ij(t)]^α [η_ij]^β / Σ_{l ∈ allowed_k} [τ_il(t)]^α [η_il]^β

where τ_ij(t) is the pheromone trail on edge (i,j), η_ij is the heuristic desirability of edge (i,j), and allowed_k is the set of nodes still available to ant k.

Simultaneous deposition and evaporation of pheromone [2][3][4] take place: paths with shorter distance end up with a high concentration of pheromone, and paths with longer distance with a lower concentration. The pheromone update is given by the equation

τ_ij(t) = (1 - ρ) τ_ij(t - 1) + Δτ_ij

And simultaneously the updating and the evaporation of the pheromone thus take place

and the evaporation of the pheromone is given by the equation

\tau_{ij} = (1 - \rho)\tau_{ij}

Together, these two equations realize the simultaneous updating and evaporation of the pheromone.
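A small software sketch of the two equations follows (the function names are illustrative, not from the paper): evaporation scales every trail by (1 - rho), and each ant that completed a tour deposits an amount of pheromone on the edges it used.

```python
def evaporate(tau, rho=0.5):
    """tau_ij <- (1 - rho) * tau_ij for every edge."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)

def deposit(tau, tour, delta_tau):
    """tau_ij <- tau_ij + delta_tau along the edges of a completed tour."""
    for i, j in zip(tour, tour[1:]):
        tau[i][j] += delta_tau
        tau[j][i] += delta_tau
```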

III ACO PARAMETERS

The parameters [3][5][7] are important and must be tuned to yield the best results. For every run of the program, the results obtained are recorded, and the parameter values that yield the minimum value are retained. Varying the parameters in this way, the best results are obtained with rho = 0.5, alpha = 1, and beta = 5:

\alpha = 1, \quad \beta = 5, \quad \rho = 0.5
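For convenience, these reported settings can be collected in one place and passed to the sketches in Section II (the dictionary itself is only illustrative):

```python
ACO_PARAMS = {
    "alpha": 1.0,  # weight of the pheromone trail tau_ij
    "beta": 5.0,   # weight of the heuristic information eta_ij
    "rho": 0.5,    # pheromone evaporation rate
}
```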

IV RESULTS

The results below compare the device utilization of ant colony optimization with that of simulated annealing for the B01 and B02 benchmark circuits.

Device utilization by simulated annealing for B01

Device utilization summary:

Selected Device : 2s15cs144-6


Number of Slices: 8 out of 192 4%

Number of Slice Flip Flops: 13 out of 384 3%

Number of 4 input LUTs: 14 out of 384 3%

Number of bonded IOBs: 44 out of 90 48%

Number of GCLKs: 2 out of 4 50%

Device utilization by ACO for B01

Device utilization summary:

Number of External GCLKIOBs 3 out of 4 75%

Number of External IOBs 28 out of 86 32%

Number of LOCed External IOBs 0 out of 28 0%

Number of SLICEs 4 out of 192 2%

Number of GCLKs 3 out of 4 75%

Device utilization by simulated annealing for B02

Device utilization summary

Selected Device: 2s15cs144-6

Number of Slices: 7 out of 192 3%

Number of Slice Flip Flops: 12 out of 384 3%

Number of 4 input LUTs: 13 out of 384 3%

Number of bonded IOBs: 40 out of 90 44%

Number of GCLKs: 1 out of 4 25%

Device utilization by ACO for B02

Device utilization summary:

Selected Device : 2s15cs144-6

Number of Slices: 7 out of 192 3%

Number of Flip Flops: 12 out of 384 3%

Number of 4 input LUTs: 3 out of 384 0%

Number of bonded IOBs: 24 out of 90 26%

Number of GCLKs: 2 out of 4 50%

V CONCLUSION

The ant colony optimization algorithm was implemented and its results were compared with those of other algorithms such as simulated annealing. The resources occupied by the two algorithms were compared, and ant colony optimization achieved better device utilization.

VI FUTURE WORK

Preliminary work has been carried out that helps to minimize the resources utilized on the implementation device.

Future work can be carried out by modifying the ant colony optimization algorithm so that it requires even fewer resources to accomplish the task.

REFERENCES

[1] B. Scheuermann, K. So, M. Guntsch, M. Middendorf, O. Diessel, H. ElGindy, H. Schmeck, "FPGA placement and routing ant colony optimization", 26 January 2004.
[2] J. L. Deneubourg, J. M. Pasteels, J. C. Verhaeghe, "Probabilistic behaviour in ants: a strategy of errors?", 105 (1983) 259-271.
[3] M. Dorigo, "Optimization, Learning and Natural Algorithms", Ph.D. thesis, Dipartimento di Elettronica, Politecnico di Milano, Italy, 1991.
[4] C. Solnon, "Ants can solve constraint satisfaction problems", IEEE Trans. Evolut. Comput. 6(4) (2002) 347-357.
[5] L. M. Gambardella, E. Taillard, M. Dorigo, "Ant colonies for the quadratic assignment problem", J. Operat. Res. Soc. 50 (1999) 167-176.
[6] M. Dorigo, "Parallel ant system: an experimental study", manuscript, 1993.
[7] E.-G. Talbi, O. Roux, C. Fonlupt, D. Robillard, "Parallel ant colonies for combinatorial optimization problems", in: Parallel and Distributed Processing, 11 IPPS/SPDP'99 Workshops, No. 1586 in LNCS, Springer-Verlag, 1999, pp. 239-247.
[8] M. Rahoual, R. Hadji, V. Bachelet, "Parallel ant system for the set covering problem", in: Ant Algorithms, Proceedings of the Third International Workshop ANTS 2002, LNCS 2463, Springer-Verlag, Brussels, Belgium, 2002, pp. 262-267.
[9] M. Middendorf, F. Reischle, H. Schmeck, "Multi colony optimization", J. Parallel Distrib. Comput. 62(9) (2002) 1421-1432.
[10] R. Miller, V. K. Prasanna Kumar, D. I. Reisis, Q. F. Stout, "Parallel computation on reconfigurable meshes", IEEE Trans. Comput. 42(6) (1993) 678-692.
[11] O. Cheung, P. Leong, "Implementation of an FPGA based accelerator for virtual private networks", in: IEEE International Conference on Field-Programmable Technology, Hong Kong, 2002, pp. 34-43.
[12] S. Bade, B. Hutchings, "FPGA based stochastic neural network implementation", in: Proceedings of the IEEE Workshop on FPGAs for Custom Computing Machines, 1994, pp. 180-198.
[13] P. Lysaght, J. Stockwood, J. Law, D. Grima, "Artificial neural network implementation on a fine grained FPGA", in: Field Programmable Logic, 1994, pp. 421-431.
[14] M. Guntsch, M. Middendorf, "A population based approach for ACO", in: S. Cagnoni et al. (Eds.), Applications of Evolutionary Computing - EvoWorkshops 2002: EvoCOP, pp. 72-81.
[15] P. Albuquerque, A. Dupuis, "A parallel cellular ant colony algorithm for clustering and sorting", in: Proc. of ACRI 2002, LNCS 2493, Springer, 2002, pp. 220-230.
[16] O. Cordon, F. Herrera, T. Stützle, "A review on the ant colony optimization metaheuristic: basis, models and new trends", Mathware and Soft Computing 9 (2002).
[17] P. Delisle, M. Krajecki, M. Gravel, C. Gagne, "Parallel implementation of an ant colony optimization metaheuristic with OpenMP", in: Proceedings of the 3rd European Workshop on OpenMP, 2001.
[18] M. Dorigo, V. Maniezzo, A. Colorni, "The ant system: optimization by a colony of cooperating agents", IEEE Trans. Syst., Man, Cybernetics B 26 (1996) 29-41.
[19] M. S. Fiorenzo Catalano, F. Malucelli, "Parallel randomized heuristics for the set covering problem", International Journal of Practical Parallel Computing 10(4) (2001) 113-132.
[21] H. Kawamura, M. Yamamoto, K. Suzuki, A. Ohuchi, "Multiple ant colony algorithms based on colony level interactions", IEICE Transactions on Fundamentals E83-A(2) (2000) 371-379.
[22] A. E. Langham, P. W. Grant, "Using competing ant colonies to solve k-way partitioning problems with foraging and raiding strategies", in: Proc. 5th European Conference on Artificial Life, ECAL'99, LNCS 1674, Springer, 1999, pp. 621-625.


OUR SPONSORS

Southern Scientific Instruments

Chennai.

Scientronics

Authorised dealer for Scien Tech, Coimbatore.

CG-Core EL Programmable Solutions Pvt.Ltd

Bangalore.

Hi-Tech Electronics

Trichy.