INTERNSHIP REPORT

IMPLEMENTATION OF A DELAY LOCKED LOOP (DLL) ON AN FPGA

At

INDIAN INSTITUTE OF SCIENCE, BANGALORE

ECE DEPARTMENT

For S.N.BOSE SCHOLARS PROGRAM

JUNE – AUGUST 2013

By

ANMOL VITTAL CHAVAN

PURDUE UNIVERSITY, West Lafayette

Professor: Dr. BHARADWAJ AMRUTUR

2 Internship Report Anmol Vittal Chavan

Purdue University

CONTENTS

1. Abstract
2. Objective
3. Theory
4. Design and simulation
5. Results
6. Future Scope
7. Acknowledgements
8. References
9. Appendix

Abstract

The goal of this project was to implement a delay locked loop (DLL) based on a glitch-free coarse delay line on an FPGA. A delay locked loop is a feedback loop used to generate a specified delay in a system. It consists of a delay element, a phase detector and a filter. The phase detector used is a positive-edge-triggered D flip-flop and the filter is a synchronous up-down counter. The components (decoder, unit delay cell, flip-flop, filter) were written in VHDL and simulated in Xilinx ISE to verify their operation. The delay range available on the Xilinx Virtex-II Pro FPGA was measured on an oscilloscope using chains of NOT-gate buffers. The entire DLL was then implemented on the FPGA. The purpose of the DLL is to generate a delay equal to the time period of the reference clock of the filter.

Objective

1. To program the various components of the DLL, including the glitch-free coarse delay line, in VHDL and simulate them using the ISE simulator.

2. To understand the working of an FPGA.

3. To implement the DLL on the FPGA and understand its working.

THEORY

1) Delay Locked Loop (DLL)

A delay locked loop is a negative feedback loop used to generate a specific delay in a circuit. It consists of a delay line, a phase detector and a filter. The delay line takes an input, obtained by dividing the FPGA clock, and produces a delayed output. The input of the delay line is also the data input of the phase detector, a positive-edge-triggered D flip-flop that is clocked by the output of the delay line. The phase detector compares the phase of the reference input with that of the delay line output. In a general DLL this comparison yields a signal proportional to the phase error; a D flip-flop yields a single bit that indicates only the sign of the error. The output of the phase detector is fed as the input to a filter.

Fig.1: DLL Architecture (blocks: coarse delay line from input to delayed output, phase detector, and filter (up-down counter))

A synchronous up-down counter was used as the filter. The counter counts up if its input signal goes low and counts down if its input signal goes high, and its output goes into a decoder. The decoder controls the number of delay cells used in the delay chain. The reference clock signal to the up-down counter determines the delay to be generated. For example, if a delay of 20 ns is required, the reference clock should have a time period of 20 ns (a frequency of 50 MHz).
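The behaviour of this loop can be sketched with a short behavioural model in Python (this is not the VHDL used in the project). The model assumes a delay line whose delay is `offset_ns + code * step_ns`, assumes that counting up adds delay, and abstracts the phase detector as a single bit that says whether the current delay exceeds the target delay. The function name, the 2.5 ns offset and the 0.75 ns step are illustrative assumptions, not measured values.

```python
def simulate_dll(target_ns, step_ns=0.75, offset_ns=2.5, n_codes=128, cycles=200):
    """Bang-bang DLL model: returns the delay (ns) seen at each counter update."""
    code = 0                                    # up-down counter value (delay-line setting)
    delays = []
    for _ in range(cycles):
        delay = offset_ns + code * step_ns      # assumed linear delay line
        delays.append(delay)
        too_long = delay > target_ns            # one-bit phase-detector decision
        if too_long:
            code = max(code - 1, 0)             # count down: shorten the delay
        else:
            code = min(code + 1, n_codes - 1)   # count up: lengthen the delay
    return delays


if __name__ == "__main__":
    history = simulate_dll(target_ns=20.0)      # 20 ns target delay
    print(history[-4:])
```

Because the counter always moves by one step, the modelled delay does not settle on a single value; it alternates between the two settings on either side of the target.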

2) Coarse Delay Chain

For medium frequencies (a few hundred MHz), maintaining fine resolution over the entire period is very challenging and area consuming. Coarse delay units reduce the area needed to generate large delays. The coarse delay chain can be controlled to generate delays in steps of about 100 ps. NOT gates were used instead of other gates because they have the least delay, which is needed for higher resolution. They also occupy less space. The coarse delay line is used to set the input-output delay as close to the target as possible. The following is the architecture of the lattice delay unit (LDU), which is a single unit in the coarse delay line.

Fig 2: Structure of a Lattice delay unit

Many LDUs are connected to form a delay line. A glitch can appear at the output if the control setting changes at an instant when the two inputs of the MUX being switched are at different levels.

The following diagram illustrates such a condition.

Fig3: Glitch generated at the output due to a change of setting in the coarse delay unit at an unsuitable instant.

A change in the MUX setting (which is applied immediately) causes a change in the output level, which is then restored once the input edge reaches the MUX input. The resulting glitch then propagates to the output.

To solve this, the MUX control signals are allowed to change only when both inputs to the MUX are identical and settled. The input edge, which takes time to propagate to the MUX under consideration, is used to change the MUX control settings.

The select signal actually applied to the MUX is generated by sampling the original control signal Sk with a negative-edge-triggered D flip-flop clocked by the output of the corresponding coarse delay stage. This ensures that the inputs of the MUX whose setting is being altered are stable by the time the select signal actually changes.
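The effect of deferring the select change can be illustrated with a small discrete-time model in Python. It is a simplified sketch, not the project's circuit: the MUX chooses between a short and a long delay path of the same pulse, and the synchronized case updates the select on a falling edge of the slower MUX input, standing in for the negative-edge-triggered flip-flop described above. The function names, path delays and pulse timing are assumptions chosen for illustration.

```python
def pulse(t, start=10, stop=30):
    """Input waveform: high for start <= t < stop, low otherwise."""
    return 1 if start <= t < stop else 0


def mux_output(t_request, synchronized, t_end=60, d_short=2, d_long=6):
    """Output of a 2:1 MUX choosing between a short and a long delay path."""
    sel = 0                                   # 0 selects the short path
    out = []
    for t in range(t_end):
        a = pulse(t - d_short)                # short-path input of the MUX
        b = pulse(t - d_long)                 # long-path input of the MUX
        requested = 1 if t >= t_request else 0
        if synchronized:
            # Update the select only on a falling edge of the slower input,
            # when both MUX inputs are low and settled.
            if pulse(t - 1 - d_long) == 1 and b == 0:
                sel = requested
        else:
            sel = requested                   # applied immediately
        out.append(b if sel else a)
    return out


def rising_edges(wave):
    """Number of 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for p, q in zip(wave, wave[1:]) if (p, q) == (0, 1))


if __name__ == "__main__":
    print(rising_edges(mux_output(14, synchronized=False)))
    print(rising_edges(mux_output(14, synchronized=True)))
```

With the request arriving while the two MUX inputs differ, the immediate update produces an extra rising edge at the output, whereas the deferred update passes the pulse through once.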

Fig4: D flip-flops for glitch-free coarse delay switching

3) Field Programmable Gate Array (FPGA)

FPGAs are programmable semiconductor devices based around a matrix of Configurable Logic Blocks (CLBs) connected through a hierarchy of programmable interconnects that allow the blocks to be "wired together". Their configuration is generally specified using a hardware description language (HDL). In this project, VHDL was used. As opposed to Application Specific Integrated Circuits (ASICs), where the device is custom built for a particular design, FPGAs can be programmed to the desired application or functionality requirements. Much of the logic in a CLB is implemented using very small amounts of RAM in the form of LUTs (look-up tables). All combinatorial logic (ANDs, ORs, NANDs, XORs, and so on) is implemented as truth tables within LUT memory. FPGAs allow designers to change their designs very late in the design cycle.
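The idea of logic stored as a truth table can be shown with a minimal Python sketch. It is only an illustration of the look-up principle; the function names are hypothetical and nothing here reflects the internal layout of a real CLB.

```python
def make_lut(func, n_inputs):
    """Build a truth table (list of output bits) for an n-input logic function."""
    table = []
    for index in range(2 ** n_inputs):
        # Bit k of the table index is the value of input k.
        bits = [(index >> k) & 1 for k in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *bits):
    """Evaluate the stored function by indexing the table with the input bits."""
    index = sum(bit << k for k, bit in enumerate(bits))
    return table[index]


if __name__ == "__main__":
    xor_table = make_lut(lambda a, b: a ^ b, 2)
    print(xor_table, lut_eval(xor_table, 1, 0))
```

Changing the stored table changes the logic function without changing the surrounding wiring, which is what makes the fabric reprogrammable.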

Fig5: FPGA parts

The FPGA used in this project was the Xilinx Virtex-II Pro.

Fig6: Xilinx Virtex-II PRO FPGA

Design and Simulation

The DLL system was programmed step by step and then all the components were brought together. Each part was written in VHDL and tested using the ISE simulator. The first step was to program the delay line.

A single lattice delay unit was written in VHDL and simulated using the Xilinx ISE simulator. The multiplexer input was controlled from the VHDL test bench and a delay of 10 ns was assigned to each inverter. The delayed signal was observed to behave as desired in the simulation.

Fig7: Single LDU

Fig8: Simulation of a single LDU

After this, a coarse delay line was built with two lattice delay units and two D flip-flops to remove glitches. A VHDL test bench was written and the simulation was observed for different values of the multiplexer select lines. The signals sk_1, sk and skp1 are the select lines to the multiplexers, and D is the flip-flop input.

Fig9: Schematic of 2 LDU delay line

Fig10: Simulation of 2 connected LDUs with D flip-flops

After the LDU was tested and its simulation proved correct, an eight-LDU coarse delay chain was programmed, with a D flip-flop at each cell to remove glitches, as described in the theory. A 3-to-8 decoder was coded to control the inputs of the flip-flops and so select the number of cells used in the delay line to generate a particular delay.

Fig11: Schematic of 8 LDU delay line

Fig12: Simulation of an eight LDU delay line with 3 to 8 decoder control

To make things simpler and to make it easier to add more LDUs to the delay line, the same 8-unit delay chain was rebuilt using a “for” loop (a VHDL for-generate statement, as in the appendix code). This made the program simpler to modify in future. A test bench was written in which the delay line input, reset, the select lines of the MUXes and the decoder input were manipulated to verify that the delayed waveform in the simulation was correct. “outer” is the delay line output. The delay can be calculated from the time difference between the rising edges of “in1[0]” and “outer”.

Fig13: Schematic of 8 LDU delay line using “for” loop

Fig14: Simulation of 8 LDU delay line created using for loop

In order to increase the delay range, more LDUs had to be added to the coarse delay line. Therefore a 32-cell delay line was programmed, controlled by a 5-to-32 decoder with a thermometric output. Four of these 32-cell delay lines were then cascaded, with the output of each delay line feeding the input of the next, making a 128-cell delay line controlled by a single 5-to-32 decoder. In the simulation, “cloc” is the reference clock to the up-down counter, “op” is the output of the phase detector and “ou” is the output of the filter. “ou” is fed as the decoder input.
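A thermometric decoder can be sketched in a few lines of Python. The convention used here (a code of k sets the lowest k outputs to 1) is an assumption, since the exact mapping of the project's decoder is not given; the helper `active_cells` further assumes that each set output switches in one cell in every one of the four cascaded lines that share the decoder.

```python
def thermometer_decode(code, width=32):
    """Return `width` output bits with the lowest `code` bits set to 1."""
    if not 0 <= code < width:
        raise ValueError("code out of range")
    return [1 if i < code else 0 for i in range(width)]


def active_cells(code, lines=4, width=32):
    """Cells switched in when `lines` identical delay lines share one decoder."""
    return lines * sum(thermometer_decode(code, width))


if __name__ == "__main__":
    print(thermometer_decode(3, width=8))
    print(active_cells(5))
```

Under these assumptions every increment of the 5-bit code changes all four lines at once, so the cascade extends the delay range without adding more control bits.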

Fig15: Simulation of Delay Locked loop

Fig16: Architecture of DLL

RESULTS

To test the delay on the Virtex-II Pro FPGA, an experiment was conducted with a varying number of buffers. The delay was measured for each step and a graph was plotted in Excel.

No. of buffers    Delay (ns)
1                 2.095
2                 2.3
3                 3.399
4                 3.57
5                 4.24
6                 4.81
7                 5.65
8                 6.12
9                 6.79
10                7.336
20                13.27
30                20.34
40                27.09
50                34.07
60                41.26
70                47.68
80                53.22
90                60.78
100               70.84
200               134.38
300               209.3
400               274.52
500               347.47
600               409.38
700               473
800               541.06
900               590.76
1000              708.1

Table 1: Change in delay with increasing number of buffers

It was observed that the graph of delay v/s number of buffers was fairly linear, with small deviations from a straight line. This suggested that the same NOT gates may not be used on the FPGA every time it is programmed. Therefore all readings must be taken from a single programming of the FPGA, using the same components on the board. Also, the “keep” attribute is used in the VHDL program to prevent the buffers from being optimized out.

Fig17: Plot of delay v/s number of buffers
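The linearity of Table 1 can be quantified with an ordinary least-squares line fit. The sketch below, in plain Python, uses the readings from Table 1; the function name `fit_line` is illustrative, and the fit is offered only as one way to estimate the average delay per buffer from the measured data.

```python
# Readings from Table 1: number of buffers and measured delay in ns.
BUFFERS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
           20, 30, 40, 50, 60, 70, 80, 90, 100,
           200, 300, 400, 500, 600, 700, 800, 900, 1000]
DELAYS_NS = [2.095, 2.3, 3.399, 3.57, 4.24, 4.81, 5.65, 6.12, 6.79, 7.336,
             13.27, 20.34, 27.09, 34.07, 41.26, 47.68, 53.22, 60.78, 70.84,
             134.38, 209.3, 274.52, 347.47, 409.38, 473, 541.06, 590.76, 708.1]


def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    slope, intercept = fit_line(BUFFERS, DELAYS_NS)
    print(f"delay per buffer: {slope:.3f} ns, offset: {intercept:.3f} ns")
```

The slope is the average delay added by one buffer, and the intercept estimates the fixed delay that is present even with no buffers in the path.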

A 10 MHz input signal was given to a 32-LDU delay line. Since a test bench cannot be used in an FPGA implementation, the FPGA clock has to be divided to provide the input signal. Reset was specified and the 5 inputs of the decoder were mapped to the switches on the board in the UCF file, using the data sheet of the FPGA. Different inputs were given to the decoder, and the delay was measured and plotted in Excel.

Decoder input    Delay (ns)    Step size (ns)
0 0 0 0 0        2.67
0 0 0 0 1        3.15          0.48
0 0 0 1 0        3.84          0.69
0 0 0 1 1        6.24          2.4
0 0 1 0 0        6.96          0.72
0 0 1 0 1        7.29          0.33
0 0 1 1 0        8.06          0.77
0 0 1 1 1        8.59          0.53
0 1 0 0 0        9.33          0.74
0 1 0 0 1        9.86          0.53
0 1 0 1 0        10.68         0.82
0 1 0 1 1        10.91         0.23
0 1 1 0 0        12.13         1.22
0 1 1 0 1        13.19         1.06
0 1 1 1 0        13.79         0.6
0 1 1 1 1        14.25         0.46
1 0 0 0 0        15.28         1.03
1 0 0 0 1        15.36         0.08
1 0 0 1 0        15.6          0.24
1 0 0 1 1        15.98         0.38
1 0 1 0 0        17.29         1.31
1 0 1 0 1        17.31         0.02
1 0 1 1 0        18.24         0.93
1 0 1 1 1        18.77         0.53
1 1 0 0 0        19.36         0.59
1 1 0 0 1        19.96         0.6
1 1 0 1 0        20.83         0.87
1 1 0 1 1        21.35         0.52
1 1 1 0 0        23.68         2.33
1 1 1 0 1        24.13         0.45
1 1 1 1 0        25.58         1.45
1 1 1 1 1        26.25         0.67

Table2: Change of delay with different decoder inputs for 32 LDU delay line
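The step size column of Table 2 is the difference between consecutive delay readings. The short Python sketch below recomputes it from the delay column and summarizes the spread; the variable and function names are illustrative.

```python
# Delay readings (ns) from Table 2, for decoder inputs 00000 to 11111.
DELAYS_32_NS = [2.67, 3.15, 3.84, 6.24, 6.96, 7.29, 8.06, 8.59,
                9.33, 9.86, 10.68, 10.91, 12.13, 13.19, 13.79, 14.25,
                15.28, 15.36, 15.6, 15.98, 17.29, 17.31, 18.24, 18.77,
                19.36, 19.96, 20.83, 21.35, 23.68, 24.13, 25.58, 26.25]


def step_sizes(delays):
    """Difference between consecutive readings, rounded to 0.01 ns."""
    return [round(b - a, 2) for a, b in zip(delays, delays[1:])]


if __name__ == "__main__":
    steps = step_sizes(DELAYS_32_NS)
    mean_step = sum(steps) / len(steps)
    print(f"min {min(steps)} ns, max {max(steps)} ns, mean {mean_step:.2f} ns")
```

The mean step is simply the total range (26.25 ns minus 2.67 ns) divided by the 31 steps, while the minimum and maximum show how unevenly the individual steps are distributed.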

Fig18: Plot of delay v/s decoder input and step size

To verify the previous readings and to get a general idea of the delay range, the LSB (least significant bit) was removed and the same experiment was repeated.

Decoder input without LSB    Delay (ns)    Step size (ns)
0 0 0 0                      2.46
0 0 0 1                      3.52          1.06
0 0 1 0                      4.78          1.26
0 0 1 1                      6.71          1.93
0 1 0 0                      9.6           2.89
0 1 0 1                      12.78         3.18
0 1 1 0                      14.13         1.35
0 1 1 1                      16.71         2.58
1 0 0 0                      19.86         3.15
1 0 0 1                      20.68         0.82
1 0 1 0                      22.93         2.25
1 0 1 1                      24.31         1.38
1 1 0 0                      26.38         2.07
1 1 0 1                      28.13         1.75
1 1 1 0                      29.43         1.3
1 1 1 1                      30.74         1.31

Table3: Delay values with decoder input without the LSB

Fig19: Plot of delay v/s decoder input without LSB

Fig20: Lab setup

There was now a need for a greater delay range. Therefore a 128-LDU cascaded delay line was programmed, which brought the delay range up to around 95 ns.

The DLL was finally implemented on the FPGA with an input of 16.66 MHz and with reference clock signals of 100 MHz, 50 MHz, 16.66 MHz and 10 MHz at the filter.
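The 16.66 MHz input is consistent with the clock divider in the appendix code, where a counter wraps after reaching 5 and the divided clock is driven low when the count is 2 and high when it reaches 5. The Python sketch below mirrors that update rule edge by edge. The 100 MHz FPGA clock frequency and the initial low level of the divided clock are assumptions made for this illustration, and the function names are hypothetical.

```python
def divided_clock(n_edges, wrap=5, low_at=2):
    """Level of the divided clock after each rising edge of the FPGA clock."""
    count = 0
    level = 0                       # assumed initial level
    wave = []
    for _ in range(n_edges):
        # Both updates use the count value from before the edge,
        # as two clocked processes reading the same signal would.
        if count == low_at:
            level = 0
        elif count >= wrap:
            level = 1
        count = count + 1 if count < wrap else 0
        wave.append(level)
    return wave


def period_in_edges(wave):
    """Number of FPGA clock edges between the first two rising edges."""
    rises = [i for i in range(1, len(wave)) if wave[i - 1] == 0 and wave[i] == 1]
    return rises[1] - rises[0]


if __name__ == "__main__":
    wave = divided_clock(12)
    fpga_clk_mhz = 100.0            # assumed FPGA clock frequency
    print(wave, fpga_clk_mhz / period_in_edges(wave), "MHz")
```

In this model the divided clock repeats every six FPGA clock edges, high for three and low for three, so a 100 MHz clock would give an input of about 16.67 MHz.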

For the 50 MHz clock: the expected delay in this case is 20 ns. It was observed that the delay fluctuated around 20 ns, rising a little above it and then falling back to 20 ns. Fig21 and Fig22 show this. This was because the resolution was limited, as only a coarse delay line was used, and because a very simple up-down counter was used as the filter. If a PID (proportional integral derivative) controller were used, the delay would be expected to lock at 20 ns. Due to time constraints, this could not be implemented. The same behaviour was observed for reference clocks of 100 MHz, 16.66 MHz and 10 MHz.

Fig21: Input and delayed signals on oscilloscope showing 20 ns delay for 50MHz reference clock

Fig22: Delay settling to near 20 ns for 50MHz input clock

Future Scope

The DLL can be made to work more accurately by using a higher order filter with a PID (proportional integral derivative) controller. The resolution of the delay line can be improved by using a coarse-fine delay line. The implemented DLL will be developed further and can be used to generate any desired delay within its range in future projects and experiments.

Acknowledgements

I have put a great deal of effort into this project. However, it would not have been possible without the kind support and help of many individuals. I would like to extend my sincere thanks to all of them.

Firstly, I would like to thank Dr. Bharadwaj Amrutur for providing me with this opportunity to pursue a summer internship at the ECE department, Indian Institute of Science. He has always been there to guide me and to make sure the project would be a good learning experience for me.

I would like to thank IUSSTF for providing me an opportunity to travel to India for this summer research internship.

I am highly indebted to Mr. Manikandan RR, Mr. Viveka and Mr. Rajat Bhatia for their guidance and constant supervision, as well as for providing the necessary information regarding the project.

I would like to express my gratitude towards my parents and friends for their kind cooperation and encouragement, which helped me complete this project.

REFERENCES

[1] Kumar Das, P. (2012). Precise on-chip clock skew measurement using sub-sampling and applications (PhD thesis).

[2] Yang, C. (2003). Delay locked loops: An overview (UCSB).

[3] Tierno, J., & Rylyakov, A. (2008). A wide power supply range, wide tuning range, all static CMOS all digital PLL in 65 nm SOI. IEEE Journal of Solid-State Circuits, 43(1).

[4] www.xilinx.com

[5] Pedroni, V. (2004). Circuit design with VHDL.

[6] Xilinx Virtex-II Pro FPGA data sheet (XUPV2P)

[7] Xilinx Constraints Guide

[8] http://www.ni.com/white-paper/6983/en/

APPENDIX

1) VHDL program made for DLL

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity for8fpga is
    port(
        outX, in2, in2b, in2c, in2d, s, sb, sc, sd: inout std_logic_vector (31 downto 0);
        in1, in1b, in1c, in1d: inout std_logic_vector (31 downto 0);
        outer, outer2, outer3, outer4: inout std_logic;
        FPGA_CLK : in STD_LOGIC;
        FOUT : out STD_LOGIC;
        rot: in std_logic;
        CLOC: OUT std_logic);
end for8fpga;

architecture Behavioral of for8fpga is

    -- 5-to-32 decoder driving the select inputs of the delay cells
    component fiveto32
        port(
            sel: in std_logic;
            inp: in std_logic_vector (4 downto 0);
            out1: inout std_logic_vector (31 downto 0));
    end component;

    -- flip-flop that resynchronizes each MUX select line
    component dff
        port (
            d: in std_logic;
            clk: in std_logic;
            rst: in std_logic;
            q: out std_logic);
    end component;

    -- single lattice delay unit
    component delay_cell_2
        port(
            in1, in2, sk, sk_1: in std_logic;
            outM: inout std_logic;
            outN: out std_logic);
    end component;

    -- flip-flop used for the phase detector and the divide-by-2 stage
    component dffp
        port (
            d: in std_logic;
            clk: in std_logic;
            rst: in std_logic;
            q: out std_logic);
    end component;

    -- loop filter
    component up_down_counter
        port(
            clk, rst_a, mode : in std_logic; --mode=1 up counting, mode=0 down counting
            q : out std_logic_vector(4 downto 0));
    end component;

    signal i, j, k, l : integer := 0;
    signal reset : std_logic;
    signal count_fref, count_fref1, count_fref_2: integer range 0 to 6000;
    signal op1: std_logic;
    signal int_FREF, int_FREF_2 : STD_LOGIC;
    signal innn: std_logic_vector (4 downto 0);
    signal sel: std_logic;
    signal ou1 : std_logic_vector (4 downto 0);

begin

    reset <= rot;

    process(FPGA_CLK)
    begin
        if(FPGA_CLK'event and FPGA_CLK='1') then
            if(count_fref1<999999) then
                count_fref1<=count_fref+1;
            elsif(count_fref1 >= 999999) then
                count_fref1<=0;
            end if;
        end if;
    end process;

    -- note: reset is also driven from rot above, so this process gives it a second driver
    process(FPGA_CLK)
    begin
        if(FPGA_CLK'event and FPGA_CLK='1') then
            if(count_fref =199999) then
                reset<='0';
            elsif(count_fref >=999999) then -->=
                reset<='1';
            end if;
        end if;
    end process;

    --input clock
    -- counter for the divided input clock: counts 0 to 5, then wraps
    process(FPGA_CLK)
    begin
        --if(reset = '0') then
        if(FPGA_CLK'event and FPGA_CLK='1') then
            if(count_fref<5) then
                count_fref<=count_fref+1;
            elsif(count_fref >= 5) then
                count_fref<=0;
            end if;
        end if;
        --end if;
    end process;

    -- divided clock: driven low at count 2 and high at count 5
    process(FPGA_CLK)
    begin
        --if(reset = '0') then
        if(FPGA_CLK'event and FPGA_CLK='1') then
            if(count_fref =2) then
                int_FREF<='0';
            elsif(count_fref >=5) then -->=
                int_FREF<='1';
            end if;
        end if;
        --end if;
    end process;

    -- clock for counter
    --int_FREF_2 <= FPGA_CLK;
    TFF: dffp port map(not int_FREF_2, FPGA_CLK, '0', int_FREF_2);

    -- delay line
    DEC1 : fiveto32 port map ('1', ou1, outX);--innn

    -- first 32-cell delay line
    DFF1 : dff port map (outX(0), outer, reset, s(0));
    in1(0) <= int_FREF;
    DC1 : delay_cell_2 port map(in1(0), in2(0), s(0), '0', in1(1) , outer); --in1(1)
    Q1: for i in 1 to 30 generate
        DFF2: dff port map(
            d => outX(i),
            clk => in2(i-1),
            rst => reset, --rest,
            q => s(i));
        DC2 : delay_cell_2 port map(
            in1 => in1(i),
            in2 => in2(i),
            sk => s(i),
            sk_1 => not s(i-1),
            outM => in1(i+1),
            outN => in2(i-1));
    end generate;
    DFF8 : dff port map ('0', in2(30), reset, s(31)); --rest, s(7));
    DC8 : delay_cell_2 port map(in1(31), in2(31), '0', not s(30), in2(31), in2(30));--in2(31)

    -- second 32-cell delay line, fed by the output of the first
    DFF1b : dff port map (outX(0), outer2, reset, sb(0));
    in1b(0) <= outer;
    DC1b : delay_cell_2 port map(in1b(0), in2b(0), sb(0), '0', in1b(1) , outer2); --in1(1)
    Q2: for j in 1 to 30 generate
        DFF2b: dff port map(
            d => outX(j),
            clk => in2b(j-1),
            rst => reset, --rest,
            q => sb(j));
        DC2b : delay_cell_2 port map(
            in1 => in1b(j),
            in2 => in2b(j),
            sk => sb(j),
            sk_1 => not sb(j-1),
            outM => in1b(j+1),
            outN => in2b(j-1));
    end generate;
    DFF8b : dff port map ('0', in2b(30), reset, sb(31)); --rest, s(7));
    --
    DC8b : delay_cell_2 port map(in1b(31), in2b(31), '0', not sb(30), in2b(31), in2b(30));--in2(31)

    -- third 32-cell delay line
    DFF1c : dff port map (outX(0), outer3, reset, sc(0));
    --
    in1c(0) <= outer2;
    --
    DC1c : delay_cell_2 port map(in1c(0), in2c(0), sc(0), '0', in1c(1) , outer3); --in1(1)
    Q3: for k in 1 to 30 generate
        DFF2c: dff port map(
            d => outX(k),
            clk => in2c(k-1),
            rst => reset, --rest,
            q => sc(k));
        DC2c : delay_cell_2 port map(
            in1 => in1c(k),
            in2 => in2c(k),
            sk => sc(k),
            sk_1 => not sc(k-1),
            outM => in1c(k+1),
            outN => in2c(k-1));
    end generate;
    DFF8c : dff port map ('0', in2c(30), reset, sc(31)); --rest, s(7));
    DC8c : delay_cell_2 port map(in1c(31), in2c(31), '0', not sc(30), in2c(31), in2c(30));--in2(31)

    -- fourth 32-cell delay line
    DFF1d : dff port map (outX(0), outer4, reset, sd(0));
    in1d(0) <= outer3;
    DC1d : delay_cell_2 port map(in1d(0), in2d(0), sd(0), '0', in1d(1) , outer4); --in1(1)
    Q4: for l in 1 to 30 generate
        DFF2d: dff port map(
            d => outX(l),
            clk => in2d(l-1),
            rst => reset, --rest,
            q => sd(l));
        DC2d : delay_cell_2 port map(
            in1 => in1d(l),
            in2 => in2d(l),
            sk => sd(l),
            sk_1 => not sd(l-1),
            outM => in1d(l+1),
            outN => in2d(l-1));
    end generate;
    DFF8d : dff port map ('0', in2d(30), reset, sd(31)); --rest, s(7));
    --
    DC8d : delay_cell_2 port map(in1d(31), in2d(31), '0', not sd(30), in2d(31), in2d(30));--in2(31)

    FOUT <= in1(0); --int_FREF;
    CLOC <= NOT int_FREF_2 ;

    -- phase detector: samples the input on the delayed output, then the loop filter
    PD : dffp port map (in1(0), outer4, '0', op1);
    filter: up_down_counter port map ( int_FREF_2, reset, op1, ou1);

end Behavioral;