1 INTRODUCTION
The term Video Graphics Array (VGA) refers specifically to the display
hardware first introduced with the IBM PS/2 line of computers in 1987. VGA is
referred to as an "array" instead of an "adapter" because it was implemented
from the start as a single chip, replacing the dozens of discrete logic chips
used by earlier adapters.
The VGA Controller provides a simple interface between a host processor
and any VGA-compatible monitor.
Taking a processor-generated (pixelated) picture from memory space, the
Controller provides digital RGB values for each pixel, as well as horizontal
and vertical synchronization signals, in order to correctly display the picture on
a connected monitor.
2 HISTORY
The term Video Graphics Array (VGA) refers specifically to the display
hardware first introduced with the IBM PS/2 line of computers in 1987, but
through its widespread adoption it has also come to mean either an analog
computer display standard, the 15-pin D-subminiature VGA connector, or the
640×480 resolution itself. While this resolution has been superseded in the
personal computer market, it is becoming a popular resolution on mobile
devices.
Video Graphics Array (VGA) was the last graphical standard introduced by
IBM that the majority of PC clone manufacturers conformed to, making it
today (as of 2009) the lowest common denominator that all PC graphics
hardware supports before a device-specific driver is loaded into the computer.
For example, the MS-Windows splash screen appears while the machine is still
operating in VGA mode, which is the reason that this screen always appears in
reduced resolution and color depth.
VGA was officially superseded by IBM's XGA standard, but in reality it was
superseded by numerous slightly different extensions to VGA made by clone
manufacturers, which came to be known collectively as "Super VGA".
2.1 VGA compared to other standard resolutions
VGA is referred to as an "array" instead of an "adapter" because it was
implemented from the start as a single chip (an ASIC), replacing the Motorola
6845 and dozens of discrete logic chips that covered the full-length ISA boards
of the MDA, CGA, and EGA. Its single-chip implementation also allowed the
VGA to be placed directly on a PC's motherboard with a minimum of
difficulty (it only required video memory, timing crystals, and an external
RAMDAC), and the first IBM PS/2 models were equipped with VGA on the
motherboard.
2.2 The VGA specifications are as follows:
• 256 KB video RAM (the very first cards could be ordered with 64 KB or
  128 KB of RAM, at the cost of losing some video modes)
• 16-color and 256-color modes
• 262,144-value color palette (six bits each for red, green, and blue)
• Selectable 25.175 MHz or 28.322 MHz master clock
• Maximum of 800 horizontal pixels
• Maximum of 600 lines
• Refresh rates of up to 70 Hz
• Vertical blank interrupt
• Packed-pixel mode: 256 colors
• Hardware smooth scrolling support
The VGA supports both All Points Addressable graphics modes and
alphanumeric text modes. Standard graphics modes are:
• 640×480 in 16 colors
• 640×350 in 16 colors
• 320×200 in 16 colors
• 320×200 in 256 colors
3 DESCRIPTION ARCHITECTURE
3.1 VGA display device
The VGA display device displays the images received from the
processor in an exact format. It can be used to display the output of any
project; analog graphs, etc., can be displayed directly on the VGA display
monitor.
3.2 VGA (Video Graphics Array) Controller
The VGA controller controls how the image is displayed on the display
device. It acts as an interface between the processing system and the display
device, and is therefore called an interface controller.
[Block diagram: Image or video processing system -> VGA Controller -> Display]
4 CONTROLLER DESIGN
To design the controller, we first need to know how the display device works.
Consider the image below.
[Figure: the original 640x480 image and the pixels of the image]
The display device also has pixels. The display resolution (640x480) is a
standard display resolution.
Pixel
[Figure: an image with a portion greatly enlarged, in which the
individual pixels are rendered as little squares and can easily be seen]
4.2 A photograph of sub-pixel display elements on a laptop's LCD screen
In digital imaging, a pixel (or picture element [1]) is a single point in a raster
image. The pixel is the smallest addressable screen element; it is the smallest
unit of a picture that can be controlled. Each pixel has its own address. The
address of a pixel corresponds to its coordinates. Pixels are normally arranged
in a two-dimensional grid and are often represented using dots or squares. Each
pixel is a sample of an original image, where more samples typically provide
more accurate representations of the original. The intensity of each pixel is
variable. In color image systems, a color is typically represented by three or
four component intensities, such as red, green, and blue, or cyan, magenta,
yellow, and black.
Refresh rate
Refresh rate = 60 Hz
The refresh rate is the number of frames transmitted to the display per
second.
The VGA controller must transmit at least 60 frames per second.
5 Hardware Description
5.1 Block Diagram
[Figure: block diagram]
5.2 VGA Synchronization Unit
The Synchronization Unit provides the horizontal and vertical
synchronization signals, HSYNC and VSYNC, that are required to correctly
display a picture frame within the confines of a monitor's display area.
These synchronization signals are used as control inputs by the
monitor's horizontal and vertical deflection circuits. These circuits deflect the
electrons emitted by the three primary color electron guns (Red, Green, Blue)
from left to right and from top to bottom, respectively. HSYNC provides the start
and stop times for the horizontal deflection circuit, so that a line of pixels is
correctly drawn across the screen display. VSYNC provides the start and stop
times for the vertical deflection circuit, so that the lines of a frame are correctly
drawn from the top to the bottom of the screen display.
The resolution for the display is defined by the level on the
RESOLUTION input. If High, the 640x480 resolution (VGA) is used. If Low,
the 800x600 resolution (SVGA) is used.
Although the resolution determines the area of a monitor's screen
within which an image can be displayed, the full extents of the chosen
resolution do not have to be used. The actual extents of the image display area
on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs. These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame, respectively.
5.3 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to the
top, bottom, left, and right of the viewable frame area are blanked by putting
black pixels at the required line-pixel positions. This has the effect of centering
the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel needs to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this very reason. This is the line display
enable signal, en. The signal is checked on each rising edge of the external
clock signal (CLK) and is set as follows:
If (HCOUNT >= PixTotal) or (VCOUNT >= LinTotal) then
    en = 0 (pixel must be blanked - set color to black)
Else
    en = 1 (pixel is a viewable pixel - generate RGB color accordingly)
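The blanking rule above can be sketched in VHDL as a small clocked process. This is only an illustration of the rule as stated: the entity name, port names, and the use of unconstrained natural counters are assumptions, not the Controller's actual source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only: names and port types are assumptions.
entity blank_enable_sketch is
  port (
    clk       : in  std_logic;
    hcount    : in  natural;   -- current pixel position within the line (HCOUNT)
    vcount    : in  natural;   -- current line position within the frame (VCOUNT)
    pix_total : in  natural;   -- viewable pixels per line (PixTotal)
    lin_total : in  natural;   -- viewable lines per frame (LinTotal)
    en        : out std_logic  -- '1' = viewable pixel, '0' = blanked pixel
  );
end entity blank_enable_sketch;

architecture rtl of blank_enable_sketch is
begin
  -- en is re-evaluated on each rising edge of CLK, as the text describes.
  process (clk)
  begin
    if rising_edge(clk) then
      if (hcount >= pix_total) or (vcount >= lin_total) then
        en <= '0';  -- outside the viewable area: colour is forced to black
      else
        en <= '1';  -- inside the viewable area: RGB colour is generated
      end if;
    end if;
  end process;
end architecture rtl;
```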
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode (640x480, 60 Hz, non-interlaced), the horizontal
timings are:
Parameter                   Value      Unit
Clock frequency             25.175     MHz
Horizontal pixels           640
Horizontal sync polarity    Negative
Total time for each line    31.77      µs
Front porch (A)             0.94       µs
Sync pulse length (B)       3.77       µs
Back porch (C)              1.89       µs
Active video (D)            25.17      µs
(Total horizontal sync time: 6.60 µs)
6.1 The vertical timings are:
Parameter                   Value      Unit
Vertical lines              480
Vertical sync polarity      Negative
Vertical frequency          59.94      Hz
Front porch (E)             0.35       ms
Sync pulse length (F)       0.06       ms
Back porch (G)              1.02       ms
Active video (H)            15.25      ms
(Total vertical sync time: 1.43 ms)
640 x 400 at 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 at 60 Hz is the default MS-Windows graphics mode, with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 at 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
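As a consistency check, the figures in the two timing tables can be added up. The worked arithmetic below uses only the table values; the symbols T and N are introduced here for illustration.

```latex
\begin{align*}
T_{\text{line}}  &= 0.94 + 3.77 + 1.89 + 25.17 = 31.77\ \mu\text{s} \\
T_{\text{hsync total}} &= 0.94 + 3.77 + 1.89 = 6.60\ \mu\text{s} \\
N_{\text{clocks/line}} &= 31.77\ \mu\text{s} \times 25.175\ \text{MHz} \approx 800 \\
T_{\text{frame}} &= 0.35 + 0.06 + 1.02 + 15.25 = 16.68\ \text{ms}
   \quad\Rightarrow\quad f_{\text{frame}} \approx 59.95\ \text{Hz} \\
N_{\text{lines/frame}} &= 16.68\ \text{ms} \,/\, 31.77\ \mu\text{s} \approx 525 \\
N_{\text{active pixels}} &= 25.17\ \mu\text{s} \times 25.174\ \text{MHz} \approx 633.6
\end{align*}
```

The last line reproduces the 633-pixel figure quoted above; the small gap to 640 comes from the rounding of the published microsecond values.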
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter, and is driven low when:
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
[Figure: Vertical (Frame) Period]
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e., on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter, and is driven low when:
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
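The two comparison windows for HSYNC and VSYNC can be sketched as follows. This is a minimal illustration, not the Controller's source: the entity name, the generic names, and in particular the default values (rough clock and line counts derived from the 640x480 timing tables) are assumptions. The vertical blanking term is named BLANKING_V here purely for readability.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only: names and default values are assumptions.
entity sync_pulse_sketch is
  generic (
    PIX_TOTAL     : natural := 640;  -- viewable pixels per line
    BLANKING_LEFT : natural := 24;   -- approx. 0.94 us at 25.175 MHz
    TLSYNC        : natural := 95;   -- approx. 3.77 us at 25.175 MHz
    LIN_TOTAL     : natural := 480;  -- viewable lines per frame
    BLANKING_V    : natural := 11;   -- approx. 0.35 ms in 31.77 us lines
    TFSYNC        : natural := 2     -- approx. 0.06 ms in 31.77 us lines
  );
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;          -- active High, as in the pin table
    hcount : in  natural;
    vcount : in  natural;
    hsync  : out std_logic;
    vsync  : out std_logic
  );
end entity sync_pulse_sketch;

architecture rtl of sync_pulse_sketch is
  signal hsync_i : std_logic := '1';
begin
  -- HSYNC: High after reset, updated on each rising edge of CLK.
  h_proc : process (clk, rst)
  begin
    if rst = '1' then
      hsync_i <= '1';
    elsif rising_edge(clk) then
      if hcount >= PIX_TOTAL + BLANKING_LEFT and
         hcount <  PIX_TOTAL + BLANKING_LEFT + TLSYNC then
        hsync_i <= '0';
      else
        hsync_i <= '1';
      end if;
    end if;
  end process h_proc;

  -- VSYNC: High after reset, updated on each rising edge of HSYNC.
  v_proc : process (hsync_i, rst)
  begin
    if rst = '1' then
      vsync <= '1';
    elsif rising_edge(hsync_i) then
      if vcount >= LIN_TOTAL + BLANKING_V and
         vcount <  LIN_TOTAL + BLANKING_V + TFSYNC then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process v_proc;

  hsync <= hsync_i;
end architecture rtl;
```

Clocking VSYNC from HSYNC follows the description in the text; many designs would instead keep everything on CLK and use a once-per-line enable.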
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
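The counting behaviour just described can be sketched like this. The entity and port names are assumptions made for illustration; only the 19-bit width, the reset, the increment condition, and the roll-over condition come from the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative sketch only: names are assumptions.
entity addr_counter_sketch is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                     -- active High
    hcount    : in  natural;
    vcount    : in  natural;
    pix_total : in  natural;                       -- integer value of DISPSIZE_H
    lin_total : in  natural;                       -- integer value of DISPSIZE_V
    memaddr   : out std_logic_vector(18 downto 0)  -- 19 bits cover 0 to 479999
  );
end entity addr_counter_sketch;

architecture rtl of addr_counter_sketch is
  signal count : unsigned(18 downto 0) := (others => '0');
begin
  process (clk, rst)
  begin
    if rst = '1' then
      count <= (others => '0');
    elsif rising_edge(clk) then
      if vcount >= lin_total then
        count <= (others => '0');  -- all viewable lines sent: roll over to zero
      elsif hcount < pix_total then
        count <= count + 1;        -- next consecutive pixel in the frame
      end if;                      -- otherwise hold during horizontal blanking
    end if;
  end process;

  memaddr <= std_logic_vector(count);
end architecture rtl;
```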
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors, or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a refresh rate of at least 60 Hz.
   1 sec = 60 frames
   1 frame = max 1/60 sec
   1 pixel = (1/60 sec) / 307200 = approx. 54 ns (640 x 480 = 307200 pixels)
   Max time for 1 pixel = 40 ns, once the blanking intervals of each line and
   frame are allowed for
2) To send information for each and every pixel of a frame by means of the
   RGB signals.
3) After one row of information has been completely displayed, the column
   must be reset and the row incremented; this is done by the Hsync signal.
4) After one frame of information has been completed, a new frame has to be
   started; for this, both row and column are reset; this is done by the Vsync
   signal.
=> To give the information for 1 pixel we use 1 clock cycle (the clock
controls all operations).
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for 1 pixel on each
and every clock cycle, we can attain a refresh rate of at least 60 Hz.
Do not use a clock frequency of less than 25 MHz.
Ex:
Clock frequency       Time period    No. of frames/sec
15 MHz (decreases)    More           Less (unable to display 60 frames/sec)
25 MHz                -              Effectively utilized frequency
30 MHz (increases)    Less           More
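The 40 ns / 25 MHz figure can be derived step by step. The worked arithmetic below assumes the standard 640x480 totals of 800 clocks per line and 525 lines per frame, which agree with the timing tables in section 6 (31.77 µs per line at 25.175 MHz); the symbols are introduced here for illustration.

```latex
\begin{align*}
T_{\text{frame}} &\le \tfrac{1}{60}\ \text{s} \approx 16.67\ \text{ms} \\
\text{visible pixels only:}\quad
T_{\text{pixel}} &\le \frac{16.67\ \text{ms}}{640 \times 480}
                  = \frac{16.67\ \text{ms}}{307200} \approx 54.3\ \text{ns} \\
\text{including blanking:}\quad
T_{\text{pixel}} &\le \frac{16.67\ \text{ms}}{800 \times 525}
                  = \frac{16.67\ \text{ms}}{420000} \approx 39.7\ \text{ns} \\
f_{\text{CLK}} &\ge 60 \times 420000 = 25.2\ \text{MHz} \approx 25\ \text{MHz}
\end{align*}
```

By the same proportion, a 15 MHz clock would give roughly 36 frames per second and a 30 MHz clock roughly 71 to 72, which matches the 72 Hz entry for 30 MHz in the pin description.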
Signals used for image display:
• R (Red)
• G (Green)
• B (Blue)
• HSYNC (horizontal synchronization signal)
• VSYNC (vertical synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse, and back porch
               Horizontal sync    Vertical sync
Front porch    0.94 µs            0.35 ms
Sync pulse     3.77 µs            0.063 ms
Back porch     1.89 µs            1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
[Figure: VGA Controller symbol]
Pin description
The pin description is shown in the table below.
Table: VGA pin description
Name         Type  Polarity  Bus size  Description

Global Control Signals
CLK          I     Rising    -         Global system clock. This clock determines the maximum
                                       rate at which pixels can be sent to the monitor. The
                                       frequency of the clock signal determines the refresh
                                       rate as follows:
                                       640x480: CLK = 25 MHz, Refresh = 60 Hz
                                                CLK = 30 MHz, Refresh = 72 Hz
                                       800x600: CLK = 40 MHz, Refresh = 60 Hz
                                                CLK = 50 MHz, Refresh = 72 Hz
RST          I     High      -         Global system reset.

VGA Input Settings Signals
RESOLUTION   I     High/Low  -         Selects the screen resolution to be used:
                                       1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD         I     -         2         Selects the color palette to be used:
                                       00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H   I     -         10        Determines the number of viewable pixels to be
                                       displayed in each line of a frame, and is therefore
                                       used to control the horizontal extents of the visible
                                       display area.
DISPSIZE_V   I     -         10        Determines the number of lines to be displayed in a
                                       frame, and is therefore used to control the vertical
                                       extents of the visible display area.

Data Memory Control Signals
RD           O     High      -         Enable signal asserted when data is required to be read
                                       from the memory space. This signal is controlled by,
                                       and follows, the internal line enable signal, en,
                                       generated by the Synchronization Unit of the Controller.
DATA         I     -         8         Data input from picture memory space. Data is stored in
                                       memory in bytes, the content of which depends on the
                                       chosen color palette, as follows:
                                       Black & White mode: 8 1-bit pixels
                                       16 Colors mode: 2 4-bit pixels
                                       64 Colors mode: 1 6-bit pixel (DATA[5..0])
                                       Note: In 64 Colors mode, bits 7 and 6 of each byte are
                                       not used.
ADDR_PIXEL   O     -         19        Specifies the address of the next pixel in picture
                                       memory. Addresses are consecutive: once the end of the
                                       current line has been reached, the next address is that
                                       of the pixel at the start of the next line down in the
                                       frame.

VGA Monitor Control Signals
HSYNC        O     Falling   -         Horizontal synchronization signal. This signal is used
                                       to control the horizontal deflection circuit in the VGA
                                       monitor, so that the start and end of a line of pixels
                                       is correctly displayed across the visible display area
                                       of the screen. The horizontal size of the display area
                                       is controlled by the DISPSIZE_H input to the Controller.
VSYNC        O     Falling   -         Vertical synchronization signal. This signal is used to
                                       control the vertical deflection circuit in the VGA
                                       monitor, so that the start and end of a frame (of lines)
                                       is correctly displayed between the top and bottom edges
                                       of the visible display area of the screen. The vertical
                                       size of the display area is controlled by the
                                       DISPSIZE_V input to the Controller.
R1, R0       O     High/Low  -         Provide the 2-bit digital signal for the intensity of
                                       red used in composing a pixel's displayed color. These
                                       two signals are inputs to a simple 2-bit DAC (external
                                       to the Controller) that provides the analog signal
                                       required by the VGA monitor.
G1, G0       O     High/Low  -         Provide the 2-bit digital signal for the intensity of
                                       green used in composing a pixel's displayed color. These
                                       two signals are inputs to a simple 2-bit DAC (external
                                       to the Controller) that provides the analog signal
                                       required by the VGA monitor.
B1, B0       O     High/Low  -         Provide the 2-bit digital signal for the intensity of
                                       blue used in composing a pixel's displayed color. These
                                       two signals are inputs to a simple 2-bit DAC (external
                                       to the Controller) that provides the analog signal
                                       required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g., using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
  (DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
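Applying this formula to the full extents of the two supported resolutions gives the following sizes in bytes. These are worked examples only; smaller DISPSIZE_H and DISPSIZE_V values reduce the requirement proportionally.

```latex
\begin{align*}
\text{640x480, Black \& White:}\quad & (640 \times 480)/8 = 38\,400\ \text{bytes} \\
\text{640x480, 16 Colors:}\quad      & (640 \times 480)/2 = 153\,600\ \text{bytes} \\
\text{640x480, 64 Colors:}\quad      & (640 \times 480)/1 = 307\,200\ \text{bytes} \\
\text{800x600, Black \& White:}\quad & (800 \times 600)/8 = 60\,000\ \text{bytes} \\
\text{800x600, 16 Colors:}\quad      & (800 \times 600)/2 = 240\,000\ \text{bytes} \\
\text{800x600, 64 Colors:}\quad      & (800 \times 600)/1 = 480\,000\ \text{bytes}
\end{align*}
```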
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, i.e., right-shifting the contents by
three:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, i.e., right-shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
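The three addressing cases can be written as a single selected signal assignment. This is a hedged sketch: the entity name is an assumption, the CMOD encoding is taken from the pin description, and the padding uses as many zero bits as are needed to fill the 19-bit ADDR_PIXEL bus (three for Black & White, one for 16 Colors). The behaviour for the unlisted CMOD value 11 is not stated in the text and is folded into the 64 Colors case here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only: entity name is an assumption.
entity addr_pixel_sketch is
  port (
    cmod       : in  std_logic_vector(1 downto 0);   -- 00 = B&W, 01 = 16, 10 = 64 colours
    hvcnt      : in  std_logic_vector(18 downto 0);  -- pixel position from MEMADDR
    addr_pixel : out std_logic_vector(18 downto 0)   -- byte address in picture memory
  );
end entity addr_pixel_sketch;

architecture rtl of addr_pixel_sketch is
begin
  with cmod select
    addr_pixel <= "000" & hvcnt(18 downto 3) when "00",    -- 8 pixels per byte
                  '0'   & hvcnt(18 downto 1) when "01",    -- 2 pixels per byte
                  hvcnt                      when others;  -- 1 pixel per byte
end architecture rtl;
```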
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register, PIXREG, is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
    enex = 0
Else
    enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
"0".
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
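The load-or-shift behaviour of PIXREG in the three palette modes can be sketched as below. The entity and signal names are assumptions, and gating by the RD signal is deliberately left out to keep the example short; only the load conditions and shift amounts come from the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only: names are assumptions; RD gating is omitted.
entity pixreg_sketch is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;                      -- active High
    cmod   : in  std_logic_vector(1 downto 0);   -- 00 = B&W, 01 = 16, 10 = 64 colours
    hvcnt  : in  std_logic_vector(18 downto 0);
    data   : in  std_logic_vector(7 downto 0);   -- byte read from picture memory
    pixreg : out std_logic_vector(7 downto 0)    -- active pixel sits in the low bits
  );
end entity pixreg_sketch;

architecture rtl of pixreg_sketch is
  signal reg : std_logic_vector(7 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      reg <= (others => '0');
    elsif rising_edge(clk) then
      case cmod is
        when "00" =>                             -- Black & White: 8 pixels per byte
          if hvcnt(2 downto 0) = "000" then
            reg <= data;                         -- load the next byte
          else
            reg <= '0' & reg(7 downto 1);        -- shift right by one bit
          end if;
        when "01" =>                             -- 16 Colors: 2 pixels per byte
          if hvcnt(0) = '0' then
            reg <= data;                         -- load the next byte
          else
            reg <= "0000" & reg(7 downto 4);     -- shift right by four bits
          end if;
        when others =>                           -- 64 Colors: 1 pixel per byte
          reg <= data;                           -- load on every clock edge
      end case;
    end if;
  end process;

  pixreg <= reg;
end architecture rtl;
```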
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register, RGBREG, is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
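For the 64 Colors case, where the pixel value is used directly, the register and its outputs can be sketched as follows. The entity name is an assumption, and so is the bit ordering within the 6-bit code (red in the top two bits, blue in the bottom two); the text does not state which bits map to which colour. The palette lookup for the Black & White and 16 Colors modes is not shown, because the mapping table is not available here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only: names and RGB bit ordering are assumptions.
entity rgbreg_sketch is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;                     -- active High
    en    : in  std_logic;                     -- line display enable
    pixel : in  std_logic_vector(5 downto 0);  -- 6-bit pixel (64 Colors mode)
    r     : out std_logic_vector(1 downto 0);  -- R1, R0
    g     : out std_logic_vector(1 downto 0);  -- G1, G0
    b     : out std_logic_vector(1 downto 0)   -- B1, B0
  );
end entity rgbreg_sketch;

architecture rtl of rgbreg_sketch is
  signal rgbreg : std_logic_vector(5 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      rgbreg <= (others => '0');      -- 000000 = black
    elsif rising_edge(clk) then
      if en = '1' then
        rgbreg <= pixel;              -- viewable pixel: use the value directly
      else
        rgbreg <= (others => '0');    -- blanking: load the code for black
      end if;
    end if;
  end process;

  r <= rgbreg(5 downto 4);            -- assumed ordering
  g <= rgbreg(3 downto 2);
  b <= rgbreg(1 downto 0);
end architecture rtl;
```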
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations (4 x 4 x 4 = 64).
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device       RAM      RAM Blocks   Total RAM   Total RAM   Total RAM
             Columns  per Column   Blocks      Bits        Kbits
XC3S100E     1        4            4           73,728      72
XC3S250E     2        6            12          221,184     216
XC3S500E     2        10           20          368,640     360
XC3S1200E    2        14           28          516,096     504
XC3S1600E    2        18           36          663,552     648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical: both ports are
interchangeable, and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
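As an illustration of the synchronous, single-port use described above, the following sketch describes a 2K x 8 memory (16K data bits, the data capacity of one block) in behavioral VHDL. The entity and port names are assumptions, and whether a synthesis tool maps this description onto a block RAM depends on the tool and its settings; it is not the CORE Generator or primitive-instantiation flow the application note refers to.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative sketch only: names and the 2K x 8 organisation are assumptions.
entity picture_ram_sketch is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                      -- clock enable
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(10 downto 0);  -- 2048 locations
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity picture_ram_sketch;

architecture rtl of picture_ram_sketch is
  type ram_t is array (0 to 2047) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;  -- synchronous write
        end if;
        -- Synchronous read: the output changes only on a clock edge with
        -- the enable active. On a write, the old contents are read out.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```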
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
• Interoperability
• Technology independence
• Design reuse
• Several levels of abstraction
• Readability
• Standard language
• Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
• Design specification language
• Design entry language
• Design simulation language
• Design documentation language
• An alternative to schematics
10.1 Brief History
• VHDL was developed in the early 1980s for managing design problems that
  involved large circuits and multiple teams of engineers.
• Funded by the U.S. Department of Defense.
• The first publicly available version was released in 1985.
• In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
  presented with a proposal to standardize VHDL.
• In 1987, standardization => IEEE 1076-1987.
• An improved version of the language was released in 1994 => IEEE
  standard 1076-1993.
10.2 Related Standards
• IEEE 1076 doesn't support simulation conditions such as unknown and
  high-impedance.
• Soon after IEEE 1076-1987 was released, simulator companies began using
  their own non-standard types => VHDL was becoming non-standard.
• The IEEE 1164 standard was developed by the IEEE.
• IEEE 1164 contains definitions for a nine-valued data type, std_logic.
• IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
  to actual hardware.
• It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
• A black box with an interface definition
• Defines the inputs/outputs of a component (defines the pins)
• A way to represent modularity in VHDL
• Similar to a symbol in a schematic
• An entity declaration describes an entity
E.g.:
    library ieee;
    use ieee.std_logic_1164.all;

    entity Comparator is
      port (A, B : in  std_logic_vector(7 downto 0);
            EQ   : out std_logic);
    end Comparator;
Ports
• Provide channels of communication between the component and its
  environment
• Each port must have a name, a direction, and a type
• An entity may have NO port declaration
Port directions
• In: The value of a port can be read inside the component, but cannot be
  assigned. Multiple reads of the port are allowed.
• Out: Assignments can be made to a port, but data from a port cannot be read.
  Multiple assignments are allowed.
• Inout: Bi-directional; assignments can be made and data can be read. Multiple
  assignments are allowed.
• Buffer: An out port with read capability. May have at most one assignment
  (not recommended).
Architectures
• Every entity has at least one architecture
• One entity can have several architectures
• Architectures can describe a design using:
  - Behavior
  - Structure
  - Dataflow
• Architectures can describe a design on many levels:
  - Gate level
  - RTL (Register Transfer Level)
  - Behavioral level
• A configuration declaration links an architecture to an entity
E.g.:
    architecture Comparator1 of Comparator is
    begin
      EQ <= '1' when (A = B) else '0';
    end Comparator1;
Configurations
• Link an entity declaration and an architecture body together
• The concept of a default configuration is a bit messy in VHDL '87
  - The last architecture analyzed links to the entity
• Can be used to change simulation behavior without re-analyzing the VHDL
  source
• Complex configuration declarations are ignored in synthesis
• Some entities can have, e.g., a gate-level architecture and a behavioral
  architecture
• Are always optional
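A small example may make the role of a configuration declaration clearer. The sketch below repeats the Comparator entity with two architectures and then selects one of them explicitly. The second architecture (Comparator2) and the configuration name (Comparator_cfg) are hypothetical names introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- Dataflow-style architecture (as in the example above).
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Hypothetical second architecture, written behaviorally with a process.
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end Comparator2;

-- The configuration declaration binds the entity to one chosen architecture,
-- instead of relying on the default (last architecture analyzed).
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end Comparator_cfg;
```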
Packages
Packages contain information common to many design units.
1. Package declaration
   - constant declarations
   - type and subtype declarations
   - function and procedure declarations
   - global signal declarations
   - file declarations
   - component declarations
2. Package body
   - is not necessarily needed
   - function bodies
   - procedure bodies
Packages are meant for encapsulating data that can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
   - type and subtype declarations
   - subprograms
   - constants
   - alias declarations
   - global signal declarations
   - file declarations
   - component declarations
A package body consists of:
   - subprogram declarations and bodies
   - type and subtype declarations
   - deferred constants
   - file declarations
Libraries
A library is a collection of VHDL design units (a database):
1. Packages
   - package declaration
   - package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
A library is usually a directory in a UNIX file system, but it can also be any
other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -> Dataflow -> Behavioral (increasing level of abstraction)
Structural VHDL description
• The circuit is described in terms of its components.
• Ranges from a low-level description (e.g., a transistor-level description) to a
  high-level description (e.g., a block diagram).
• For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
• The circuit is described in terms of how data moves through the system.
• In the dataflow style, you describe how information flows between registers in
  the system.
• The combinational logic is described at a relatively high level; the placement
  and operation of registers is specified quite precisely.
• The behavior of the system over time is defined by registers.
• There are no built-in registers in the VHDL language:
  - either a lower-level description
  - or a behavioral description of sequential elements is needed.
• The lower-level register descriptions must be created or obtained.
• If there are no third-party models for registers, you must write the behavioral
  description of the registers.
• The behavioral description can be provided in the form of
  subprograms (functions or procedures).
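Since VHDL has no built-in register, the usual approach is a short behavioral description of one. The sketch below shows an 8-bit register with an asynchronous, active-High reset; the entity name reg8 and its port names are hypothetical and chosen only for illustration. It uses a clocked process rather than a subprogram, which is the form most synthesis tools recognise.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only: a behavioral description of an 8-bit register.
entity reg8 is
  port (
    clk : in  std_logic;
    rst : in  std_logic;                     -- asynchronous reset, active High
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg8;

architecture behavioral of reg8 is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= (others => '0');  -- reset clears the register
    elsif rising_edge(clk) then
      q <= d;                -- otherwise capture the input on each rising edge
    end if;
  end process;
end architecture behavioral;
```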
10.4 Behavioral VHDL Description
• The circuit is described in terms of its operation over time.
• The representation might include, e.g., state diagrams, timing diagrams, and
  algorithmic descriptions.
• The concept of time may be expressed precisely using delays (e.g., A <= B
  after 10 ns).
• If no actual delay is used, the order of sequential operations is defined.
• In the lower levels of abstraction (e.g., RTL), synthesis tools ignore detailed
  timing specifications.
• The actual timing results depend on the implementation technology and the
  efficiency of the synthesis tool.
• There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
• The basic simulation concept in VHDL.
• A VHDL description can always be broken up into interconnected processes.
• Quite similar to a Unix process.
Process keyword in VHDL
• A process statement is a concurrent statement.
• Statements inside process statements are sequential statements.
• A process must contain either a sensitivity list or wait statement(s), but NOT both.
• The sensitivity list or wait statement(s) contain the signals that wake the process up.
General format:
    process [(sensitivity_list)]
      process_declarative_part
    begin
      process_statements
      [wait_statement]
    end process;
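The two permitted forms (sensitivity list or wait statement) can be compared side by side. In the sketch below, both processes describe the same AND function; the entity name, port names, and process labels are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative sketch only: two equivalent processes, one per permitted form.
entity process_forms is
  port (
    a, b : in  std_logic;
    y1   : out std_logic;
    y2   : out std_logic
  );
end entity process_forms;

architecture demo of process_forms is
begin
  -- Form 1: sensitivity list. The process wakes up whenever a or b changes.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: wait statement. There is no sensitivity list; the process runs
  -- once, then suspends at the wait until a or b changes.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture demo;
```

Placing the wait statement at the end of the process makes the second form behave like the first, including the initial execution at the start of simulation.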
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- Top level: 50 MHz board clock in, sync signals and 1-bit RGB out.
entity vga_controller is
  port(clk50m : in  std_logic;
       rstn   : in  std_logic;
       hsync  : out std_logic;
       vsync  : out std_logic;
       red    : out std_logic_vector(0 downto 0);
       green  : out std_logic_vector(0 downto 0);
       blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  -- Picture memory: 1 bit per pixel, 18-bit address.
  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic);
  END component;
component sync_unit port(
clkin std_logic
rstnin std_logic
hsyncout std_logic
---horizontal synch pulse(row)
vsyncout std_logic
--vertical sync pulse(frame) 111110100000000000
v_en_regout std_logic
h_enout std_logic
v_enout std_logic
line_countinout integer
c_hsinout integer)
end component
component pixelgen port(
clkin std_logic
rstnin std_logic
hsyncin std_logic
vsyncin std_logic
h_enin std_logic
v_enin std_logic
v_en_reg in std_logic
data_inin std_logic_vector(0 downto 0)
line_countin integer
c_hsin integer
red out std_logic_vector(0 downto 0)
greenout std_logic_vector(0 downto 0)
blueout std_logic_vector(0 downto 0)
addrout std_logic_vector(17 downto 0)
rdout std_logic)
end component
43
signal clkhsync_svsync_sh_en_sv_en_sv_en_reg_srd_sstd_logic
signal line_count_sc_hs_saddrinteger
signal data_sstd_logic_vector(0 downto 0)
signal addr_sstd_logic_vector(17 downto 0)
begin
--addr_s lt= std_logic_vector(to_unsigned(addr18))
process(clk50mrstn)
begin
if(rstn = 0)then
clk lt= 0
elsif(clk50m = 1 and clk50mevent)then
clk lt= not clk
end if
--end if
end process
hsync lt= hsync_s
vsync lt= vsync_s
sync sync_unit port map(
clk =gt clk
rstn =gt rstn
hsync =gt hsync_s
vsync =gt vsync_s
v_en_reg =gt v_en_reg_s
line_count =gt line_count_s
h_en =gt h_en_s
v_en =gt v_en_s
c_hs =gt c_hs_s
)
44
pixel pixelgen port map(
clk =gt clk
rstn =gt rstn
hsync =gt hsync_s
vsync =gt vsync_s
h_en =gt h_en_s
v_en =gt v_en_s
v_en_reg =gt v_en_reg_s
data_in =gt data_s
line_count=gt line_count_s
c_hs =gt c_hs_s
red =gt red
green =gt green
blue =gt blue
addr =gt addr_s
rd =gt rd_s
)
mem12 memory port map(
addr =gt addr_s
clk =gt clk
dout =gt data_s
en =gt rd_s
sinit=gt rstn
)
end Behavioral
45
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
    port (
        clk        : in  std_logic;
        rstn       : in  std_logic;
        hsync      : out std_logic;  -- horizontal sync pulse (row)
        vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
        v_en_reg   : out std_logic;
        h_en       : out std_logic;
        v_en       : out std_logic;
        line_count : out integer;
        c_hs       : out integer
    );
end sync_unit;

architecture arch_sync_unit of sync_unit is
    signal h_en_s, v_en_s       : std_logic;
    signal line_count_s, c_hs_s : integer;
begin

    c_hs       <= c_hs_s;
    line_count <= line_count_s;
    h_en       <= h_en_s;
    v_en       <= v_en_s;

    -- Horizontal counter: counts pixel clocks 0..793 within each line.
    process (clk, rstn)
    begin
        if (rstn = '0') then
            c_hs_s <= 0;
        elsif (clk = '1' and clk'event) then
            if (c_hs_s = 793) then
                c_hs_s <= 0;
            else
                c_hs_s <= c_hs_s + 1;
            end if;
        end if;
    end process;

    -- Vertical line counter: counts lines 0..515, advancing on each rising
    -- edge of the horizontal enable (h_en_s is used as a derived clock).
    process (h_en_s, rstn)
    begin
        if (rstn = '0') then
            line_count_s <= 0;
        elsif (h_en_s = '1' and h_en_s'event) then
            line_count_s <= line_count_s + 1;
            if (line_count_s = 515) then
                line_count_s <= 0;
            end if;
        end if;
    end process;

    -- Horizontal sync pulse: active low for the first 96 clocks of a line.
    process (clk, rstn)
    begin
        if (rstn = '0') then
            hsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (c_hs_s <= 95) then
                hsync <= '0';
            else
                hsync <= '1';
            end if;
        end if;
    end process;

    -- Horizontal enable (row information is sent in this period).
    process (clk, rstn)
    begin
        if (rstn = '0') then
            h_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
                h_en_s <= '1';
            else
                h_en_s <= '0';
            end if;
        end if;
    end process;

    -- Vertical enable (data is valid in this region).
    process (clk, rstn)
    begin
        if (rstn = '0') then
            v_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((line_count_s >= 35) and (line_count_s <= 515)) then
                v_en_s <= '1';
            else
                v_en_s <= '0';
            end if;
        end if;
    end process;

    -- Vertical sync pulse: active low during the first two lines of a frame.
    process (clk, rstn)
    begin
        if (rstn = '0') then
            vsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (line_count_s < 2) then
                vsync <= '0';
            else
                vsync <= '1';
            end if;
        end if;
    end process;

    -- v_en delayed by one clock, used by the pixel generator to detect the
    -- rising edge of v_en.
    process (clk)
    begin
        if (clk = '1' and clk'event) then
            v_en_reg <= v_en_s;
        end if;
    end process;

end arch_sync_unit;
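The counter limits in the synchronization block fix the line time and the frame rate. The following simulation-only sketch works this arithmetic out, assuming the pixel clock is exactly 25 MHz (the 50 MHz input divided by two). The entity name vga_timing_check and the constant names are hypothetical and exist only for this example.

```vhdl
-- Hypothetical simulation-only check of the timing implied by the counters.
entity vga_timing_check is
end entity vga_timing_check;

architecture sim of vga_timing_check is
    constant PIXEL_CLK_HZ    : real    := 25.0e6;  -- assumed: clk50m / 2
    constant CLOCKS_PER_LINE : integer := 794;     -- c_hs_s counts 0..793
    constant LINES_PER_FRAME : integer := 516;     -- line_count_s counts 0..515
    constant SYNC_CLOCKS     : integer := 96;      -- hsync low while c_hs_s <= 95

    constant LINE_TIME_US  : real := real(CLOCKS_PER_LINE) / PIXEL_CLK_HZ * 1.0e6;
    constant SYNC_TIME_US  : real := real(SYNC_CLOCKS) / PIXEL_CLK_HZ * 1.0e6;
    constant FRAME_RATE_HZ : real :=
        PIXEL_CLK_HZ / real(CLOCKS_PER_LINE * LINES_PER_FRAME);
begin
    process
    begin
        report "line time (us):  " & real'image(LINE_TIME_US);
        report "hsync pulse (us): " & real'image(SYNC_TIME_US);
        report "frame rate (Hz):  " & real'image(FRAME_RATE_HZ);

        -- 794 clocks at 25 MHz is 31.76 us per line.
        assert abs(LINE_TIME_US - 31.76) < 0.01
            report "line time differs from 31.76 us" severity error;
        -- 96 clocks at 25 MHz is 3.84 us of horizontal sync.
        assert abs(SYNC_TIME_US - 3.84) < 0.01
            report "sync pulse differs from 3.84 us" severity error;
        -- 25 MHz / (794 * 516) is roughly 61 frames per second.
        assert FRAME_RATE_HZ > 60.0 and FRAME_RATE_HZ < 62.0
            report "frame rate is outside 60..62 Hz" severity error;
        wait;
    end process;
end architecture sim;
```

Under these assumptions the line time is about 31.76 microseconds and the frame rate is about 61 Hz, both close to the nominal 640x480 at 60 Hz timing.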
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;  -- provides "+" on std_logic_vector

entity pixelgen is
    port (
        clk        : in  std_logic;
        rstn       : in  std_logic;
        hsync      : in  std_logic;
        vsync      : in  std_logic;
        h_en       : in  std_logic;
        v_en       : in  std_logic;
        v_en_reg   : in  std_logic;
        data_in    : in  std_logic_vector(0 downto 0);
        line_count : in  integer;
        c_hs       : in  integer;
        red        : out std_logic_vector(0 downto 0);
        green      : out std_logic_vector(0 downto 0);
        blue       : out std_logic_vector(0 downto 0);
        addr       : out std_logic_vector(17 downto 0);
        rd         : out std_logic
    );
end pixelgen;

architecture behv of pixelgen is
    signal addr_s : std_logic_vector(17 downto 0);
begin

    addr <= addr_s;

    process (clk, rstn)
    begin
        if (rstn = '0') then
            rd     <= '0';
            red    <= "0";
            green  <= "0";
            blue   <= "0";
            addr_s <= x"0000" & "00";
        elsif (clk = '1' and clk'event) then
            -- Restart at address 0 on the rising edge of v_en (start of frame).
            if ((v_en = '1') and (v_en_reg = '0')) then
                addr_s <= (others => '0');
            end if;
            if (v_en = '1' and h_en = '1') then
                if (line_count <= 514) then
                    if (c_hs <= 655) then
                        -- Image area: read the next pixel from memory.
                        rd     <= '1';
                        red    <= data_in;
                        green  <= data_in;
                        blue   <= data_in;
                        addr_s <= addr_s + "000000000000000001";
                    else
                        -- Right of the image area: green.
                        rd    <= '0';
                        red   <= "0";
                        green <= "1";
                        blue  <= "0";
                    end if;
                else
                    -- Below the image area: white.
                    rd    <= '0';
                    red   <= "1";
                    green <= "1";
                    blue  <= "1";
                end if;
                -- Wrap after the last pixel address (0x3BFFF = 512 * 480 - 1).
                if (addr_s = "111011111111111111") then
                    addr_s <= (others => '0');
                end if;
            else
                -- Blanking interval: black.
                red   <= "0";
                green <= "0";
                blue  <= "0";
            end if;
        end if;
    end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, the programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs) with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was inducted
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Center funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up and eroded a significant share of its market. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs can be reprogrammed at run time, and this is leading to
the idea of reconfigurable computing or reconfigurable systems: CPUs that
reconfigure themselves to suit the task at hand. The Mitrion Virtual
Processor from Mitrionics is an example of a reconfigurable soft processor
implemented on FPGAs. However, it does not support dynamic reconfiguration
at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
Year           Gates      Source
1987           9,000      Xilinx
1992           600,000    Naval Surface Warfare Center
Early 2000s    Millions

Market size
Year               Market size
1985               First commercial FPGA technology invented by Xilinx
1987               $14 million
~1993              >$385 million
2005               $1.9 billion
2010 (estimate)    $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively.
- ASIC complexity has increased development time and costs.
- R&D resources and headcount are decreasing.
- Revenue losses for slow time-to-market are increasing.
- Financial constraints in a poor economy are driving low-cost technologies.
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts.
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at a low clock rate. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of
4-8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The example FPGA described here is an array, or island-style, FPGA. It
consists of an array of logic blocks and routing channels. Two I/O pads fit
into the height of one row or the width of one column, as shown below. All
the routing channels have the same width (number of wires).
Fig. 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are managed separately.
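The classic logic block described above can be modelled in a few lines of VHDL. The sketch below is a simplified behavioral model, not a vendor primitive: the entity name logic_block, the generics LUT_INIT and REGISTERED, and the port names are assumptions chosen for illustration. The LUT is a 16-entry truth table indexed by the four inputs, and a generic selects whether the single output is the registered or the unregistered LUT value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic logic block: 4-input LUT plus flip-flop.
entity logic_block is
    generic (
        -- Truth table of the LUT; bit i is the output for input value i.
        -- x"8000" sets only bit 15, i.e. a 4-input AND.
        LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
        -- true: output comes from the flip-flop; false: directly from the LUT.
        REGISTERED : boolean := false
    );
    port (
        clk    : in  std_logic;
        lut_in : in  std_logic_vector(3 downto 0);
        y      : out std_logic
    );
end entity logic_block;

architecture behavioral of logic_block is
    signal lut_out : std_logic;
    signal ff_out  : std_logic := '0';
begin
    -- The LUT is a 16:1 selection from the configuration bits.
    lut_out <= LUT_INIT(to_integer(unsigned(lut_in)));

    -- The flip-flop captures the LUT output on every rising clock edge.
    process (clk)
    begin
        if rising_edge(clk) then
            ff_out <= lut_out;
        end if;
    end process;

    -- Single output: registered or unregistered LUT value.
    y <= ff_out when REGISTERED else lut_out;
end architecture behavioral;
```

In a real device the 16 truth-table bits and the output selection are configuration memory cells loaded from the bitstream rather than generics fixed at elaboration.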
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right of and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Wherever a vertical and a horizontal channel intersect there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded in the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) description or a schematic design. The HDL form might be easier
to work with when handling large structures, because it is possible to specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to the actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe the results. Then, after the synthesis engine
has mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
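The first simulation stage, an RTL test bench, can be sketched as follows. The example assumes a small hypothetical entity clk_div2 that mirrors the divide-by-two clock process of the VGA controller top level; the test bench name clk_div2_tb, the 20 ns clock period and the check times are likewise assumptions made for this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical design under test: divide-by-two clock generator.
entity clk_div2 is
    port (
        clk50m : in  std_logic;
        rstn   : in  std_logic;
        clk    : out std_logic
    );
end entity clk_div2;

architecture rtl of clk_div2 is
    signal clk_i : std_logic;
begin
    process (clk50m, rstn)
    begin
        if rstn = '0' then
            clk_i <= '0';
        elsif rising_edge(clk50m) then
            clk_i <= not clk_i;
        end if;
    end process;
    clk <= clk_i;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench: no ports; it generates stimulus and checks the response.
entity clk_div2_tb is
end entity clk_div2_tb;

architecture sim of clk_div2_tb is
    signal clk50m : std_logic := '0';
    signal rstn   : std_logic := '0';
    signal clk    : std_logic;
    signal done   : boolean   := false;
begin
    dut : entity work.clk_div2
        port map (clk50m => clk50m, rstn => rstn, clk => clk);

    -- 50 MHz clock (20 ns period); stops toggling once done is set.
    clk50m <= not clk50m after 10 ns when not done else '0';

    stimulus : process
    begin
        wait for 25 ns;                  -- hold reset over the first clock edge
        assert clk = '0' report "clk should be 0 during reset" severity error;
        rstn <= '1';
        wait until rising_edge(clk50m);  -- first edge after reset release
        wait for 1 ns;
        assert clk = '1' report "clk should toggle to 1" severity error;
        wait until rising_edge(clk50m);  -- next edge
        wait for 1 ns;
        assert clk = '0' report "clk should toggle back to 0" severity error;
        done <= true;
        wait;
    end process stimulus;
end architecture sim;
```

The same test bench can be reused after synthesis and after place-and-route by binding dut to the gate-level or back-annotated netlist instead of the RTL architecture.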
Basic Process Technology Types
- SRAM: based on static memory technology. In-system programmable and
  re-programmable. Requires external boot devices. CMOS.
- Antifuse: one-time programmable. CMOS.
- EPROM: Erasable Programmable Read-Only Memory technology. Usually
  one-time programmable in production because of plastic packaging.
  Windowed devices can be erased with ultraviolet (UV) light. CMOS.
- EEPROM: Electrically Erasable Programmable Read-Only Memory
  technology. Can be erased, even in plastic packages. Some, but not all,
  EEPROM devices can be in-system programmed. CMOS.
- Flash: flash-erase EPROM technology. Can be erased, even in plastic
  packages. Some, but not all, flash devices can be in-system programmed.
  Usually a flash cell is smaller than an equivalent EEPROM cell and is
  therefore less expensive to manufacture. CMOS.
- Fuse: one-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; its Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA has
become a good method of doing this.
Reasons why prototyping is important:
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that
   it is functionally correct, compared with designers relying only on
   software simulations to verify that their hardware design is sound.
   Simulation speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is
   back from the foundry to start on software tests. FPGA prototyping allows
   much more time for software development and testing at the
   software-hardware integration stage. This helps expose many unforeseen
   software bugs that appear due to today's array of operating systems,
   applications and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies
   in the system work well together, off the simulation stage and in actual
   form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
   clients, bringing in interest early. This speeds up the overall
   development cycle and allows for more enhancement or improvement of the
   chip features than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig. VGA CONTROLLER
GATE LEVEL
Fig. SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
- Motion tracking systems
- Image processing systems
- Display systems
14.1 CONCLUSION
Hence, with the VGA Controller implemented on an FPGA, a processor-generated
(pixelated) picture is taken from memory space and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller"
3. Hinner, "VGA Interface and video signal documents"
2 HISTORY
The term Video Graphics Array (VGA) refers specifically to the display
hardware first introduced with the IBM PS2 line of computers in 1987 but
through its widespread adoption has also come to mean either an analog
computer display standard the 15-pin D-subminiature VGA connector or the
640times480 resolution itself While this resolution has been superseded in the
personal computer market it is becoming a popular resolution on mobile
devices
Video Graphics Array (VGA) was the last graphical standard introduced by
IBM that the majority of PC clone manufacturers conformed to making it
today (as of 2009) the lowest common denominator that all PC graphics
hardware supports before a device-specific driver is loaded into the computer
For example the MS-Windows splash screen appears while the machine is still
operating in VGA mode which is the reason that this screen always appears in
reduced resolution and color depth
VGA was officially superseded by IBMs XGA standard but in reality it was
superseded by numerous slightly different extensions to VGA made by clone
manufacturers that came to be known collectively as Super VGA
21 VGA compared to other standard resolutions
VGA is referred to as an array instead of an adapter because it was
implemented from the start as a single chip (an ASIC) replacing the Motorola
6845 and dozens of discrete logic chips that covered the full-length ISA boards
of the MDA CGA and EGA Its single-chip implementation also allowed the
VGA to be placed directly on a PCs motherboard with a minimum of
difficulty (it only required video memory timing crystals and an external
RAMDAC) and the first IBM PS2 models were equipped with VGA on the
motherboard
2
22 The VGA specifications are as follows
256 KB Video RAM (The very first cards could be ordered with 64KB or
128KB of RAM at the cost of losing some video modes)
16-color and 256-color modes
262144-value color palette (six bits each for red green and blue)
Selectable 25175 MHz or 28322 MHz master clock
Maximum of 800 horizontal pixels
Maximum of 600 lines
Refresh rates at up to 70 Hz
Vertical blank interrupt
Packed-pixel mode 256 colors
Hardware smooth scrolling support
The VGA supports both All Points Addressable graphics modes and
alphanumeric text modes Standard graphics modes are
640times480 in 16 colors
640times350 in 16 colors
320times200 in 16 colors
320times200 in 256 colors
3
3 DESCRIPTION ARCHITECTURE
31 VGA display device
VGA display device used for displaying the images taken from the
processor in an exact format It can be used for displaying the output of any
projects Analog graphs etchellip can be directly displayed on the VGA display
monitor
31 VGA (Video Graphics Array) Controller
VGA controller controls the image to be exactly displayed on the display
device It acts as an interface between processing system and display device so
called as interface controller
4
Image or video processing system
VGA Controller
Display
4 CONTROLLER DESIGN
For this first we should know how the display device works Consider an
image below
640x480
Original image
5
Pixels of image
Display device also have pixels Display resolution (640480) is a
standard resolution of display
Pixel
This article is about the picture element For other uses see Pixel
(disambiguation)
This example shows an image with a portion greatly enlarged in which the
individual pixels are rendered as little squares and can easily be seen
6
42 A photograph of sub-pixel display elements on a laptops LCD screen
In digital imaging a pixel (or picture element [1]) is a single point in a raster
image The pixel is the smallest addressable screen element it is the smallest
unit of picture which can be controlled Each Pixel has its address The address
of a pixel corresponds to its coordinate Pixels are normally arranged in a 2-
dimensional grid and are often represented using dots or squares Each pixel is
a sample of an original image where more samples typically provide more-
accurate representations of the original The intensity of each pixel is variable
In color image systems a color is typically represented by three or four
component intensities such as red green and blue or cyan magenta yellow
and black
Color refreshing rate
Color refreshing rate =60Hz
Color refreshing rate represents the number of frames that are transmitted per
second
Minimum number of frames that can be transmitted per second by VGA
controller is 60
7
5 Hardware Description
51 Block Diagram
51 VGA Synchronization Unit
8
The Synchronization Unit provides the horizontal and vertical
synchronization signals ndash HSYNC and VSYNC ndash that are required to correctly
display a picture frame within the confines of a monitorrsquos display area
These synchronization signals are used as control inputs by the
monitorrsquos horizontal and vertical deflection circuits These circuits deflect the
electrons emitted by the three primary color electron guns (Red Green Blue)
left to right and from top to bottom respectively HSYNC provides the start
and stop times for the horizontal deflection circuit so that a line of pixels is
correctly drawn across the screen display VSYNC provides the start and stop
times for the vertical deflection circuit so that the lines of a frame are correctly
drawn from the top to the bottom of the screen display
The resolution for the display is defined by the level on the
RESOLUTION input If High the 640x480 resolution is used (VGA) If Low
the 800x600 resolution (SVGA) is used
Although the resolution determines the area of a monitorrsquos screen
within which an image can be displayed the full extents of the chosen
resolution do not have to be used The actual extents of the image display area
on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame respectively
5.2 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to the
top, bottom, left and right of the viewable frame area are blanked by putting
black pixels at the required line-pixel positions. This has the effect of centering
the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel requires to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this very reason. This is the line display
enable signal, en. The signal is checked on each rising edge of the external
clock signal (CLK) and is set as follows:
If (HCOUNT >= PixTotal) or (VCOUNT >= LinTotal) then
en = 0 (pixel requires to be blanked; set color to be black)
Else
en = 1 (pixel is a viewable pixel; generate RGB color accordingly)
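The blanking rule above can be modeled in a few lines of Python. This is only a behavioral sketch of the comparison, not the Controller's hardware description; the function name and the 640x480 display size used in the example are assumptions made for illustration.

```python
def line_display_enable(hcount, vcount, pix_total, lin_total):
    """Return 1 for a viewable pixel, 0 for a pixel that must be blanked."""
    if hcount >= pix_total or vcount >= lin_total:
        return 0  # outside the viewable area: the pixel is set to black
    return 1      # inside the viewable area: the RGB color is generated


# Hypothetical display size: the full 640x480 frame.
PIX_TOTAL, LIN_TOTAL = 640, 480

print(line_display_enable(0, 0, PIX_TOTAL, LIN_TOTAL))     # first viewable pixel
print(line_display_enable(640, 10, PIX_TOTAL, LIN_TOTAL))  # just past the end of a line
print(line_display_enable(10, 480, PIX_TOTAL, LIN_TOTAL))  # just past the last line
```

Because the two comparisons are combined with "or", a pixel is blanked as soon as either counter leaves the viewable range.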
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode, 640x480, 60 Hz, non-interlaced, the horizontal
timings are:
Parameter                   Value      Unit
Clock frequency             25.175     MHz
Horizontal pixels           640
Horizontal sync polarity    Negative
Total time for each line    31.77      µs
Front porch (A)             0.94       µs
Sync pulse length (B)       3.77       µs
Back porch (C)              1.89       µs
Active video (D)            25.17      µs
(Total horizontal sync time 6.60 µs)
6.1 The vertical timings are:
Parameter                   Value      Unit
Vertical lines              480
Vertical sync polarity      Negative
Vertical frequency          59.94      Hz
Front porch (E)             0.35       ms
Sync pulse length (F)       0.06       ms
Back porch (G)              1.02       ms
Active video (H)            15.25      ms
(Total vertical sync time 1.43 ms)
640 x 400, 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480, 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480, 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
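The arithmetic behind these figures can be checked with a short Python sketch. The values are the standard 640x480, 60 Hz horizontal timings in microseconds; the variable names are assumptions made for this example.

```python
# Horizontal timings for 640x480 at 60 Hz, in microseconds.
H_TIMING_US = {
    "front_porch": 0.94,
    "sync_pulse": 3.77,
    "back_porch": 1.89,
    "active_video": 25.17,
}
PIXEL_CLOCK_MHZ = 25.175

# The four intervals add up to the total time for one line.
line_time_us = sum(H_TIMING_US.values())

# Everything except the active video is the total horizontal sync time.
sync_time_us = line_time_us - H_TIMING_US["active_video"]

# Microseconds times MHz gives pixel clocks.
active_pixels = H_TIMING_US["active_video"] * PIXEL_CLOCK_MHZ

print(round(line_time_us, 2), round(sync_time_us, 2), int(active_pixels))
```

The last value comes out below 640 because the timings are rounded to two decimals, which is the slight variation noted above.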
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
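Both sync signals follow the same window comparison, so one small Python function can model either of them. This is a sketch only: the function name is hypothetical, and the horizontal parameter values (640, 16 and 96 pixel clocks, with an 800-clock line) are assumed for illustration and are not taken from the Controller's documentation.

```python
def sync_level(count, total, blanking, sync_len):
    """Return 0 while the sync pulse is active (low), otherwise 1."""
    start = total + blanking
    return 0 if start <= count < start + sync_len else 1


# Hypothetical horizontal parameters, in pixel clocks.
PIX_TOTAL, BLANKING_LEFT, TL_SYNC = 640, 16, 96
LINE_PERIOD = 800  # assumed length of one full line

low_counts = [h for h in range(LINE_PERIOD)
              if sync_level(h, PIX_TOTAL, BLANKING_LEFT, TL_SYNC) == 0]
print(low_counts[0], low_counts[-1], len(low_counts))
```

With these assumed values the pulse starts at count 656 and lasts exactly TL_SYNC clocks. The same function models VSYNC when it is given VCOUNT, LinTotal and TFSync.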
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
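The counter ranges and the 19-bit width quoted above can be verified with a small Python sketch. The function name is an assumption made for this example.

```python
def address_counter_range(pix_total, lin_total):
    """Return (highest address, bits needed to hold it) for one frame."""
    max_address = pix_total * lin_total - 1
    return max_address, max_address.bit_length()


for width, height in ((640, 480), (800, 600)):
    max_address, bits = address_counter_range(width, height)
    print(f"{width}x{height}: 0 to {max_address}, {bits} bits")
```

Both resolutions need 19 bits, since 2^18 = 262144 is too small for either range and 2^19 = 524288 covers both.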
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum refresh rate of 60 Hz:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 = approx. 54 ns
Because part of every line and frame is spent in the blanking intervals,
the budget per pixel is tightened to:
Max time for 1 pixel = 40 ns
2) To send information for each and every pixel of a frame by means of
the RGB signals.
3) After one row has been completely displayed, the column must be
reset and the row incremented; this is done by the HSYNC signal.
4) After one frame has been completely displayed, a new frame has to be started,
for which both row and column are reset; this is done by the VSYNC signal.
=> To give the information for 1 pixel we use 1 clock (for controlling all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for 1 pixel on each and every
clock, we can attain the minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
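Ex: relative to 25 MHz (the effectively utilized frequency), a 15 MHz clock
has a longer time period, so the number of frames/sec decreases (it is unable
to display 60 frames/sec); a 30 MHz clock has a shorter time period, so the
number of frames/sec increases.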
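The per-pixel time budget and the effect of the clock frequency can be worked through in Python. This is a sketch under one stated assumption: the total of 800 x 525 pixel slots per frame (visible pixels plus blanking) is the standard 640x480 timing and does not appear in this report. The function names are hypothetical.

```python
VISIBLE_PIXELS = 640 * 480     # 307200 viewable pixels per frame
TOTAL_PIXEL_SLOTS = 800 * 525  # assumed total per frame, including blanking


def pixel_time_ns(refresh_hz, pixels_per_frame):
    """Time available per pixel, in nanoseconds."""
    return 1e9 / (refresh_hz * pixels_per_frame)


def frames_per_second(clock_hz, pixels_per_frame=TOTAL_PIXEL_SLOTS):
    """Frame rate when one pixel slot is sent on every clock."""
    return clock_hz / pixels_per_frame


print(round(pixel_time_ns(60, VISIBLE_PIXELS), 2))     # visible pixels only
print(round(pixel_time_ns(60, TOTAL_PIXEL_SLOTS), 2))  # blanking included
for mhz in (15, 25, 30):
    print(mhz, "MHz:", round(frames_per_second(mhz * 1e6), 1), "frames/sec")
```

Counting only visible pixels gives roughly 54 ns per pixel. Once the assumed blanking slots are included, the budget drops to just under 40 ns, which is why a clock of about 25 MHz is needed for a refresh rate of approximately 60 Hz, and why a 15 MHz clock falls well short.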
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal Synchronization signal)
VSYNC (Vertical Synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse and back porch
               Horizontal sync    Vertical sync
Front porch    0.94 µs            0.35 ms
Sync pulse     3.77 µs            0.063 ms
Back porch     1.89 µs            1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below. Empty cells are marked "-".
Table: VGA Pin description
Name  Type  Polarity  Bus size  Description
Global Control Signals
CLK  I  Rising  -  Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST  I  High  -  Global system reset.
VGA Input Settings Signals
RESOLUTION  I  High/Low  -  This input selects the screen resolution to be used: 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD  I  -  2  This input selects the color palette to be used: 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H  I  -  10  This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V  I  -  10  This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD  O  High  -  This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal, en, generated by the Synchronization Unit of the Controller.
DATA  I  -  8  Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette, as follows. Black & White mode: 8 1-bit pixels. 16 Colors mode: 2 4-bit pixels. 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL  O  -  19  Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC  O  Falling  -  Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC  O  Falling  -  Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1, R0  O  High/Low  -  Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1, G0  O  High/Low  -  Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1, B0  O  High/Low  -  Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode: one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode: one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode: one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
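As a worked example of this formula, the following Python sketch computes the picture memory size for each color palette. The dictionary keys and the function name are assumptions made for illustration; the result is rounded up to whole bytes.

```python
# Pixels packed into each byte of picture memory, per color palette.
PIXELS_PER_BYTE = {"black_white": 8, "16_colors": 2, "64_colors": 1}


def picture_memory_bytes(pix_total, lin_total, palette):
    """Bytes needed to store pix_total x lin_total pixels, rounded up."""
    pixels = pix_total * lin_total
    per_byte = PIXELS_PER_BYTE[palette]
    return (pixels + per_byte - 1) // per_byte


for palette in PIXELS_PER_BYTE:
    print(palette, picture_memory_bytes(640, 480, palette), "bytes")
```

For a full 640x480 frame this gives 38400 bytes in Black & White mode, 153600 bytes in 16 Colors mode and 307200 bytes in 64 Colors mode.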
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
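The three addressing rules differ only in how far the 19-bit pixel count is shifted right, which a short Python model makes explicit. The function and dictionary names are hypothetical, and the model treats hvcnt as a plain integer rather than a bus.

```python
HVCNT_MASK = (1 << 19) - 1  # hvcnt is a 19-bit value

# Right shift applied to hvcnt for each palette (log2 of pixels per byte).
ADDRESS_SHIFT = {"black_white": 3, "16_colors": 1, "64_colors": 0}


def addr_pixel(hvcnt, palette):
    """Byte address in picture memory that holds pixel number hvcnt."""
    return (hvcnt & HVCNT_MASK) >> ADDRESS_SHIFT[palette]


# Pixel number 17 lives in a different byte depending on the palette.
for palette in ADDRESS_SHIFT:
    print(palette, addr_pixel(17, palette))
```

Shifting right by three divides the pixel number by eight, so pixels 16 to 23 all map to byte 2 in Black & White mode, while in 64 Colors mode every pixel has its own byte.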
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
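The shift-right behavior of the Pixel register can be sketched in Python as a function that returns the pixels of one byte in the order they reach the active position. This models only the data ordering, not the clocking; the function name and palette labels are assumptions made for this example.

```python
def unpack_pixels(byte, palette):
    """Pixels of one memory byte, in the order they become the active pixel."""
    if palette == "black_white":
        # The active pixel is bit 0; one right shift per clock exposes the next.
        return [(byte >> shift) & 0x1 for shift in range(8)]
    if palette == "16_colors":
        # The low nibble is displayed first, then the register shifts right by four.
        return [byte & 0xF, (byte >> 4) & 0xF]
    if palette == "64_colors":
        # One pixel per byte; bits 7 and 6 are not used.
        return [byte & 0x3F]
    raise ValueError(f"unknown palette: {palette}")


print(unpack_pixels(0b10110001, "black_white"))
print(unpack_pixels(0xA7, "16_colors"))
print(unpack_pixels(0xFF, "64_colors"))
```

In every mode the least significant bits are displayed first, which follows from the active pixel always sitting at the low end of the register.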
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
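The four analog levels and the count of 64 colors can be reproduced with a short Python sketch. It assumes the four levels are evenly spaced between 0 V and 0.7 V, which an ideal 2-bit DAC would give; the actual resistor network on the board may differ slightly. The names are hypothetical.

```python
FULL_SCALE_VOLTS = 0.7  # maximum brightness level


def dac_volts(code):
    """Ideal output of a 2-bit DAC for code 0..3 (assumes evenly spaced levels)."""
    if not 0 <= code <= 3:
        raise ValueError("a 2-bit code must be in the range 0..3")
    return FULL_SCALE_VOLTS * code / 3


levels = [round(dac_volts(code), 3) for code in range(4)]
color_count = len(levels) ** 3  # one independent level each for R, G and B

print(levels)
print(color_count)
```

Three independent channels with four levels each give 4 x 4 x 4 = 64 combinations, matching the 6-bit RGB code.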
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device       RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E     1             4                       4                  73728            72
XC3S250E     2             6                       12                 221184           216
XC3S500E     2             10                      20                 368640           360
XC3S1200E    2             14                      28                 516096           504
XC3S1600E    2             18                      36                 663552           648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
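To relate these figures to a frame buffer, the following Python sketch estimates how many block RAMs a picture needs and which device in Table 2 is the smallest that can hold it. It is a rough estimate under stated assumptions: only the 16K data bits of each block are counted, parity bits and aspect-ratio constraints are ignored, and the function names are hypothetical.

```python
DATA_BITS_PER_BLOCK = 16 * 1024  # 16K data bits per block RAM (parity ignored)

# Total block RAMs per device, from Table 2.
TOTAL_BLOCKS = {
    "XC3S100E": 4,
    "XC3S250E": 12,
    "XC3S500E": 20,
    "XC3S1200E": 28,
    "XC3S1600E": 36,
}


def blocks_needed(width, height, bits_per_pixel=1):
    """Block RAMs needed for a width x height picture, rounded up."""
    bits = width * height * bits_per_pixel
    return (bits + DATA_BITS_PER_BLOCK - 1) // DATA_BITS_PER_BLOCK


def smallest_device(blocks):
    """First device in Table 2 order with enough block RAMs, or None."""
    for device, total in TOTAL_BLOCKS.items():
        if total >= blocks:
            return device
    return None


needed = blocks_needed(640, 480)  # 1-bit pixels
print(needed, smallest_device(needed))
print(smallest_device(blocks_needed(640, 480, bits_per_pixel=4)))
```

A 1-bit 640x480 picture needs 19 blocks under these assumptions, so it only just fits in a device with 20 blocks, while a 4-bit picture of the same size exceeds every device in the table and would need external memory.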
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defence.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987: standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines e.g. two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with interface definition
Defines the inputs/outputs of a component (define pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
entity Comparator is
    port (A, B : in std_logic_vector(7 downto 0);
          EQ : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment
Each port must have a name, direction and a type
An entity may have NO port declaration
Port directions
In: A value of a port can be read inside the component, but cannot be assigned.
Multiple reads of port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe design using
– Behavior
– Structure
– Dataflow
Architectures can describe design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
    EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
– Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. gate level architecture and behavioral
architecture
Are always optional
Packages
Packages contain information common to many design units
1 Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2 Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units
Consists of declaration part and optional body part
Package declaration can contain
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
Package body consists of
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
Libraries
Collection of VHDL design units (database)
1 Packages
– package declaration
– package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually directory in UNIX file system
Can be also any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ
primarily in how closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural-------
Dataflow ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high level
description (e.g. block diagram)
For large circuits, a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in
the system
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either lower level description
– or behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained
If there are no 3rd party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of
subprograms (functions or procedures)
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns)
If no actual delay is used, order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications
The actual timing results depend on implementation technology and efficiency
of synthesis tool
There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL
VHDL description can always be broken up to interconnected processes
Quite similar to Unix process
Process keyword in VHDL
Process statement is concurrent statement
Statements inside process statements are sequential statements
Process must contain either sensitivity list or wait statement(s), but NOT both
Sensitivity list or wait statement(s) contains signals which wake the process up
General format:
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- Top level: divides the input clock by two and connects the
-- synchronization unit, the pixel generator and the picture memory.
entity vga_controller is port(
    clk50m : in std_logic;
    rstn   : in std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

component memory
    port (
    addr  : IN std_logic_VECTOR(17 downto 0);
    clk   : IN std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN std_logic;
    sinit : IN std_logic);
END component;

component sync_unit port(
    clk        : in std_logic;
    rstn       : in std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end component;

component pixelgen port(
    clk        : in std_logic;
    rstn       : in std_logic;
    hsync      : in std_logic;
    vsync      : in std_logic;
    h_en       : in std_logic;
    v_en       : in std_logic;
    v_en_reg   : in std_logic;
    data_in    : in std_logic_vector(0 downto 0);
    line_count : in integer;
    c_hs       : in integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end component;

signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);

begin
--addr_s <= std_logic_vector(to_unsigned(addr, 18));

-- Divide clk50m by two to obtain the pixel clock.
process(clk50m, rstn)
begin
    if (rstn = '0') then
        clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
        clk <= not clk;
    end if;
end process;

hsync <= hsync_s;
vsync <= vsync_s;

sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
);

pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
);

mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
);

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
    clk        : in std_logic;
    rstn       : in std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin
c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;

-- Horizontal counter: counts pixel clocks 0..793 within each line.
process(clk, rstn)
begin
    if (rstn = '0') then
        c_hs_s <= 0;
    else
        if (clk = '1' and clk'event) then
            if (c_hs_s = 793) then
                c_hs_s <= 0;
            else
                c_hs_s <= c_hs_s + 1;
            end if;
        end if;
    end if;
end process;

-- Vertical line counter: counts lines 0..515, advancing on each
-- rising edge of the horizontal enable.
process(h_en_s, rstn)
begin
    if (rstn = '0') then
        line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
        line_count_s <= line_count_s + 1;
        if (line_count_s = 515) then
            line_count_s <= 0;
        end if;
    end if;
end process;

-- HSYNC pulse: active low for the first 96 clocks of each line.
process(clk, rstn)
begin
    if (rstn = '0') then
        hsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (c_hs_s <= 95) then
            hsync <= '0';
        else
            hsync <= '1';
        end if;
    end if;
end process;

-- Horizontal enable (row information will be sent in this period).
process(clk, rstn)
begin
    if (rstn = '0') then
        h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
            h_en_s <= '1';
        else
            h_en_s <= '0';
        end if;
    end if;
end process;

-- Vertical enable (data is valid in this region).
process(clk, rstn)
begin
    if (rstn = '0') then
        v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((line_count_s >= 35 and line_count_s <= 515)) then
            v_en_s <= '1';
        else
            v_en_s <= '0';
        end if;
    end if;
end process;

-- VSYNC pulse: active low for the first two lines of each frame
-- (vertical sync, front porch, back porch).
process(clk, rstn)
begin
    if (rstn = '0') then
        vsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (line_count_s < 2) then
            vsync <= '0';
        else
            vsync <= '1';
        end if;
    end if;
end process;

-- Registered copy of the vertical enable, used to detect its rising edge.
process(clk)
begin
    if (clk = '1' and clk'event) then
        v_en_reg <= v_en_s;
    end if;
end process;

end arch_sync_unit;
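The counter limits in the listing above fix the line and frame timing. The following Python sketch works out what they imply; it assumes that clk50m is a 50 MHz clock divided by two in the top level, giving a 25 MHz pixel clock, which is suggested by the signal name but not stated explicitly. The constant names are hypothetical.

```python
CLK_HZ = 25_000_000     # assumed pixel clock: 50 MHz input divided by two

CLOCKS_PER_LINE = 794   # c_hs_s counts 0..793
HSYNC_LOW_CLOCKS = 96   # hsync is low while c_hs_s <= 95
H_EN_CLOCKS = 778 - 143 + 1  # h_en_s is set for c_hs_s in 143..778
LINES_PER_FRAME = 516   # line_count_s counts 0..515

line_time_us = CLOCKS_PER_LINE / CLK_HZ * 1e6
hsync_pulse_us = HSYNC_LOW_CLOCKS / CLK_HZ * 1e6
frame_rate_hz = CLK_HZ / (CLOCKS_PER_LINE * LINES_PER_FRAME)

print(round(line_time_us, 2), "us per line")
print(round(hsync_pulse_us, 2), "us sync pulse")
print(H_EN_CLOCKS, "clocks of horizontal enable")
print(round(frame_rate_hz, 2), "frames/sec")
```

Under this assumption the line time and sync pulse come out close to the 31.77 µs and 3.77 µs of the standard timing, and the frame rate is slightly above 60 Hz.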
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
    clk        : in std_logic;
    rstn       : in std_logic;
    hsync      : in std_logic;
    vsync      : in std_logic;
    h_en       : in std_logic;
    v_en       : in std_logic;
    v_en_reg   : in std_logic;
    data_in    : in std_logic_vector(0 downto 0);
    line_count : in integer;
    c_hs       : in integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;

process(clk, rstn)
begin
    if (rstn = '0') then
        rd <= '0';
        red <= "0";
        green <= "0";
        blue <= "0";
        addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
        -- Restart at address 0 on the rising edge of the vertical enable.
        if ((v_en = '1') and (v_en_reg = '0')) then
            addr_s <= (others => '0');
        end if;
        if (v_en = '1' and h_en = '1') then
            if (line_count <= 514) then
                if (c_hs <= 655) then
                    -- Inside the picture window: show the memory data.
                    rd <= '1';
                    red <= data_in;
                    green <= data_in;
                    blue <= data_in;
                    addr_s <= addr_s + "000000000000000001";
                else
                    -- Right of the picture window: green border.
                    rd <= '0';
                    red <= "0";
                    green <= "1";
                    blue <= "0";
                end if;
            else
                -- Below the picture window: white.
                rd <= '0';
                red <= "1";
                green <= "1";
                blue <= "1";
            end if;
            -- Wrap the address after the last pixel of the picture.
            if (addr_s = "111011111111111111") then
                addr_s <= (others => '0');
            end if;
        else
            -- Blanking: output black.
            red <= "0";
            green <= "0";
            blue <= "0";
        end if;
    end if;
end process;
end behv;
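The constants in the pixel generator can be decoded with a few lines of Python to see how large the displayed picture is. The window limits below are a nominal reading of the listing: they assume that h_en goes high one clock after c_hs reaches 143 (because it is registered) and they ignore the one-clock register delays at the edges of the window. The constant names are hypothetical.

```python
# Address at which the pixel generator wraps back to zero.
ADDR_WRAP = int("111011111111111111", 2)

# Nominal picture window implied by the comparisons in the listing.
FIRST_COLUMN, LAST_COLUMN = 144, 655  # h_en high and c_hs <= 655
FIRST_LINE, LAST_LINE = 35, 514       # v_en high and line_count <= 514

image_width = LAST_COLUMN - FIRST_COLUMN + 1
image_height = LAST_LINE - FIRST_LINE + 1

print(ADDR_WRAP)
print(image_width, "x", image_height, "=", image_width * image_height, "pixels")
```

On this reading the picture is 512 x 480 one-bit pixels, and the wrap address is exactly one less than the pixel count, which is consistent with the 18-bit address bus of the picture memory.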
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturingmdashhence the
name field-programmable FPGAs are programmed using a logic circuit
diagram or a source code in a hardware description language (HDL) to specify
how the chip will work They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform but the
ability to update the functionality after shipping offers advantages for many
applications
FPGAs contain programmable logic components called logic blocks and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
togethermdashsomewhat like a one-chip programmable breadboard Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR In most FPGAs the logic blocks also
include memory elements which may be simple flip-flops or more complete
blocks of memory
History
The FPGA industry sprouted from programmable read-only memory (PROM)
and programmable logic devices (PLDs). PROMs and PLDs both had the option
of being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at "run time", and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
• 1987: 9,000 gates, Xilinx
• 1992: 600,000, Naval Surface Warfare Department
• Early 2000s: Millions
Market size
• 1985: First commercial FPGA technology invented by Xilinx
• 1987: $14 million
• ~1993: >$385 million
• 2005: $1.9 billion
• 2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
• IC costs are rising aggressively
• ASIC complexity has bolstered development time and costs
• R&D resources and headcount are decreasing
• Revenue losses for slow time-to-market are increasing
• Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, and the company attributes the growing number of FPGA design starts to
them (see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them) but also far more
complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the state
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attack of cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of
4-8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA considered here is an array-based, or island-style, FPGA. It consists
of an array of logic blocks and routing channels. Two I/O pads fit into the
height of one row or the width of one column, as shown below. All the routing
channels have the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
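The behaviour of such a logic block can be illustrated with a small software model. The Python sketch below is only a toy: the class name LogicBlock, its methods and the input ordering used to index the truth table are assumptions made for illustration, not taken from any vendor's architecture. It shows a 16-entry truth table acting as the 4-input LUT and a single flip-flop whose state can be selected as the output.

```python
class LogicBlock:
    """Toy model of a 4-input LUT followed by an optional flip-flop."""

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 configuration bits")
        self.truth_table = list(truth_table)
        self.registered = registered  # select registered or unregistered output
        self.ff = 0                   # flip-flop state, cleared at start

    def lut(self, a, b, c, d):
        # Assumed ordering: input 'a' is the least significant index bit.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.truth_table[index]

    def tick(self, a, b, c, d):
        """Model a rising clock edge: the flip-flop captures the LUT output."""
        self.ff = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """Single block output: flip-flop state if registered, else the LUT."""
        return self.ff if self.registered else self.lut(a, b, c, d)


# Configure the LUT as a 4-input AND gate: only index 15 (all ones) gives 1.
and4 = [0] * 15 + [1]

comb = LogicBlock(and4, registered=False)
reg = LogicBlock(and4, registered=True)

print(comb.output(1, 1, 1, 1))   # combinational result, available immediately
print(reg.output(1, 1, 1, 1))    # still the reset value until a clock edge
reg.tick(1, 1, 1, 1)
print(reg.output(0, 0, 0, 0))    # holds the value captured at the edge
```

Changing the 16 configuration bits changes the function implemented, which is the sense in which the block is programmable.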
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
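The planar rule described above can be expressed in a few lines of code. The following Python sketch is a minimal illustration; the side names and the function planar_switch_targets are hypothetical. For a wire entering the switch box on a given side and track, it lists the three wires that the programmable switches can connect it to.

```python
SIDES = ("north", "east", "south", "west")  # assumed names for the four channel segments


def planar_switch_targets(side, track):
    """Return the (side, track) wires reachable from a wire entering a planar switch box.

    In the planar (domain-based) topology a wire may only connect to wires
    with the same track number on the other three sides.
    """
    if side not in SIDES:
        raise ValueError("unknown side: %r" % (side,))
    return [(other, track) for other in SIDES if other != side]


# A wire arriving from the west on track 2 can reach track 2 on the other sides.
for target in planar_switch_targets("west", 2):
    print(target)
```

Because the track number never changes inside a switch box, a signal that starts on one track stays on that track along its whole route in this topology.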
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors, high
speed I/O logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar
license), and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
• SRAM: based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
• Antifuse: one-time programmable. CMOS.
• EPROM: Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
• EEPROM: Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
• Flash: Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
• Fuse: one-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs alongside early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on
FPGAs has become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound.
Simulation speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is
back from the foundry to start on software tests. FPGA prototyping allows for
much more time in the area of software development and testing at the
software-hardware integration stage. This allows many unforeseen software
bugs, which appear due to today's array of operating systems, applications
and hardware, to be caught early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
Fig: GATE LEVEL, VGA CONTROLLER
Fig: GATE LEVEL, SYNC UNIT
Fig: TECHNOLOGY SCHEMATIC
14 APPLICATIONS
• Motion tracking systems
• Image processing systems
• Display systems
14.1 CONCLUSION
Hence, with the Controller implemented on an FPGA, a processor-generated
(pixelated) picture is taken from memory space and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. "A VGA display controller" by Eduardo Sanchez
3. Hinner, "VGA Interface and video signal documents"
2.2 The VGA specifications are as follows:
• 256 KB Video RAM (the very first cards could be ordered with 64 KB or
128 KB of RAM, at the cost of losing some video modes)
• 16-color and 256-color modes
• 262,144-value color palette (six bits each for red, green, and blue)
• Selectable 25.175 MHz or 28.322 MHz master clock
• Maximum of 800 horizontal pixels
• Maximum of 600 lines
• Refresh rates at up to 70 Hz
• Vertical blank interrupt
• Packed-pixel mode: 256 colors
• Hardware smooth scrolling support
The VGA supports both All Points Addressable graphics modes and
alphanumeric text modes. Standard graphics modes are:
• 640×480 in 16 colors
• 640×350 in 16 colors
• 320×200 in 16 colors
• 320×200 in 256 colors
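The figures in the specification list can be cross-checked with simple arithmetic. The Python sketch below is an illustration only: it computes the palette size from six bits per color channel and the raw frame size of each standard graphics mode, assuming the minimum number of bits per pixel for the stated color count and ignoring how the hardware actually arranges the data in memory.

```python
VIDEO_RAM_BYTES = 256 * 1024  # 256 KB of video RAM

# (width, height, colors) for the standard graphics modes listed above
MODES = [
    (640, 480, 16),
    (640, 350, 16),
    (320, 200, 16),
    (320, 200, 256),
]


def bits_per_pixel(colors):
    """Minimum number of bits needed to select one of `colors` values."""
    return (colors - 1).bit_length()


def frame_bytes(width, height, colors):
    """Raw size of one frame in bytes (no padding or plane layout assumed)."""
    return width * height * bits_per_pixel(colors) // 8


palette_size = 2 ** (3 * 6)  # six bits each for red, green and blue
print("palette entries:", palette_size)

for width, height, colors in MODES:
    size = frame_bytes(width, height, colors)
    fits = size <= VIDEO_RAM_BYTES
    print(f"{width}x{height} in {colors} colors: {size} bytes, fits in 256 KB: {fits}")
```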
3 DESCRIPTION ARCHITECTURE
3.1 VGA display device
The VGA display device is used to display the images taken from the
processor in an exact format. It can be used to display the output of any
project; analog graphs and similar output can be displayed directly on the
VGA display monitor.
3.1 VGA (Video Graphics Array) Controller
The VGA controller controls the image so that it is displayed exactly on the
display device. It acts as an interface between the processing system and the
display device, and is therefore called an interface controller.
Image or video processing system -> VGA Controller -> Display
4 CONTROLLER DESIGN
To design the controller, we should first know how the display device works.
Consider the image below.
Original image (640x480)
Pixels of image
The display device also has pixels. The display resolution (640x480) is a
standard display resolution.
Pixel
This example shows an image with a portion greatly enlarged, in which the
individual pixels are rendered as little squares and can easily be seen.
4.2 A photograph of sub-pixel display elements on a laptop's LCD screen
In digital imaging, a pixel (or picture element [1]) is a single point in a
raster image. The pixel is the smallest addressable screen element; it is the
smallest unit of picture which can be controlled. Each pixel has its own
address. The address of a pixel corresponds to its coordinate. Pixels are
normally arranged in a 2-dimensional grid, and are often represented using
dots or squares. Each pixel is a sample of an original image, where more
samples typically provide more-accurate representations of the original. The
intensity of each pixel is variable. In color image systems, a color is
typically represented by three or four component intensities such as red,
green, and blue, or cyan, magenta, yellow, and black.
Color refreshing rate
Color refreshing rate = 60 Hz
The color refreshing rate represents the number of frames that are
transmitted per second.
The minimum number of frames that the VGA controller must transmit per second
is 60.
5 Hardware Description
5.1 Block Diagram
5.1 VGA Synchronization Unit
The Synchronization Unit provides the horizontal and vertical
synchronization signals (HSYNC and VSYNC) that are required to correctly
display a picture frame within the confines of a monitor's display area.
These synchronization signals are used as control inputs by the
monitor's horizontal and vertical deflection circuits. These circuits deflect
the electrons emitted by the three primary color electron guns (Red, Green,
Blue) left to right and from top to bottom, respectively. HSYNC provides the
start and stop times for the horizontal deflection circuit, so that a line of
pixels is correctly drawn across the screen display. VSYNC provides the start
and stop times for the vertical deflection circuit, so that the lines of a
frame are correctly drawn from the top to the bottom of the screen display.
The resolution for the display is defined by the level on the
RESOLUTION input. If High, the 640x480 resolution is used (VGA). If Low,
the 800x600 resolution (SVGA) is used.
Although the resolution determines the area of a monitor's screen
within which an image can be displayed, the full extents of the chosen
resolution do not have to be used. The actual extents of the image display
area on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs. These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame, respectively.
5.2 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to
the top, bottom, left and right of the viewable frame area are blanked, by
putting black pixels at the required line-pixel positions. This has the effect
of centering the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel requires to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this very reason. This is the line
display enable signal, en. The signal is checked on each rising edge of the
external clock signal (CLK) and is set as follows:
If (HCOUNT ≥ PixTotal) or (VCOUNT ≥ LinTotal) then
    en = 0 (pixel requires to be blanked: set color to be black)
Else
    en = 1 (pixel is a viewable pixel: generate RGB color accordingly)
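The rule for the line display enable signal translates directly into code. The Python function below is a minimal sketch of that rule; the function and parameter names are chosen here for illustration and mirror HCOUNT, VCOUNT, PixTotal and LinTotal from the description above.

```python
def line_display_enable(hcount, vcount, pix_total, lin_total):
    """Return en: 0 if the pixel must be blanked, 1 if it is viewable."""
    if hcount >= pix_total or vcount >= lin_total:
        return 0  # outside the viewable area: color is forced to black
    return 1      # inside the viewable area: RGB is generated from pixel data


# With the full 640x480 area in use, the last viewable pixel is (639, 479).
print(line_display_enable(639, 479, 640, 480))
print(line_display_enable(640, 479, 640, 480))
print(line_display_enable(0, 480, 640, 480))
```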
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode, 640x480 at 60 Hz non-interlaced, the horizontal
timings are:
Parameter                   Value      Unit
Clock frequency             25.175     MHz
Horizontal pixels           640
Horizontal sync polarity    Negative
Total time for each line    31.77      µs
Front porch (A)             0.94       µs
Sync pulse length (B)       3.77       µs
Back porch (C)              1.89       µs
Active video (D)            25.17      µs
(Total horizontal sync time: 6.60 µs)
6.1 The vertical timings are:
Parameter                   Value      Unit
Vertical lines              480
Vertical sync polarity      Negative
Vertical frequency          59.94      Hz
Front porch (E)             0.35       ms
Sync pulse length (F)       0.06       ms
Back porch (G)              1.02       ms
Active video (H)            15.25      ms
(Total vertical sync time: 1.43 ms)
640 x 400 at 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 at 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 at 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
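The relationship between the tabulated times and the pixel clock can be checked numerically. The Python sketch below uses only the values from the two timing tables and the note above; it is a worked calculation, not part of the controller. It shows that the line time corresponds to roughly 800 pixel clocks, that a frame spans roughly 525 line times, and it reproduces the 633-pixel figure quoted in the note.

```python
PIXEL_CLOCK_MHZ = 25.175

# Horizontal timings in microseconds and vertical timings in milliseconds,
# taken from the tables above.
HORIZONTAL_US = {"front porch": 0.94, "sync pulse": 3.77,
                 "back porch": 1.89, "active video": 25.17}
VERTICAL_MS = {"front porch": 0.35, "sync pulse": 0.06,
               "back porch": 1.02, "active video": 15.25}

line_us = sum(HORIZONTAL_US.values())            # total time for one line
frame_ms = sum(VERTICAL_MS.values())             # total time for one frame

clocks_per_line = line_us * PIXEL_CLOCK_MHZ      # microseconds x MHz = clock cycles
lines_per_frame = frame_ms * 1000.0 / line_us    # frame time expressed in line times
refresh_hz = 1000.0 / frame_ms                   # frames per second

# The example from the note: 25.17 us of active video at 25.174 MHz.
active_pixels = int(25.17 * 25.174)

print(f"line time        : {line_us:.2f} us")
print(f"clocks per line  : {clocks_per_line:.1f}")
print(f"lines per frame  : {lines_per_frame:.1f}")
print(f"refresh rate     : {refresh_hz:.2f} Hz")
print(f"active pixels    : {active_pixels}")
```

The small differences from round numbers come from the two-decimal rounding of the tabulated times, which is the variation the note refers to.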
6.2 Generation of the horizontal synchronization signal (HSYNC)
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT ≥ (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal (VSYNC)
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT ≥ (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
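The HSYNC and VSYNC conditions have the same shape: the signal is low while the counter lies in a window that starts after the viewable region plus a blanking interval and lasts for the sync pulse length. The Python sketch below captures that window test in one hypothetical helper, sync_level. The numeric values in the example (640 viewable pixels, 16 blanking clocks, 96 sync clocks) are assumptions picked for illustration, not values stated in this document.

```python
def sync_level(count, visible_total, blanking, sync_len):
    """Return the sync output (1 = inactive high, 0 = active low).

    count         -- HCOUNT for HSYNC, or VCOUNT for VSYNC
    visible_total -- PixTotal or LinTotal
    blanking      -- blanking interval between the viewable area and the pulse
    sync_len      -- TLSync or TFSync, the length of the sync pulse
    """
    start = visible_total + blanking
    if start <= count < start + sync_len:
        return 0
    return 1


# Illustrative horizontal values (assumed): 640 viewable, 16 blanking, 96 sync.
for hcount in (639, 655, 656, 751, 752):
    print(hcount, sync_level(hcount, 640, 16, 96))
```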
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
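The stated ranges and the 19-bit width follow from the resolution alone, as the short Python check below shows. The helper name counter_range is introduced here only for illustration.

```python
def counter_range(pix_total, lin_total):
    """Return (maximum address, bits needed) for a PixTotal x LinTotal frame."""
    max_address = pix_total * lin_total - 1
    return max_address, max_address.bit_length()


for width, height in ((640, 480), (800, 600)):
    max_address, bits = counter_range(width, height)
    print(f"{width}x{height}: 0 to {max_address}, {bits} bits")
```

Both resolutions need more than 18 bits (2^18 = 262144) and fewer than 2^19 = 524288 addresses, so a single 19-bit counter covers either case.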
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the
first pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
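The counter behaviour described above can be traced with a small simulation. The Python sketch below is a simplified model under stated assumptions: the parameters h_period and v_period (the total counts per line and per frame, including blanking) are hypothetical names, and one loop iteration stands for one rising edge of CLK. It records the address presented for each viewable pixel of one frame.

```python
def frame_addresses(pix_total, lin_total, h_period, v_period):
    """Return the sequence of pixel addresses generated during one frame.

    pix_total, lin_total -- viewable pixels per line and lines per frame
    h_period, v_period   -- assumed total counts per line / frame incl. blanking
    """
    addr = 0
    addresses = []
    for vcount in range(v_period):
        for hcount in range(h_period):
            if vcount >= lin_total:
                addr = 0                  # past the last viewable line: roll over
            elif hcount < pix_total:
                addresses.append(addr)    # address of the pixel being fetched
                addr += 1                 # consecutive addressing across lines
    return addresses


# A tiny 4x3 viewable area inside a 6x5 scan keeps the trace readable.
trace = frame_addresses(pix_total=4, lin_total=3, h_period=6, v_period=5)
print(trace)
```

In the trace, the address does not change during the horizontal blanking interval, so the first pixel of each line follows directly after the last pixel of the previous line.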
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum 60 Hz refresh rate:
For 1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 ≈ 54 ns
Allowing for the blanking intervals that also occupy part of every frame,
the max time for 1 pixel is taken as 40 ns
2) To send information for each and every pixel of a frame by means of
RGB signals.
3) After one row of information has been completely displayed, the column
must be reset and the row incremented; this is done by the Hsync signal.
4) After one frame of information has been completed, a new frame has to be
started; for this, both row and column are reset, which is done by the Vsync
signal.
=> To give the information for 1 pixel we use 1 clock (which controls all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for one pixel on each
and every clock, we can attain a minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
Example:
Clock     Time period   No. of frames/sec
15 MHz    more          less (unable to display 60 frames/sec)
25 MHz    effectively utilized frequency
30 MHz    less          more
As the clock frequency increases, the time period decreases and the number of
frames per second increases.
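The effect of the clock frequency on the refresh rate can be estimated numerically. The Python sketch below assumes the usual 640x480 frame structure of 800 pixel clocks per line and 525 lines per frame including blanking; these totals are an assumption consistent with the timing tables in section 6, not values stated in this section. It also shows the per-pixel time budget if only the 307200 viewable pixels were counted.

```python
H_TOTAL = 800   # assumed pixel clocks per line, including blanking
V_TOTAL = 525   # assumed lines per frame, including blanking


def refresh_hz(clock_hz):
    """Frames per second when one pixel position is clocked per cycle."""
    return clock_hz / (H_TOTAL * V_TOTAL)


def pixel_period_ns(clock_hz):
    """Duration of one clock cycle, i.e. of one pixel, in nanoseconds."""
    return 1e9 / clock_hz


# Time per pixel if a 60 Hz frame held only the 640 x 480 viewable pixels.
visible_only_ns = 1e9 / (60 * 640 * 480)
print(f"viewable pixels only: {visible_only_ns:.2f} ns per pixel")

for mhz in (15, 25, 30):
    clock = mhz * 1e6
    print(f"{mhz} MHz: period {pixel_period_ns(clock):.1f} ns, "
          f"refresh {refresh_hz(clock):.1f} Hz")
```

Under these assumptions a 15 MHz clock falls well short of 60 frames per second, while 25 MHz lands at about 60 Hz and 30 MHz at about 72 Hz, in line with the example above.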
Signals used for image display:
• R (Red)
• G (Green)
• B (Blue)
• HSYNC (Horizontal synchronization signal)
• VSYNC (Vertical synchronization signal)
All the above signals are used to control the image data so that it is
correctly displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse and back porch
               Horizontal sync   Vertical sync
Front porch    0.94 µs           0.35 ms
Sync pulse     3.77 µs           0.063 ms
Back porch     1.89 µs           1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA Pin description
Name | Type | Polarity | Bus size | Description

Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum
rate at which pixels can be sent to the monitor. The frequency of the clock
signal determines the refresh rate as follows:
    640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz
    800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz
RST | I | High | - | Global system reset.

VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to
be used: 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used:
00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels
to be displayed in each line of a frame, and is therefore used to control the
horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be
displayed in a frame, and is therefore used to control the vertical extents
of the visible display area.

Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be
read from the memory space. This signal is controlled by, and follows, the
internal line enable signal, en, generated by the Synchronization Unit of the
Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in
memory in bytes, the content of which depends on the chosen color palette,
as follows:
    Black & White mode: 8 1-bit pixels
    16 Colors mode: 2 4-bit pixels
    64 Colors mode: 1 6-bit pixel (DATA[5..0])
Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture
memory. Addresses are consecutive: once the end of the current line has been
reached, the next address is that of the pixel at the start of the next line
down in the frame.

VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is
used to control the horizontal deflection circuit in the VGA monitor, so that
the start and end of a line of pixels is correctly displayed across the
visible display area of the screen. The horizontal size of the display area
is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is
used to control the vertical deflection circuit in the VGA monitor, so that
the start and end of a frame (of lines) is correctly displayed between the
top and bottom edges of the visible display area of the screen. The vertical
size of the display area is controlled by the DISPSIZE_V input to the
Controller.
R1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity
of red used in composing a pixel's displayed color. These two signals are
inputs to a simple 2-bit DAC (external to the Controller) that provides the
analog signal required by the VGA monitor.
G1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity
of green used in composing a pixel's displayed color. These two signals are
inputs to a simple 2-bit DAC (external to the Controller) that provides the
analog signal required by the VGA monitor.
B1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity
of blue used in composing a pixel's displayed color. These two signals are
inputs to a simple 2-bit DAC (external to the Controller) that provides the
analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required to control the
monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used).
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
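As a worked illustration of the formula above, assume the full 640x480 display area is used, so that PixTotal = 640 and LinTotal = 480 (values chosen here purely for illustration). The memory needed in each palette mode would then be:

```latex
\begin{align*}
\text{Black \& White (8 pixels per byte):} \quad & \frac{640 \times 480}{8} = 38\,400 \text{ bytes} \\
\text{16 Colors (2 pixels per byte):} \quad & \frac{640 \times 480}{2} = 153\,600 \text{ bytes} \\
\text{64 Colors (1 pixel per byte):} \quad & \frac{640 \times 480}{1} = 307\,200 \text{ bytes}
\end{align*}
```

All three figures fit within the 2^19 = 524,288 byte range that the 19-bit ADDR_PIXEL output can address.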
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, i.e. right shifting the contents
by three. Three zero bits pad the result to the 19-bit width of ADDR_PIXEL:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, i.e. right shifting the contents
by one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
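The three addressing rules above can be sketched as a single selected assignment. This is a minimal illustration, not the Controller's actual source: the entity name addr_pixel_mux is hypothetical, hvcnt is assumed to be 19 bits wide, and the upper bits are zero-padded so that every branch yields the 19-bit ADDR_PIXEL width.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: derives the byte address from the pixel counter.
entity addr_pixel_mux is
  port (
    cmod       : in  std_logic_vector(1 downto 0);   -- palette select
    hvcnt      : in  std_logic_vector(18 downto 0);  -- assumed 19-bit pixel counter
    addr_pixel : out std_logic_vector(18 downto 0)
  );
end entity addr_pixel_mux;

architecture rtl of addr_pixel_mux is
begin
  with cmod select
    addr_pixel <= "000" & hvcnt(18 downto 3) when "00",    -- B&W: 8 pixels per byte
                  "0"   & hvcnt(18 downto 1) when "01",    -- 16 Colors: 2 pixels per byte
                  hvcnt                      when others;  -- 64 Colors: 1 pixel per byte
end architecture rtl;
```

Dropping the low-order counter bits is what makes consecutive pixels that share a byte map to the same memory address.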
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register, PIXREG, is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex.
This internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
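The enex condition above translates almost directly into a concurrent assignment. The following is a sketch only: the entity name enex_gen and all generic values are assumptions (640x480 visible area with the nominal 800-clock line and 525-line frame of 640x480 at 60 Hz), and RD is simply taken to follow enex as the text describes.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical block: generates enex and RD from the two counters.
entity enex_gen is
  generic (
    PIX_TOTAL    : natural := 640;  -- assumed DISPSIZE_H value
    LIN_TOTAL    : natural := 480;  -- assumed DISPSIZE_V value
    LINE_PERIOD  : natural := 800;  -- assumed clocks per line
    FRAME_PERIOD : natural := 525   -- assumed lines per frame
  );
  port (
    hcount : in  natural range 0 to 1023;
    vcount : in  natural range 0 to 1023;
    enex   : out std_logic;
    rd     : out std_logic
  );
end entity enex_gen;

architecture rtl of enex_gen is
  signal enex_i : std_logic;
begin
  -- Low while either counter is past the visible area but not yet at its period.
  enex_i <= '0' when ((hcount > PIX_TOTAL + 1) and (hcount < LINE_PERIOD))
                  or ((vcount > LIN_TOTAL + 1) and (vcount < FRAME_PERIOD))
            else '1';

  enex <= enex_i;
  rd   <= enex_i;  -- memory reads are disabled whenever enex is Low
end architecture rtl;
```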
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization
Unit, are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit
on each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is
'0'.
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge
of CLK.
64 Colors mode
The next byte of data will be loaded into the register on every rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
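The load-or-shift behaviour described for the three modes can be sketched as one clocked process. This is an illustrative reconstruction rather than the Controller's source: the entity name pixreg_sketch is hypothetical, RST is assumed to be active High and asynchronous, and zeros are assumed to be shifted in from the left.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model of the Pixel register (PIXREG).
entity pixreg_sketch is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;                      -- assumed active High
    cmod   : in  std_logic_vector(1 downto 0);
    hvcnt  : in  std_logic_vector(18 downto 0);
    data   : in  std_logic_vector(7 downto 0);   -- byte from picture memory
    pixreg : out std_logic_vector(7 downto 0)
  );
end entity pixreg_sketch;

architecture rtl of pixreg_sketch is
  signal reg : std_logic_vector(7 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      reg <= (others => '0');
    elsif rising_edge(clk) then
      case cmod is
        when "00" =>                        -- Black & White: 8 x 1-bit pixels
          if hvcnt(2 downto 0) = "000" then
            reg <= data;                    -- load a new byte every 8 pixels
          else
            reg <= '0' & reg(7 downto 1);   -- next pixel moves into bit 0
          end if;
        when "01" =>                        -- 16 Colors: 2 x 4-bit pixels
          if hvcnt(0) = '0' then
            reg <= data;                    -- load a new byte every 2 pixels
          else
            reg <= "0000" & reg(7 downto 4);  -- second pixel moves into low nibble
          end if;
        when others =>                      -- 64 Colors: 1 pixel per byte
          reg <= data;                      -- load on every clock
      end case;
    end if;
  end process;

  pixreg <= reg;
end architecture rtl;
```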
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register, RGBREG, is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
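A minimal sketch of the reset and blanking behaviour just described is given below. Several details are assumptions made only for illustration: the entity name rgbreg_sketch, an active-High asynchronous RST, and a Black & White mapping of 0 to 000000 and 1 to 111111. The 16 Colors lookup is deliberately left as a placeholder because its codes are defined by the mapping table, not by this sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model of the RGB register (RGBREG).
entity rgbreg_sketch is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;                      -- assumed active High
    en     : in  std_logic;                      -- line display enable
    cmod   : in  std_logic_vector(1 downto 0);
    pixreg : in  std_logic_vector(7 downto 0);
    rgbreg : out std_logic_vector(5 downto 0)
  );
end entity rgbreg_sketch;

architecture rtl of rgbreg_sketch is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      rgbreg <= (others => '0');            -- black after reset
    elsif rising_edge(clk) then
      if en = '0' then
        rgbreg <= (others => '0');          -- blanking: force black
      elsif cmod = "10" then
        rgbreg <= pixreg(5 downto 0);       -- 64 Colors: pixel value used directly
      elsif cmod = "00" then
        rgbreg <= (others => pixreg(0));    -- B&W: assumed 0 = black, 1 = white
      else
        rgbreg <= (others => '0');          -- placeholder: 16 Colors lookup not shown
      end if;
    end if;
  end process;
end architecture rtl;
```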
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 4 x 4 x 4 = 64 different
color permutations.
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application
note describes the features and capabilities of block SelectRAM and illustrates
how to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits.
Physically, the block RAM memory has two completely independent access ports,
labeled Port A and Port B. The structure is fully symmetrical, and both ports
are interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
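As an alternative to instantiating a vendor primitive or a generated core, a synchronous single-port memory can be described behaviourally. The sketch below is an assumption-laden illustration, not the memory core used in this project: the entity name sp_ram_sketch and its generics are hypothetical, and whether a given synthesis tool maps this pattern onto block RAM depends on the tool and its settings. The defaults describe a 16K x 1 organisation, which matches the 16K data bits of a single block.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM with synchronous read and write.
entity sp_ram_sketch is
  generic (
    ADDR_WIDTH : positive := 14;  -- 2**14 = 16K locations
    DATA_WIDTH : positive := 1
  );
  port (
    clk  : in  std_logic;
    en   : in  std_logic;         -- clock enable
    we   : in  std_logic;         -- write enable
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity sp_ram_sketch;

architecture rtl of sp_ram_sketch is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Registered read: data appears one clock after the address.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```

The read is registered because, as noted above, block RAM read operations require a clock edge and clock enable.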
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the U.S. Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (define pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
An entity declaration describes the entity
E.g.:
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports: Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions:
In: The value of a port can be read inside the component, but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
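Because an out port cannot be read inside the component under these rules, a common idiom is to keep the value in an internal signal and copy it to the port. The sketch below illustrates this with a hypothetical 4-bit counter (the entity counter4 is not part of this project); the listings in Section 11 use the same pattern with their _s signals.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: a counter whose value is exposed on an out port.
entity counter4 is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;                     -- synchronous, active High
    count : out std_logic_vector(3 downto 0)   -- write-only inside the architecture
  );
end entity counter4;

architecture rtl of counter4 is
  signal count_i : unsigned(3 downto 0);       -- readable internal copy
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        count_i <= (others => '0');
      else
        count_i <= count_i + 1;                -- reads count_i, never the port
      end if;
    end if;
  end process;

  count <= std_logic_vector(count_i);          -- single assignment to the out port
end architecture rtl;
```

This avoids both the buffer mode and an inout port that would wrongly suggest a bidirectional pin.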
Architectures: Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.:
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations: Link the entity declaration and architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Are always optional.
Packages: Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
Package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
Package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries: Collection of VHDL design units (database)
1. Packages:
   package declaration
   package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -> Dataflow -> Behavioral (increasing level of abstraction)
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a
high-level description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the
behavioral description of registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
Process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
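To make the "sensitivity list or wait statement, but not both" rule concrete, the sketch below describes the same D flip-flop twice, once in each style. The entity dff_two_ways and its port names are hypothetical and serve only this illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: one flip-flop written in both process styles.
entity dff_two_ways is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q1  : out std_logic;
    q2  : out std_logic
  );
end entity dff_two_ways;

architecture rtl of dff_two_ways is
begin
  -- Style 1: the sensitivity list wakes the process on any event on clk.
  with_list : process (clk)
  begin
    if rising_edge(clk) then
      q1 <= d;
    end if;
  end process with_list;

  -- Style 2: no sensitivity list; the wait statement suspends the process
  -- until the next rising edge of clk.
  with_wait : process
  begin
    wait until rising_edge(clk);
    q2 <= d;
  end process with_wait;
end architecture rtl;
```

Both processes run concurrently with each other, while the statements inside each one execute sequentially.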
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- numeric_std is only needed by the commented-out conversion below. It and
-- STD_LOGIC_ARITH both declare "unsigned", so that type name cannot be used
-- directly while both packages are visible.
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;                       -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

    -- Picture memory: 2**18 locations of 1 bit each
    component memory
    port (
        addr  : IN  std_logic_VECTOR(17 downto 0);
        clk   : IN  std_logic;
        dout  : OUT std_logic_VECTOR(0 downto 0);
        en    : IN  std_logic;
        sinit : IN  std_logic);
    END component;

    component sync_unit port(
        clk        : in  std_logic;
        rstn       : in  std_logic;
        hsync      : out std_logic;   ---horizontal synch pulse(row)
        vsync      : out std_logic;   --vertical sync pulse(frame) 111110100000000000
        v_en_reg   : out std_logic;
        h_en       : out std_logic;
        v_en       : out std_logic;
        line_count : out integer;     -- mode matches the sync_unit entity
        c_hs       : out integer);
    end component;

    component pixelgen port(
        clk        : in  std_logic;
        rstn       : in  std_logic;
        hsync      : in  std_logic;
        vsync      : in  std_logic;
        h_en       : in  std_logic;
        v_en       : in  std_logic;
        v_en_reg   : in  std_logic;
        data_in    : in  std_logic_vector(0 downto 0);
        line_count : in  integer;
        c_hs       : in  integer;
        red        : out std_logic_vector(0 downto 0);
        green      : out std_logic_vector(0 downto 0);
        blue       : out std_logic_vector(0 downto 0);
        addr       : out std_logic_vector(17 downto 0);
        rd         : out std_logic);
    end component;

    signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
    signal line_count_s, c_hs_s, addr : integer;
    signal data_s : std_logic_vector(0 downto 0);
    signal addr_s : std_logic_vector(17 downto 0);

begin
    --addr_s <= std_logic_vector(to_unsigned(addr,18));

    -- Divide the board clock by two to obtain the pixel clock
    process(clk50m, rstn)
    begin
        if (rstn = '0') then
            clk <= '0';
        elsif (clk50m = '1' and clk50m'event) then
            clk <= not clk;
        end if;
    end process;

    hsync <= hsync_s;
    vsync <= vsync_s;

    sync : sync_unit port map(
        clk        => clk,
        rstn       => rstn,
        hsync      => hsync_s,
        vsync      => vsync_s,
        v_en_reg   => v_en_reg_s,
        line_count => line_count_s,
        h_en       => h_en_s,
        v_en       => v_en_s,
        c_hs       => c_hs_s
    );

    pixel : pixelgen port map(
        clk        => clk,
        rstn       => rstn,
        hsync      => hsync_s,
        vsync      => vsync_s,
        h_en       => h_en_s,
        v_en       => v_en_s,
        v_en_reg   => v_en_reg_s,
        data_in    => data_s,
        line_count => line_count_s,
        c_hs       => c_hs_s,
        red        => red,
        green      => green,
        blue       => blue,
        addr       => addr_s,
        rd         => rd_s
    );

    -- sinit is tied to rstn as in the original design; check that the
    -- polarity of the generated core's sinit input suits an active-low reset.
    mem12 : memory port map(
        addr  => addr_s,
        clk   => clk,
        dout  => data_s,
        en    => rd_s,
        sinit => rstn
    );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;   ---horizontal synch pulse(row)
    vsync      : out std_logic;   --vertical sync pulse(frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer );
end sync_unit;

architecture arch_sync_unit of sync_unit is
    signal h_en_s, v_en_s : std_logic;
    signal line_count_s, c_hs_s : integer;
begin
    c_hs       <= c_hs_s;
    line_count <= line_count_s;
    h_en       <= h_en_s;
    v_en       <= v_en_s;

    -- hsync counter: counts pixel clocks 0..793 within a line
    process(clk, rstn)
    begin
        if (rstn = '0') then
            c_hs_s <= 0;
        else
            if (clk = '1' and clk'event) then
                if (c_hs_s = 793) then
                    c_hs_s <= 0;
                else
                    c_hs_s <= c_hs_s + 1;
                end if;
            end if;
        end if;
    end process;

    --vsync counter
    --vertical line counter: advances on each rising edge of h_en_s, 0..515
    process(h_en_s, rstn)
    begin
        if (rstn = '0') then
            line_count_s <= 0;
        elsif (h_en_s = '1' and h_en_s'event) then
            line_count_s <= line_count_s + 1;
            if (line_count_s = 515) then
                line_count_s <= 0;
            end if;
        end if;
    end process;

    --hsync pulse
    process(clk, rstn)
    begin
        if (rstn = '0') then
            hsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (c_hs_s <= 95) then
                hsync <= '0';
            else
                hsync <= '1';
            end if;
        end if;
    end process;

    ----Horizontal Enable (Row Information Will be Sent in this period)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            h_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
                h_en_s <= '1';
            else
                h_en_s <= '0';
            end if;
        end if;
    end process;

    ---vsync pulse
    -- Vertical Enable (Data is Valid in this Region)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            v_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((line_count_s >= 35 and line_count_s <= 515)) then
                v_en_s <= '1';
            else
                v_en_s <= '0';
            end if;
        end if;
    end process;

    --Vertical Sync, Front porch, Back Porch
    process(clk, rstn)
    begin
        if (rstn = '0') then
            vsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (line_count_s < 2) then
                vsync <= '0';
            else
                vsync <= '1';
            end if;
        end if;
    end process;

    -- One-clock delayed copy of v_en, used by pixelgen to detect its rising edge
    process(clk)
    begin
        if (clk = '1' and clk'event) then
            v_en_reg <= v_en_s;
        end if;
    end process;
end arch_sync_unit;
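The counter limits in the listing above imply the following line and frame rates. This is a derived estimate, assuming that clk50m is a 50 MHz oscillator (an assumption based only on the signal name), so that the divide-by-two process in the top-level design yields a 25 MHz pixel clock. The horizontal counter runs from 0 to 793 (794 clocks per line) and the line counter from 0 to 515 (516 lines per frame):

```latex
\begin{align*}
T_{\text{clk}} &= \frac{1}{25\ \text{MHz}} = 40\ \text{ns} \\
T_{\text{line}} &= 794 \times 40\ \text{ns} = 31.76\ \mu\text{s}
  \quad\Rightarrow\quad f_{\text{line}} \approx 31.49\ \text{kHz} \\
T_{\text{frame}} &= 516 \times 31.76\ \mu\text{s} \approx 16.39\ \text{ms}
  \quad\Rightarrow\quad f_{\text{frame}} \approx 61.0\ \text{Hz}
\end{align*}
```

These values are close to, but not identical with, the nominal 640x480 at 60 Hz timing of 800 clocks per line and 525 lines per frame.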
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is
    signal addr_s : std_logic_vector(17 downto 0);
begin
    addr <= addr_s;

    process(clk, rstn)
    begin
        if (rstn = '0') then
            rd     <= '0';
            red    <= "0";
            green  <= "0";
            blue   <= "0";
            addr_s <= x"0000" & "00";
        elsif (clk = '1' and clk'event) then
            -- Rising edge of v_en: restart the address at the top of the frame
            if ((v_en = '1') and (v_en_reg = '0')) then
                addr_s <= (others => '0');
            end if;
            if (v_en = '1' and h_en = '1') then
                if (line_count <= 514) then
                    if (c_hs <= 655) then
                        -- Inside the picture: show the memory bit on all three guns
                        rd     <= '1';
                        red    <= data_in;
                        green  <= data_in;
                        blue   <= data_in;
                        addr_s <= addr_s + "000000000000000001";
                    else
                        -- Right of the picture: green
                        rd    <= '0';
                        red   <= "0";
                        green <= "1";
                        blue  <= "0";
                    end if;
                else
                    -- Below the picture: white
                    rd    <= '0';
                    red   <= "1";
                    green <= "1";
                    blue  <= "1";
                end if;
                -- Wrap the address at the end of the stored picture
                if (addr_s = "111011111111111111") then
                    addr_s <= (others => '0');
                end if;
            else
                -- Blanking: black
                red   <= "0";
                green <= "0";
                blue  <= "0";
            end if;
        end if;
    end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts
itself to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage
of more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration that lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the state
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs began
to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a 4-8 hour wait
is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
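The logic block described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the report's VHDL: the class name, the 16-entry truth-table encoding and the `registered` flag are assumptions made for illustration.

```python
class LogicBlock:
    """Behavioral sketch of a 4-input LUT followed by a D flip-flop."""

    def __init__(self, truth_table, registered=False):
        # truth_table: 16 output bits, one per input combination
        # (hypothetical encoding: index = d*8 + c*4 + b*2 + a).
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs exactly 16 entries")
        self.truth_table = list(truth_table)
        self.registered = registered  # selects registered or unregistered output
        self.ff = 0                   # flip-flop state

    def lut(self, a, b, c, d):
        """Combinational LUT output for the four inputs."""
        return self.truth_table[(d << 3) | (c << 2) | (b << 1) | a]

    def clock(self, a, b, c, d):
        """Model a rising clock edge: the flip-flop captures the LUT output."""
        self.ff = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """The single block output: flip-flop value or raw LUT value."""
        return self.ff if self.registered else self.lut(a, b, c, d)


# Usage: configure the LUT as a 4-input AND gate.
and4 = LogicBlock([0] * 15 + [1])
print(and4.output(1, 1, 1, 1))
```

Configuring a different function only changes the truth table, which is what makes the block programmable.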
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
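The planar (domain-based) rule can be expressed in a few lines of Python. This is a minimal sketch: the side names and the function name are hypothetical, and tracks are numbered from 0 here for convenience.

```python
SIDES = ("north", "east", "south", "west")  # hypothetical names for the four channel segments


def planar_connections(side, track, channel_width):
    """Return the wire ends a wire entering the switch box can be switched to.

    In the planar topology a wire on track t connects only to track t
    on each of the other three sides, giving three programmable switches.
    """
    if side not in SIDES:
        raise ValueError("unknown side: %r" % (side,))
    if not 0 <= track < channel_width:
        raise ValueError("track outside channel width")
    return [(other, track) for other in SIDES if other != side]


# Usage: a wire entering from the west on track 2 of a 4-wire channel.
print(planar_connections("west", 2, 4))
```

Because a signal can never change track number inside a switch box, the routing fabric behaves like W independent routing domains.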
Modern FPGA families expand upon the above capabilities to include higher-
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) design or a schematic design. The HDL form might be easier to
work with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to the actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free and released under the GPL, BSD or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld-
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware design verification together with early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on
an FPGA has become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time in the area of software development and testing at the
software-hardware integration stage. This allows many unforeseen software
bugs, which appear due to today's array of operating systems, applications and
hardware, to be found early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig. VGA CONTROLLER
GATE LEVEL
Fig. SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, by implementing the VGA Controller on an FPGA, a processor-generated
(pixelated) picture is taken from memory space and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. "A VGA display controller" by Eduardo Sanchez
3. Hinner, "VGA Interface and video signal documents"
3 DESCRIPTION ARCHITECTURE
3.1 VGA display device
The VGA display device is used for displaying the images taken from the
processor in an exact format. It can be used for displaying the output of any
project. Analog graphs etc. can be directly displayed on the VGA display
monitor.
3.2 VGA (Video Graphics Array) Controller
The VGA controller controls the image so that it is exactly displayed on the
display device. It acts as an interface between the processing system and the
display device, and so is called an interface controller.
Image or video processing system -> VGA Controller -> Display
4 CONTROLLER DESIGN
For this, we should first know how the display device works. Consider the
image below.
640x480
Original image
Pixels of image
The display device also has pixels. The display resolution (640x480) is a
standard display resolution.
Pixel
This example shows an image with a portion greatly enlarged, in which the
individual pixels are rendered as little squares and can easily be seen.
4.2 A photograph of sub-pixel display elements on a laptop's LCD screen
In digital imaging, a pixel (or picture element [1]) is a single point in a raster
image. The pixel is the smallest addressable screen element; it is the smallest
unit of picture which can be controlled. Each pixel has its own address. The
address of a pixel corresponds to its coordinates. Pixels are normally arranged
in a 2-dimensional grid and are often represented using dots or squares. Each
pixel is a sample of an original image, where more samples typically provide
more accurate representations of the original. The intensity of each pixel is
variable. In color image systems, a color is typically represented by three or
four component intensities, such as red, green and blue, or cyan, magenta,
yellow and black.
Color refresh rate
Color refresh rate = 60 Hz
The color refresh rate represents the number of frames that are transmitted per
second.
The minimum number of frames transmitted per second by the VGA controller
is 60.
5 Hardware Description
5.1 Block Diagram
5.1 VGA Synchronization Unit
The Synchronization Unit provides the horizontal and vertical
synchronization signals, HSYNC and VSYNC, that are required to correctly
display a picture frame within the confines of a monitor's display area.
These synchronization signals are used as control inputs by the
monitor's horizontal and vertical deflection circuits. These circuits deflect the
electrons emitted by the three primary color electron guns (Red, Green, Blue)
from left to right and from top to bottom, respectively. HSYNC provides the
start and stop times for the horizontal deflection circuit, so that a line of pixels
is correctly drawn across the screen display. VSYNC provides the start and stop
times for the vertical deflection circuit, so that the lines of a frame are correctly
drawn from the top to the bottom of the screen display.
The resolution for the display is defined by the level on the
RESOLUTION input. If High, the 640x480 resolution (VGA) is used. If Low,
the 800x600 resolution (SVGA) is used.
Although the resolution determines the area of a monitor's screen
within which an image can be displayed, the full extents of the chosen
resolution do not have to be used. The actual extents of the image display area
on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs. These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame, respectively.
5.2 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to the
top, bottom, left and right of the viewable frame area are blanked by putting
black pixels at the required line-pixel positions. This has the effect of centering
the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel requires to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this very reason. This is the line display
enable signal, en. The signal is checked on each rising edge of the external
clock signal (CLK) and is set as follows:
If (HCOUNT >= PixTotal) or (VCOUNT >= LinTotal) then
en = 0 (pixel requires to be blanked - set color to be black)
Else
en = 1 (pixel is a viewable pixel - generate RGB color accordingly)
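The blanking rule above translates directly into a small function. The following Python sketch is a behavioral illustration only (the function and parameter names are chosen here, not taken from the design), evaluating the rule for one counter state.

```python
def line_display_enable(hcount, vcount, pix_total, lin_total):
    """Return en: 0 when the pixel must be blanked, 1 when it is viewable."""
    if hcount >= pix_total or vcount >= lin_total:
        return 0  # outside the viewable area: the colour is forced to black
    return 1      # inside the viewable area: the RGB colour is generated


# Usage with a full 640x480 viewable area.
print(line_display_enable(639, 479, 640, 480))
print(line_display_enable(640, 0, 640, 480))
```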
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode, 640x480 at 60 Hz non-interlaced, the horizontal
timings are:
Parameter                  Value     Unit
Clock frequency            25.175    MHz
Horizontal pixels          640
Horizontal sync polarity   Negative
Total time for each line   31.77     µs
Front porch (A)            0.94      µs
Sync pulse length (B)      3.77      µs
Back porch (C)             1.89      µs
Active video (D)           25.17     µs
(Total horizontal sync time: 6.60 µs)
6.1 The vertical timings are:
Parameter                  Value     Unit
Vertical lines             480
Vertical sync polarity     Negative
Vertical frequency         59.94     Hz
Front porch (E)            0.35      ms
Sync pulse length (F)      0.06      ms
Back porch (G)             1.02      ms
Active video (H)           15.25     ms
(Total vertical sync time: 1.43 ms)
640 x 400 at 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 at 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 at 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
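The figures quoted in the two timing tables can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the tabulated values; the variable names are chosen here for illustration.

```python
# Horizontal timings in microseconds and vertical timings in milliseconds,
# copied from the tables above.
H_US = {"front_porch": 0.94, "sync": 3.77, "back_porch": 1.89, "active": 25.17}
V_MS = {"front_porch": 0.35, "sync": 0.06, "back_porch": 1.02, "active": 15.25}

line_time_us = sum(H_US.values())    # total time for one line
frame_time_ms = sum(V_MS.values())   # total time for one frame
refresh_hz = 1000.0 / frame_time_ms  # frames per second

# Pixels that fit in the active video time at a 25.174 MHz pixel frequency
# (microseconds times MHz gives a pixel count).
active_pixels = int(H_US["active"] * 25.174)

print(round(line_time_us, 2), round(frame_time_ms, 2), round(refresh_hz, 2), active_pixels)
```

The four horizontal intervals add up to the quoted line time, and the rounded active video time yields 633 rather than 640 pixels, which is the discrepancy noted in the text.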
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
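Both sync signals follow the same counter-comparison pattern, so one helper can model either of them. The Python sketch below is illustrative only; the numeric values in the usage lines (16 clocks of blanking before the pulse, a 96-clock pulse) are assumed example figures, not values stated in this document.

```python
def sync_level(count, total, blanking, sync_len):
    """Return the active-low sync level for a counter value.

    count    : HCOUNT (for HSYNC) or VCOUNT (for VSYNC)
    total    : PixTotal or LinTotal
    blanking : blanking interval preceding the pulse (BlankingLeft in the text)
    sync_len : pulse length (TLSync or TFSync)
    """
    start = total + blanking
    return 0 if start <= count < start + sync_len else 1


# Usage: HSYNC for a 640-pixel line with assumed example blanking and pulse lengths.
for hcount in (655, 656, 751, 752):
    print(hcount, sync_level(hcount, 640, 16, 96))
```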
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
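The counter ranges and the 19-bit width follow from simple arithmetic, sketched here in Python (the function name is chosen for illustration).

```python
def address_range(pix_total, lin_total):
    """Return (last address, number of bits needed to hold it)."""
    last = pix_total * lin_total - 1
    return last, last.bit_length()


# Usage: the two supported resolutions at their full extents.
print(address_range(640, 480))
print(address_range(800, 600))
```

Both maxima lie between 2^18 and 2^19, so a 19-bit counter covers either resolution.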
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum 60 Hz refresh rate:
For 1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 = approx. 54 ns (roughly 50 ns)
Allowing for the blanking intervals, which also take up clock cycles in every
line and frame:
Max time for 1 pixel = 40 ns
2) To send information for each and every pixel of a frame by means of
RGB signals.
3) After completely displaying the information for 1 row, the column must be
reset and the row has to be incremented, i.e. done by the Hsync signal.
4) After completing 1 frame of information, a new frame has to be started;
for this, row and column are to be reset, i.e. done by the Vsync signal.
=> For giving the information for 1 pixel we use 1 clock (for controlling all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
1 clock frequency = 25 MHz
So if we use a 25 MHz clock and display the information for 1 pixel on each
and every clock, then we can attain the minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
Ex:
15 MHz (frequency decreases): time period = more, no. of frames/sec = less
(it is unable to display 60 frames/sec)
25 MHz: effectively utilized frequency
30 MHz (frequency increases): time period = less, no. of frames/sec = more
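The per-pixel time budget and the effect of the clock frequency can be worked through numerically. In the Python sketch below, the figure of 800 x 525 clock periods per frame (visible area plus blanking) is an assumption based on the standard 640x480 at 60 Hz timing; it is not stated in this document, and the function names are illustrative.

```python
VISIBLE_PIXELS = 640 * 480   # 307200 viewable pixels per frame
TOTAL_CLOCKS = 800 * 525     # assumption: standard 640x480 frame including blanking


def max_pixel_time_ns(clocks_per_frame, refresh_hz=60):
    """Longest time per pixel clock that still reaches the refresh rate."""
    return 1e9 / (refresh_hz * clocks_per_frame)


def frames_per_second(clock_hz, clocks_per_frame=TOTAL_CLOCKS):
    """Refresh rate obtained when one pixel is sent per clock."""
    return clock_hz / clocks_per_frame


print(round(max_pixel_time_ns(VISIBLE_PIXELS), 2))  # visible pixels only
print(round(max_pixel_time_ns(TOTAL_CLOCKS), 2))    # including blanking, close to 40 ns
for mhz in (15, 25, 30):
    print(mhz, "MHz ->", round(frames_per_second(mhz * 1e6), 1), "frames/sec")
```

Under this assumption, a 15 MHz clock falls well short of 60 frames per second, while 25 MHz lands at approximately 60 and 30 MHz exceeds it, matching the comparison in the example.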
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal synchronization signal)
VSYNC (Vertical synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse and back porch
                Horizontal sync   Vertical sync
Front porch     0.94 µs           0.35 ms
Sync pulse      3.77 µs           0.063 ms
Back porch      1.89 µs           1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA pin description
Name          Type  Polarity   Bus size
    Description

Global Control Signals
CLK           I     Rising     -
    Global system clock. This clock determines the maximum rate at which
    pixels can be sent to the monitor. The frequency of the clock signal
    determines the refresh rate as follows:
    640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz
    800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz
RST           I     High       -
    Global system reset.

VGA Input Settings Signals
RESOLUTION    I     High/Low   -
    This input selects the screen resolution to be used:
    1 = 640x480 (VGA), 0 = 800x600 (SVGA)
CMOD          I     -          2
    This input selects the color palette to be used:
    00 = Black & White, 01 = 16 Colors, 10 = 64 Colors
DISPSIZE_H    I     -          10
    This input determines the number of viewable pixels to be displayed in
    each line of a frame, and is therefore used to control the horizontal
    extents of the visible display area.
DISPSIZE_V    I     -          10
    This input determines the number of lines to be displayed in a frame, and
    is therefore used to control the vertical extents of the visible display
    area.

Data Memory Control Signals
RD            O     High       -
    This is the enable signal when data is required to be read from the memory
    space. This signal is controlled by, and follows, the internal line enable
    signal, en, generated by the Synchronization Unit of the Controller.
DATA          I     -          8
    Data input from picture memory space. Data is stored in memory in bytes,
    the content of which depends on the chosen color palette, as follows:
    Black & White mode: 8 1-bit pixels
    16 Colors mode: 2 4-bit pixels
    64 Colors mode: 1 6-bit pixel (DATA[5..0])
    Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL    O     -          19
    Specifies the address of the next pixel in picture memory. Addresses are
    consecutive: once the end of the current line has been reached, the next
    address is that of the pixel at the start of the next line down in the
    frame.

VGA Monitor Control Signals
HSYNC         O     Falling    -
    Horizontal synchronization signal. This signal is used to control the
    horizontal deflection circuit in the VGA monitor, so that the start and
    end of a line of pixels is correctly displayed across the visible display
    area of the screen. The horizontal size of the display area is controlled
    by the DISPSIZE_H input to the Controller.
VSYNC         O     Falling    -
    Vertical synchronization signal. This signal is used to control the
    vertical deflection circuit in the VGA monitor, so that the start and end
    of a frame (of lines) is correctly displayed between the top and bottom
    edges of the visible display area of the screen. The vertical size of the
    display area is controlled by the DISPSIZE_V input to the Controller.
R1, R0        O     High/Low   -
    Provides the 2-bit digital signal for the intensity of red used in
    composing a pixel's displayed color. These two signals are inputs to a
    simple 2-bit DAC (external to the Controller) that provides the analog
    signal required by the VGA monitor.
G1, G0        O     High/Low   -
    Provides the 2-bit digital signal for the intensity of green used in
    composing a pixel's displayed color. These two signals are inputs to a
    simple 2-bit DAC (external to the Controller) that provides the analog
    signal required by the VGA monitor.
B1, B0        O     High/Low   -
    Provides the 2-bit digital signal for the intensity of blue used in
    composing a pixel's displayed color. These two signals are inputs to a
    simple 2-bit DAC (external to the Controller) that provides the analog
    signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode - one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode - one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode - one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
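As a worked example of the memory formula, the Python sketch below evaluates it for the three color palettes. The mode keys and the function name are labels chosen here for illustration.

```python
# Pixels packed into each byte for the three colour palettes described above.
PIXELS_PER_BYTE = {"bw": 8, "16": 2, "64": 1}


def picture_memory_bytes(pix_total, lin_total, mode):
    """Bytes of picture memory needed for a pix_total x lin_total frame."""
    pixels = pix_total * lin_total
    per_byte = PIXELS_PER_BYTE[mode]
    return -(-pixels // per_byte)  # integer division rounded up


# Usage: a full 640x480 frame in each mode.
for mode in ("bw", "16", "64"):
    print(mode, picture_memory_bytes(640, 480, mode))
```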
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, i.e. right shifting the contents by
three. Three zero bits pad the result to the 19-bit address width:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, i.e. right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
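The three addressing rules amount to a right shift whose size depends on the palette. The following Python sketch models this behaviorally; the mode keys and the function name are chosen here for illustration.

```python
# Right shift applied to hvcnt for each colour palette.
SHIFT = {"bw": 3, "16": 1, "64": 0}


def addr_pixel(hvcnt, mode):
    """Byte address in picture memory for the pixel counted by hvcnt."""
    hvcnt &= 0x7FFFF  # hvcnt is a 19-bit value
    return hvcnt >> SHIFT[mode]


# Usage: pixel number 17 of the frame in each mode.
for mode in ("bw", "16", "64"):
    print(mode, addr_pixel(17, mode))
```

In Black & White mode, pixels 16 to 23 all map to byte 2, and in 16 Colors mode pixels 16 and 17 share byte 8.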
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
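The enex condition can be checked against sample counter values with a short function. This Python sketch is illustrative; the line period of 800 and frame period of 525 used in the usage lines are assumed example values, not figures given in this document.

```python
def external_enable(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Return enex: 0 disables reads from picture memory (RD = 0), 1 enables them."""
    h_disabled = pix_total + 1 < hcount < line_period
    v_disabled = lin_total + 1 < vcount < frame_period
    return 0 if (h_disabled or v_disabled) else 1


# Usage with assumed example periods of 800 clocks per line and 525 lines per frame.
print(external_enable(641, 0, 640, 480, 800, 525))
print(external_enable(642, 0, 640, 480, 800, 525))
```

Note that the comparison against PixTotal + 1 keeps reads enabled slightly beyond the last viewable pixel, unlike the en signal.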
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is
"0".
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
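The shifting behaviour of the Pixel register in the two packed modes can be illustrated by unpacking a single byte. The Python sketch below is a behavioral illustration with function names chosen here; it shows the order in which pixels reach the active position.

```python
def bw_pixels(byte):
    """Black & White mode: the 8 1-bit pixels in the order they become active.

    The active pixel is the lowest bit; the register shifts right by one
    bit per clock.
    """
    pixreg = byte & 0xFF
    pixels = []
    for _ in range(8):
        pixels.append(pixreg & 1)  # active pixel is in the lowest bit position
        pixreg >>= 1               # shift on each rising clock edge
    return pixels


def nibble_pixels(byte):
    """16 Colors mode: the 2 4-bit pixels in the order they become active."""
    pixreg = byte & 0xFF
    first = pixreg & 0x0F          # active pixel is the low-order nibble
    pixreg >>= 4                   # shift right by four bits on the next clock
    return [first, pixreg & 0x0F]


print(bw_pixels(0b00000101))
print(nibble_pixels(0xA3))
```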
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
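The level count and the number of displayable colors can be illustrated as follows. This Python sketch assumes the four analog levels are evenly spaced between 0 V and 0.7 V; the actual spacing depends on the external DAC circuitry and is not specified here.

```python
from itertools import product


def dac_voltage(code, full_scale=0.7):
    """Analog level for a 2-bit code, assuming evenly spaced levels (an assumption)."""
    if not 0 <= code <= 3:
        raise ValueError("a 2-bit code must be between 0 and 3")
    return full_scale * (code / 3)


# The four levels available per colour under the even-spacing assumption.
levels = [round(dac_voltage(code), 3) for code in range(4)]
print(levels)

# Every combination of one level each for red, green and blue.
colours = list(product(range(4), repeat=3))
print(len(colours))
```

Four levels on each of three channels give 4 x 4 x 4 = 64 combinations, which is the count quoted above.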
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2: Block RAM Available in Spartan-3E Devices
Device      RAM       RAM Blocks   Total RAM   Total RAM   Total RAM
            Columns   per Column   Blocks      Bits        Kbits
XC3S100E    1         4            4           73728       72
XC3S250E    2         6            12          221184      216
XC3S500E    2         10           20          368640      360
XC3S1200E   2         14           28          516096      504
XC3S1600E   2         18           36          663552      648
Each block RAM contains 18432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical; both ports are
interchangeable and both support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Multiple block RAMs can be
cascaded to create deeper and wider memory organizations, with a minimal
timing penalty incurred through specialized routing resources.
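The totals in Table 2 follow from the column and block counts together with the 18432 bits per block. The short Python sketch below reproduces them; the dictionary simply restates the table, and the function names are chosen for illustration.

```python
BITS_PER_BLOCK = 18432  # bits in one block RAM, as stated above

# (RAM columns, RAM blocks per column) for each Spartan-3E device in Table 2.
DEVICES = {
    "XC3S100E": (1, 4),
    "XC3S250E": (2, 6),
    "XC3S500E": (2, 10),
    "XC3S1200E": (2, 14),
    "XC3S1600E": (2, 18),
}


def total_blocks(device):
    """Total number of block RAMs in the device."""
    columns, per_column = DEVICES[device]
    return columns * per_column


def total_bits(device):
    """Total block RAM bits in the device."""
    return total_blocks(device) * BITS_PER_BLOCK


for name in DEVICES:
    print(name, total_blocks(name), total_bits(name), total_bits(name) // 1024)
```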
10HARDWARE DESCRIPTIVE LANGUAGE (VHDL)
Why (V) HDL
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed
Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987: standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming nonstandard.
The IEEE 1164 standard was then developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
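To make the role of these two standards more concrete, the sketch below uses
std_logic from IEEE 1164 (including its high-impedance value 'Z') together
with the signed and unsigned types from IEEE 1076.3 (package numeric_std).
The entity add_sub_demo and all of its port names are hypothetical and were
chosen only for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;  -- std_logic, std_logic_vector (IEEE 1164)
use IEEE.numeric_std.all;     -- signed, unsigned (IEEE 1076.3)

entity add_sub_demo is
  port (
    a, b    : in  std_logic_vector(7 downto 0);
    oe      : in  std_logic;                     -- output enable
    sum_u   : out std_logic_vector(7 downto 0);  -- a + b, treated as unsigned
    diff_s  : out std_logic_vector(7 downto 0);  -- a - b, treated as signed
    bus_out : out std_logic_vector(7 downto 0)   -- tri-stated when oe = '0'
  );
end add_sub_demo;

architecture rtl of add_sub_demo is
  signal sum_i : unsigned(7 downto 0);
begin
  -- The same bit pattern is interpreted according to the type it is cast to
  sum_i  <= unsigned(a) + unsigned(b);
  sum_u  <= std_logic_vector(sum_i);
  diff_s <= std_logic_vector(signed(a) - signed(b));

  -- 'Z' (high impedance) is one of the nine std_logic values
  bus_out <= std_logic_vector(sum_i) when oe = '1' else (others => 'Z');
end rtl;
```

In simulation, an uninitialized input shows up as 'U' rather than as an
arbitrary 0 or 1, which is the kind of condition plain IEEE 1076 types cannot
express.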
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes the entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction, and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
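The sketch below shows one way a configuration declaration can select an
architecture explicitly instead of relying on the default. It reuses the
Comparator entity and the Comparator1 architecture from the examples above;
the second architecture Comparator2 and the configuration name Comparator_cfg
are hypothetical additions for this illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- Dataflow-style architecture (as in the example above)
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Behavioral-style architecture with the same function
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end Comparator2;

-- Configuration declaration: binds Comparator to Comparator2 explicitly,
-- independent of the order in which the architectures were analyzed
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end Comparator_cfg;
```

Simulating Comparator_cfg instead of Comparator then picks the chosen
architecture without any change to the entity or architecture source.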
Packages
Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries
A collection of VHDL design units (a database):
1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
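As a small illustration of a package declaration with a matching package
body, the sketch below collects timing constants and one helper function in a
single shareable unit. The package name vga_timing_pkg, the constant names,
and the function h_active are hypothetical; the numeric values are the ones
that appear in the sync_unit code of this report.

```vhdl
package vga_timing_pkg is
  -- Horizontal timing, in pixel clocks
  constant H_COUNT_MAX    : integer := 793;
  constant H_SYNC_END     : integer := 95;
  constant H_ACTIVE_START : integer := 143;
  constant H_ACTIVE_END   : integer := 778;
  -- Vertical timing, in lines
  constant V_COUNT_MAX    : integer := 515;
  constant V_ACTIVE_START : integer := 35;

  -- Returns true while the horizontal counter is in the active region
  function h_active (count : integer) return boolean;
end vga_timing_pkg;

package body vga_timing_pkg is
  function h_active (count : integer) return boolean is
  begin
    return (count >= H_ACTIVE_START) and (count <= H_ACTIVE_END);
  end h_active;
end vga_timing_pkg;
```

A design unit compiled into the same library could then make these names
visible with "use work.vga_timing_pkg.all;" placed before its entity
declaration.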
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -------
Dataflow   ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
The circuit is described in terms of its components.
Ranges from a low-level description (e.g. transistor-level description) to a
high-level description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the
behavioral description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
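Because the language has no built-in register, a behavioral description like
the following sketch is what a designer usually writes. The entity name reg8,
the 8-bit width, and the enable input are assumptions made for illustration;
the active-low asynchronous reset follows the rstn convention used by the
code later in this report.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity reg8 is
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;                     -- asynchronous reset, active low
    en   : in  std_logic;                     -- load enable
    d    : in  std_logic_vector(7 downto 0);
    q    : out std_logic_vector(7 downto 0)
  );
end reg8;

architecture behavioral of reg8 is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      if en = '1' then
        q <= d;          -- value is held when en = '0'
      end if;
    end if;
  end process;
end behavioral;
```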
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams, and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
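The two permitted forms can be compared side by side in the sketch below,
where both processes describe the same AND function. The entity
and_gate_demo and its port names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity and_gate_demo is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end and_gate_demo;

architecture behavioral of and_gate_demo is
begin
  -- Form 1: sensitivity list; the process runs whenever a or b changes
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list; an explicit wait statement at the end
  -- suspends the process until a or b changes
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end behavioral;
```

The statements inside each process execute sequentially, while the two
processes themselves run concurrently with each other.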
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- IEEE.numeric_std is left out here: it declares signed/unsigned types
-- with the same names as STD_LOGIC_ARITH, so the two should not be mixed.
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is port(
  clk50m : in  std_logic;                       -- 50 MHz board clock
  rstn   : in  std_logic;                       -- active-low reset
  hsync  : out std_logic;
  vsync  : out std_logic;
  red    : out std_logic_vector(0 downto 0);
  green  : out std_logic_vector(0 downto 0);
  blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  -- Image memory (generated core): 18-bit address, 1 bit per pixel
  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;

  component sync_unit port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;   -- horizontal sync pulse (row)
    vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;     -- mode matches the sync_unit entity
    c_hs       : out integer);
  end component;

  component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the 50 MHz input clock by two to obtain the 25 MHz pixel clock
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- NOTE: sinit is driven directly by the active-low rstn, as in the
  -- original design. If the memory core treats sinit as active high,
  -- an inverted reset would be needed here.
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;   -- horizontal sync pulse (row)
  vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;
  c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- Horizontal counter: counts pixel clocks 0 to 793 within one line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- Vertical line counter: advances on each rising edge of h_en_s
  -- and wraps from 515 back to 0
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- Horizontal sync pulse: low for counts 0 to 95
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical sync pulse: low for lines 0 and 1; the remaining lines
  -- cover the back porch, active video, and front porch
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of v_en, used to detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;
end arch_sync_unit;
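The line and frame rates that follow from the counters above can be checked
with a short calculation. It assumes that the pixel clock is the 50 MHz board
clock divided by two (as done in the top-level vga_controller), that the
horizontal counter runs through 794 states (0 to 793), and that the line
counter runs through 516 states (0 to 515); the symbol names are introduced
here only for this worked example.

```latex
\begin{align*}
f_{\text{clk}}   &= \frac{50\ \text{MHz}}{2} = 25\ \text{MHz} \\
T_{\text{line}}  &= \frac{794}{25\ \text{MHz}} = 31.76\ \mu\text{s}
  \quad\Rightarrow\quad f_{\text{line}} \approx 31.49\ \text{kHz} \\
T_{\text{frame}} &= 516 \times 31.76\ \mu\text{s} \approx 16.39\ \text{ms}
  \quad\Rightarrow\quad f_{\text{frame}} \approx 61.0\ \text{Hz}
\end{align*}
```

These values are close to the nominal 31.47 kHz line rate and 60 Hz frame
rate of standard 640x480 VGA timing, which uses 800 pixel clocks per line,
525 lines per frame, and a 25.175 MHz pixel clock.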
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: a new frame starts, restart the address
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the stored image: show the memory bit on all three colors
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the stored image: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the stored image: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last stored pixel
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking interval: black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name field-programmable. FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called logic blocks and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete system on a programmable chip. This work mirrors the
architecture by Ron Perloff and Hanan Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of soft
processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be reprogrammed at run time, and this
is leading to the idea of reconfigurable computing or reconfigurable
systems: CPUs that reconfigure themselves to suit the task at hand. The
Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable
soft processor implemented on FPGAs. However, it does not support dynamic
reconfiguration at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has increased development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design
starts (see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. FPGA architectures, on the other hand, are dominated by interconnect.
This makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a 4-8 hour
wait is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or
the width of one column, as shown below. All the routing channels have the
same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
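The logic block described above can be modeled in a few lines of VHDL. The
sketch below is only a behavioral model for illustration, not vendor code:
the entity name logic_block, the generic LUT_INIT (the 16 configuration bits
of the LUT), and the generic REGISTERED (standing in for the configuration
bit that selects the output) are all assumptions.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity logic_block is
  generic (
    -- 16 configuration bits; x"8000" makes the LUT a 4-input AND gate
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- true: registered output, false: unregistered LUT output
    REGISTERED : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);   -- the four LUT inputs
    o   : out std_logic                       -- the single output
  );
end logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The four inputs select one of the 16 configuration bits
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- Flip-flop on the LUT output
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output
  o <= ff_out when REGISTERED else lut_out;
end rtl;
```

Changing LUT_INIT changes the implemented function without touching the
structure, which is the essence of how a LUT-based logic block is configured.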
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right of and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at
which point propagation delays can be added and the simulation run again with
these values back-annotated onto the netlist.
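The first simulation stage, an RTL test bench, might look like the following
sketch. It exercises the Comparator entity used as an example earlier in this
report; the test bench name Comparator_tb, the stimulus values, and the 10 ns
settling time are assumptions chosen for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Design under test (same as the Comparator example in this report)
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

library IEEE;
use IEEE.std_logic_1164.all;

-- A test bench has no ports: it generates stimulus and checks results
entity Comparator_tb is
end Comparator_tb;

architecture sim of Comparator_tb is
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  dut : entity work.Comparator(Comparator1)
    port map (A => A, B => B, EQ => EQ);

  stimulus : process
  begin
    A <= x"3C"; B <= x"3C";
    wait for 10 ns;
    assert EQ = '1'
      report "equal inputs should give EQ = '1'" severity error;

    A <= x"3C"; B <= x"3D";
    wait for 10 ns;
    assert EQ = '0'
      report "different inputs should give EQ = '0'" severity error;

    wait;  -- stop the stimulus process
  end process stimulus;
end sim;
```

The same test bench can in principle be rerun against the gate-level and
back-annotated netlists, so that all three simulation stages check the same
behavior.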
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld-focused
CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream. Prototyping SoC and ASIC designs on an FPGA
has become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is
back from the foundry to start on software tests. FPGA prototyping allows for
much more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs that appear due
to today's array of operating systems, applications, and hardware to be
uncovered.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
Fig: Gate-level view of the VGA controller
Fig: Gate-level view of the sync unit
Fig: Technology schematic
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, when implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller"
3. Hinner, "VGA Interface and video signal documents"
4 CONTROLLER DESIGN
To design the controller, we first need to know how the display device works.
Consider the image below.
Fig: Original image (640x480)
Fig: Pixels of the image
A display device also has pixels. 640x480 is a standard display resolution.
Pixel
Fig: An image with a portion greatly enlarged, in which the individual pixels
are rendered as little squares and can easily be seen
Fig 4.2: A photograph of sub-pixel display elements on a laptop's LCD screen
In digital imaging, a pixel (or picture element [1]) is a single point in a raster
image. The pixel is the smallest addressable screen element; it is the smallest
unit of picture which can be controlled. Each pixel has its own address. The address
of a pixel corresponds to its coordinates. Pixels are normally arranged in a
2-dimensional grid, and are often represented using dots or squares. Each pixel is
a sample of an original image, where more samples typically provide more
accurate representations of the original. The intensity of each pixel is variable.
In color image systems, a color is typically represented by three or four
component intensities such as red, green and blue, or cyan, magenta, yellow
and black.
Refresh rate
Refresh rate = 60 Hz
The refresh rate is the number of frames that are transmitted per
second.
The VGA controller must transmit at least 60 frames per second.
5 Hardware Description
5.1 Block Diagram
5.1 VGA Synchronization Unit
The Synchronization Unit provides the horizontal and vertical
synchronization signals, HSYNC and VSYNC, that are required to correctly
display a picture frame within the confines of a monitor's display area.
These synchronization signals are used as control inputs by the
monitor's horizontal and vertical deflection circuits. These circuits deflect the
electrons emitted by the three primary color electron guns (Red, Green, Blue)
left to right and from top to bottom, respectively. HSYNC provides the start
and stop times for the horizontal deflection circuit, so that a line of pixels is
correctly drawn across the screen display. VSYNC provides the start and stop
times for the vertical deflection circuit, so that the lines of a frame are correctly
drawn from the top to the bottom of the screen display.
The resolution for the display is defined by the level on the
RESOLUTION input. If High, the 640x480 resolution (VGA) is used. If Low,
the 800x600 resolution (SVGA) is used.
Although the resolution determines the area of a monitor's screen
within which an image can be displayed, the full extents of the chosen
resolution do not have to be used. The actual extents of the image display area
on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs. These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame, respectively.
5.2 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to the
top, bottom, left and right of the viewable frame area are blanked by putting
black pixels at the required line-pixel positions. This has the effect of centering
the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel needs to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this purpose: the line display
enable signal, en. The signal is checked on each rising edge of the external
clock signal (CLK) and is set as follows:
If (HCOUNT >= PixTotal) or (VCOUNT >= LinTotal) then
en = 0 (pixel requires to be blanked; set color to black)
Else
en = 1 (pixel is a viewable pixel; generate RGB color accordingly)
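The rule above can be sketched in VHDL as a small clocked process. This is only an illustrative sketch: the entity name blank_enable, the port names and the counter widths are assumptions and are not taken from the Controller's actual source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: entity name, port names and widths are assumptions.
entity blank_enable is
  port (
    clk       : in  std_logic;
    hcount    : in  unsigned(10 downto 0);  -- current pixel position in the line
    vcount    : in  unsigned(10 downto 0);  -- current line position in the frame
    pix_total : in  unsigned(9 downto 0);   -- viewable pixels per line (DISPSIZE_H)
    lin_total : in  unsigned(9 downto 0);   -- viewable lines per frame (DISPSIZE_V)
    en        : out std_logic               -- '1' = viewable pixel, '0' = blanked
  );
end entity blank_enable;

architecture rtl of blank_enable is
begin
  -- en is re-evaluated on every rising edge of CLK, as described in the text.
  process (clk)
  begin
    if rising_edge(clk) then
      if (hcount >= pix_total) or (vcount >= lin_total) then
        en <= '0';  -- outside the viewable area: pixel is blanked (black)
      else
        en <= '1';  -- inside the viewable area: generate the RGB color
      end if;
    end if;
  end process;
end architecture rtl;
```

The comparison operators come from ieee.numeric_std, which allows unsigned operands of different widths to be compared directly.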
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode, 640x480 at 60 Hz non-interlaced, the horizontal
timings are:
Parameter                   Value      Unit
Clock frequency             25.175     MHz
Horizontal pixels           640
Horizontal sync polarity    Negative
Total time for each line    31.77      µs
Front porch (A)             0.94       µs
Sync pulse length (B)       3.77       µs
Back porch (C)              1.89       µs
Active video (D)            25.17      µs
(Total horizontal sync time: 6.60 µs)
6.1 The vertical timings are:
Parameter                   Value      Unit
Vertical lines              480
Vertical sync polarity      Negative
Vertical frequency          59.94      Hz
Front porch (E)             0.35       ms
Sync pulse length (F)       0.06       ms
Back porch (G)              1.02       ms
Active video (H)            15.25      ms
(Total vertical sync time: 1.43 ms)
640 x 400 at 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 at 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 at 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
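As a quick consistency check on the two timing tables, the porch, sync and active intervals can be summed and compared with the quoted line time and vertical frequency. The symbols t_line, t_frame, f_v and N_active are introduced here for illustration only.

```latex
\begin{align*}
t_{\text{line}}  &= A + B + C + D = 0.94 + 3.77 + 1.89 + 25.17 = 31.77\ \mu\text{s} \\
t_{\text{frame}} &= E + F + G + H = 0.35 + 0.06 + 1.02 + 15.25 = 16.68\ \text{ms} \\
f_v              &= \frac{1}{t_{\text{frame}}} \approx 59.95\ \text{Hz} \\
N_{\text{active}} &= 25.17\ \mu\text{s} \times 25.174\ \text{MHz} \approx 633.6\ \text{pixels}
\end{align*}
```

The sums reproduce the 31.77 µs line time, and the resulting frame rate agrees with the quoted 59.94 Hz to within the rounding of the table entries.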
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
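A minimal VHDL sketch of the HSYNC rule above follows. The entity name hsync_gen and the generic default values (16 clocks of blanking before the sync pulse, 96 clocks of sync) are assumptions chosen for illustration; the text does not give the Controller's actual values for BlankingLeft and TLSync.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: names and generic default values are assumptions.
entity hsync_gen is
  generic (
    PIX_TOTAL     : natural := 640;  -- viewable pixels per line
    BLANKING_LEFT : natural := 16;   -- assumed blanking clocks before the sync pulse
    TL_SYNC       : natural := 96    -- assumed sync pulse length in clocks
  );
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;               -- active-High reset, as in the pin table
    hcount : in  unsigned(10 downto 0);   -- horizontal counter value
    hsync  : out std_logic
  );
end entity hsync_gen;

architecture rtl of hsync_gen is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      hsync <= '1';  -- High (inactive) after reset
    elsif rising_edge(clk) then
      if (hcount >= PIX_TOTAL + BLANKING_LEFT) and
         (hcount <  PIX_TOTAL + BLANKING_LEFT + TL_SYNC) then
        hsync <= '0';  -- inside the sync window: drive low
      else
        hsync <= '1';
      end if;
    end if;
  end process;
end architecture rtl;
```

The same structure would apply to VSYNC, with the vertical counter and the line-based limits substituted.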
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
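The counting behavior described above might be sketched as follows. The entity name addr_counter, the port names and the counter input widths are assumptions; only the 19-bit address width and the increment and roll-over conditions come from the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch of the MEMADDR counter; names and input widths are assumptions.
entity addr_counter is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;               -- active-High reset
    hcount    : in  unsigned(10 downto 0);
    vcount    : in  unsigned(10 downto 0);
    pix_total : in  unsigned(9 downto 0);    -- PixTotal (DISPSIZE_H)
    lin_total : in  unsigned(9 downto 0);    -- LinTotal (DISPSIZE_V)
    memaddr   : out unsigned(18 downto 0)    -- 19 bits cover 0 to 479999
  );
end entity addr_counter;

architecture rtl of addr_counter is
  signal addr : unsigned(18 downto 0) := (others => '0');
begin
  memaddr <= addr;

  process (clk, rst)
  begin
    if rst = '1' then
      addr <= (others => '0');
    elsif rising_edge(clk) then
      if vcount >= lin_total then
        addr <= (others => '0');  -- past the last viewable line: roll over to zero
      elsif hcount < pix_total then
        addr <= addr + 1;         -- viewable pixel: step to the next address
      end if;                     -- otherwise (horizontal blanking): hold the value
    end if;
  end process;
end architecture rtl;
```

Because the counter holds its value during horizontal blanking, the address after the last pixel of one line is the first pixel of the next line, which matches the consecutive addressing described above.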
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum refresh rate of 60 Hz:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 = approx. 54 ns
Max time for 1 pixel = 40 ns (rounded down, which leaves time for the blanking intervals)
2) To send information for each and every pixel of a frame by means of the
RGB signals.
3) After one row of information has been completely displayed, the column must be
reset and the row incremented; this is done by the Hsync signal.
4) After one frame of information has been completed, a new frame has to be started;
for this, both row and column are reset, which is done by the Vsync signal.
=> One clock is used to give the information for one pixel (and to control all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for one pixel on each and
every clock, we can attain the minimum 60 Hz refresh rate.
Do not use a clock frequency lower than 25 MHz.
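Ex:
Clock frequency    Time period    No. of frames/sec
15 MHz             more           less (unable to display 60 frames/sec)
25 MHz             effectively utilized frequency
30 MHz             less           more
As the clock frequency decreases the time period increases, and as the
frequency increases the time period decreases.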
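The clock budget can be worked through numerically. The figures below use only the 640x480 frame size and the 60 Hz target from the text; the frame-rate values are upper bounds, because they ignore the blanking intervals that also consume clock cycles. The symbols are introduced here for illustration.

```latex
\begin{align*}
t_{\text{pixel,max}} &= \frac{1/60\ \text{s}}{640 \times 480}
                      = \frac{1}{60 \times 307200}\ \text{s} \approx 54.25\ \text{ns} \\
f_{\text{clk}} &= \frac{1}{40\ \text{ns}} = 25\ \text{MHz} \\
\text{frames/s at } 15\ \text{MHz} &\le \frac{15 \times 10^{6}}{307200} \approx 48.8 \quad (< 60) \\
\text{frames/s at } 25\ \text{MHz} &\le \frac{25 \times 10^{6}}{307200} \approx 81.4 \\
\text{frames/s at } 30\ \text{MHz} &\le \frac{30 \times 10^{6}}{307200} \approx 97.7
\end{align*}
```

Even with no blanking at all, a 15 MHz clock cannot reach 60 frames per second, which is consistent with the example above.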
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal synchronization signal)
VSYNC (Vertical synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse and back porch
               Horizontal sync    Vertical sync
Front porch    0.94 µs            0.35 ms
Sync pulse     3.77 µs            0.063 ms
Back porch     1.89 µs            1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA pin description
Name | Type | Polarity | Bus size | Description
Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST | I | High | - | Global system reset.
VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used. 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used. 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal en, generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows. Black & White mode: 8 1-bit pixels. 16 Colors mode: 2 4-bit pixels. 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: in 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1, R0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1, G0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1, B0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
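As a worked example of the formula above, take a full 640x480 frame (PixTotal = 640, LinTotal = 480) in each of the three palettes. The symbol M is introduced here for the memory size in bytes.

```latex
\begin{align*}
\text{PixTotal} \times \text{LinTotal} &= 640 \times 480 = 307200\ \text{pixels} \\
M_{\text{Black \& White}} &= 307200 / 8 = 38400\ \text{bytes} \\
M_{\text{16 Colors}}      &= 307200 / 2 = 153600\ \text{bytes} \\
M_{\text{64 Colors}}      &= 307200 / 1 = 307200\ \text{bytes}
\end{align*}
```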
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, i.e. right shifting the contents by
three:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, i.e. right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
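The three addressing rules can be sketched as one selected signal assignment. The entity name pixel_addr is an assumption; the CMOD encodings are taken from the pin description table, and shift_right from ieee.numeric_std fills the vacated upper bits with zeros, so the result stays 19 bits wide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: the entity name is an assumption.
entity pixel_addr is
  port (
    cmod       : in  std_logic_vector(1 downto 0);  -- color palette select
    hvcnt      : in  unsigned(18 downto 0);         -- pixel position from MEMADDR
    addr_pixel : out unsigned(18 downto 0)          -- byte address in picture memory
  );
end entity pixel_addr;

architecture rtl of pixel_addr is
begin
  with cmod select
    addr_pixel <= shift_right(hvcnt, 3) when "00",    -- Black & White: 8 pixels per byte
                  shift_right(hvcnt, 1) when "01",    -- 16 Colors: 2 pixels per byte
                  hvcnt                 when others;  -- 64 Colors: 1 pixel per byte
end architecture rtl;
```

For example, in Black & White mode the pixel positions 8 to 15 all map to byte address 1.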
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal+1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal+1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
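As an illustration of the load-and-shift behavior, the following sketch covers the Black & White case only. The entity name pixreg_bw and its ports are assumptions; the load condition (lowest three bits of hvcnt equal to zero) and the one-bit right shift follow the description above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch of PIXREG in Black & White mode; names are assumptions.
entity pixreg_bw is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;                     -- active-High reset
    rd    : in  std_logic;                     -- memory read enable
    hvcnt : in  unsigned(18 downto 0);         -- pixel position from the Sync Unit
    data  : in  std_logic_vector(7 downto 0);  -- byte read from picture memory
    pixel : out std_logic                      -- active 1-bit pixel
  );
end entity pixreg_bw;

architecture rtl of pixreg_bw is
  signal pixreg : std_logic_vector(7 downto 0) := (others => '0');
begin
  pixel <= pixreg(0);  -- the active pixel is always the lowest bit

  process (clk, rst)
  begin
    if rst = '1' then
      pixreg <= (others => '0');
    elsif rising_edge(clk) then
      if (rd = '1') and (hvcnt(2 downto 0) = 0) then
        pixreg <= data;                       -- every 8th pixel: load the next byte
      else
        pixreg <= '0' & pixreg(7 downto 1);   -- otherwise shift the next pixel into bit 0
      end if;
    end if;
  end process;
end architecture rtl;
```

The 16 Colors case would differ only in the load condition (lowest bit of hvcnt) and in shifting by four bits instead of one.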
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
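The color count follows directly from the bit widths. The individual voltage levels below additionally assume that the DAC spaces its four levels evenly between 0 V and 0.7 V; that spacing is an assumption, since the text only gives the two end points.

```latex
\begin{align*}
\text{levels per color} &= 2^{2} = 4 \\
\text{colors} &= 4 \times 4 \times 4 = 2^{6} = 64 \\
V_k &= \frac{k}{3} \times 0.7\ \text{V}, \quad k = 0, 1, 2, 3
      \;\Rightarrow\; 0\ \text{V},\ \approx 0.23\ \text{V},\ \approx 0.47\ \text{V},\ 0.7\ \text{V}
\end{align*}
```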
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large, on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2: Block RAM Available in Spartan-3E Devices
Device       RAM        RAM Blocks    Total RAM    Total RAM    Total RAM
             Columns    per Column    Blocks       Bits         Kbits
XC3S100E     1          4             4            73,728       72
XC3S250E     2          6             12           221,184      216
XC3S500E     2          10            20           368,640      360
XC3S1200E    2          14            28           516,096      504
XC3S1600E    2          18            36           663,552      648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
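A common alternative to instantiating a block RAM primitive is to describe a synchronous memory behaviorally and let the synthesis tool map it onto block RAM. The sketch below is written under that assumption: whether a given tool actually infers block RAM from it depends on the tool and its settings. The entity name picture_ram is hypothetical, and the default size of 2048 x 8 bits is chosen to match the 16K data bits of a single block.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port synchronous RAM; name and default sizes are assumptions.
entity picture_ram is
  generic (
    ADDR_WIDTH : natural := 11;  -- 2**11 = 2048 locations
    DATA_WIDTH : natural := 8    -- 2048 x 8 = 16K data bits
  );
  port (
    clk  : in  std_logic;
    en   : in  std_logic;  -- clock enable
    we   : in  std_logic;  -- write enable
    addr : in  unsigned(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity picture_ram;

architecture rtl of picture_ram is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  -- Both read and write are synchronous and gated by the clock enable.
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(addr)) <= din;
        end if;
        dout <= ram(to_integer(addr));  -- returns the old contents on a write (read-first)
      end if;
    end if;
  end process;
end architecture rtl;
```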
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the U.S. Department of Defense.
The first publicly available version was released in 1985.
In 1986, IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE
standard 1076-1993
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own, non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
The entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have, e.g., a gate level architecture and a behavioral
architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units
A package consists of a declaration part and an optional body part
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
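To make the split between package declaration and package body concrete, here is a small sketch. The package name vga_params_pkg, its constants and the function pixel_address are all hypothetical and serve only to show where each kind of item is placed.

```vhdl
-- Hypothetical package: the name and its contents are illustrative only.
package vga_params_pkg is
  -- constant declarations
  constant H_VISIBLE : natural := 640;  -- viewable pixels per line
  constant V_VISIBLE : natural := 480;  -- viewable lines per frame

  -- subtype declaration
  subtype pixel_addr_t is natural range 0 to H_VISIBLE * V_VISIBLE - 1;

  -- function declaration (the body is given in the package body)
  function pixel_address (row, col : natural) return pixel_addr_t;
end package vga_params_pkg;

package body vga_params_pkg is
  -- function body: consecutive addressing, line after line
  function pixel_address (row, col : natural) return pixel_addr_t is
  begin
    return row * H_VISIBLE + col;
  end function pixel_address;
end package body vga_params_pkg;
```

A design unit compiled into the same library would make these names visible with "use work.vga_params_pkg.all;".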
Libraries
A collection of VHDL design units (a database)
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ
primarily in how closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural-------
Dataflow ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (eg transistor-level description) to a high level
description (eg block diagram)
For large circuits a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in
the system
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower level description
- or a behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained
If there are no 3rd party models for registers => you must write the behavioral
description of the registers
The behavioral description can be provided in the form of
subprograms (functions or procedures)
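Since the language has no built-in register, a register is usually written as a clocked process. The following is one possible behavioral description, given as a sketch; the entity name dreg, the generic WIDTH and the active-High asynchronous reset are assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical behavioral description of a register; names are assumptions.
entity dreg is
  generic (
    WIDTH : natural := 8
  );
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- asynchronous, active-High reset
    d   : in  std_logic_vector(WIDTH - 1 downto 0);
    q   : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity dreg;

architecture behavioral of dreg is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= (others => '0');  -- clear the register on reset
    elsif rising_edge(clk) then
      q <= d;                -- capture the input on each rising clock edge
    end if;
  end process;
end architecture behavioral;
```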
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns)
If no actual delay is used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool
There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a Unix process
The process keyword in VHDL
A process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
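The two forms of the general format can be compared side by side. In this sketch (the entity name and_gate_demo and its ports are hypothetical), both processes describe the same AND function: the first uses a sensitivity list, the second an equivalent wait statement at the end of the process.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example; the entity and its ports are illustrative only.
entity and_gate_demo is
  port (
    a, b : in  std_logic;
    y1   : out std_logic;
    y2   : out std_logic
  );
end entity and_gate_demo;

architecture behavioral of and_gate_demo is
begin
  -- Form 1: sensitivity list. The process runs whenever a or b changes.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: wait statement. The process runs once at start-up, then suspends
  -- until a or b changes. A sensitivity list must not be used together with it.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behavioral;
```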
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
-- Top level: divides the 50 MHz input clock by two and connects the
-- synchronization unit, the pixel generator and the picture memory.
entity vga_controller is port(
  clk50m : in  std_logic;
  rstn   : in  std_logic;
  hsync  : out std_logic;
  vsync  : out std_logic;
  red    : out std_logic_vector(0 downto 0);
  green  : out std_logic_vector(0 downto 0);
  blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
component memory
  port (
    addr  : IN  std_logic_VECTOR(17 downto 0);
    clk   : IN  std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN  std_logic;
    sinit : IN  std_logic);
END component;
component sync_unit port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;  ---horizontal synch pulse(row)
  vsync      : out std_logic;  --vertical sync pulse(frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;    -- mode "out" to match the sync_unit entity
  c_hs       : out integer);
end component;
component pixelgen port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr,18));
-- Divide the 50 MHz input clock by two to obtain the 25 MHz pixel clock.
process(clk50m, rstn)
begin
  if(rstn = '0')then
    clk <= '0';
  elsif(clk50m = '1' and clk50m'event)then
    clk <= not clk;
  end if;
  --end if;
end process;
hsync <= hsync_s;
vsync <= vsync_s;
sync : sync_unit port map(
  clk        => clk,
  rstn       => rstn,
  hsync      => hsync_s,
  vsync      => vsync_s,
  v_en_reg   => v_en_reg_s,
  line_count => line_count_s,
  h_en       => h_en_s,
  v_en       => v_en_s,
  c_hs       => c_hs_s
);
pixel : pixelgen port map(
  clk        => clk,
  rstn       => rstn,
  hsync      => hsync_s,
  vsync      => vsync_s,
  h_en       => h_en_s,
  v_en       => v_en_s,
  v_en_reg   => v_en_reg_s,
  data_in    => data_s,
  line_count => line_count_s,
  c_hs       => c_hs_s,
  red        => red,
  green      => green,
  blue       => blue,
  addr       => addr_s,
  rd         => rd_s
);
mem12 : memory port map(
  addr  => addr_s,
  clk   => clk,
  dout  => data_s,
  en    => rd_s,
  sinit => rstn
);
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin

c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;
-- hsync counter: counts pixel clocks from 0 to 793 within each line
process(clk, rstn)
begin
    if (rstn = '0') then
        c_hs_s <= 0;
    else
        if (clk = '1' and clk'event) then
            if (c_hs_s = 793) then
                c_hs_s <= 0;
            else
                c_hs_s <= c_hs_s + 1;
            end if;
        end if;
    end if;
end process;

--vsync counter
--vertical line counter: advances on each rising edge of h_en_s
process(h_en_s, rstn)
begin
    if (rstn = '0') then
        line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
        line_count_s <= line_count_s + 1;
        if (line_count_s = 515) then
            line_count_s <= 0;
        end if;
    end if;
    --end if;
end process;
--hsync pulse
process(clk, rstn)
begin
    if (rstn = '0') then
        hsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (c_hs_s <= 95) then
            hsync <= '0';
        else
            hsync <= '1';
        end if;
    end if;
    --end if;
end process;

-- Horizontal Enable (row information will be sent in this period)
process(clk, rstn)
begin
    if (rstn = '0') then
        h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
            h_en_s <= '1';
        else
            h_en_s <= '0';
        end if;
    end if;
    --end if;
end process;
-- Vertical Enable (data is valid in this region)
process(clk, rstn)
begin
    if (rstn = '0') then
        v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((line_count_s >= 35 and line_count_s <= 515)) then
            v_en_s <= '1';
        else
            v_en_s <= '0';
        end if;
    end if;
    --end if;
end process;

-- Vertical sync pulse, front porch, back porch
process(clk, rstn)
begin
    if (rstn = '0') then
        vsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (line_count_s < 2) then
            vsync <= '0';
        else
            vsync <= '1';
        end if;
    end if;
    --end if;
end process;

-- Registered copy of v_en_s, delayed by one clock
process(clk)
begin
    if (clk = '1' and clk'event) then
        v_en_reg <= v_en_s;
    end if;
end process;

end arch_sync_unit;
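
The horizontal windows coded in sync_unit can be checked with a small behavioural sketch. The Python below is not part of the design; it only replays the comparison constants from the listing (793, 95, 143 and 778) and counts how many clocks each signal is active per line. It deliberately ignores the one-clock register delay of the VHDL processes, and the function names are illustrative assumptions.

```python
# Behavioural sketch of the horizontal timing in sync_unit (not the VHDL itself).
H_LAST = 793                       # c_hs_s wraps to 0 after this value
HSYNC_LAST = 95                    # hsync is low while the counter is <= 95
H_EN_FIRST, H_EN_LAST = 143, 778   # h_en_s is high inside this window


def hsync(c_hs: int) -> int:
    """Level of hsync for a given counter value (active low)."""
    return 0 if c_hs <= HSYNC_LAST else 1


def h_en(c_hs: int) -> int:
    """Level of h_en_s for a given counter value (active high)."""
    return 1 if H_EN_FIRST <= c_hs <= H_EN_LAST else 0


def summarize() -> dict:
    """Count clocks per line and the clocks spent in each window."""
    counts = range(H_LAST + 1)
    return {
        "clocks_per_line": len(counts),
        "hsync_low_clocks": sum(1 for c in counts if hsync(c) == 0),
        "h_en_high_clocks": sum(h_en(c) for c in counts),
    }


if __name__ == "__main__":
    print(summarize())
```

With these constants a line lasts 794 clocks, the sync pulse occupies 96 of them and the enable window 636.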
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin

addr <= addr_s;
process(clk, rstn)
begin
    if (rstn = '0') then
        rd <= '0';
        red <= "0";
        green <= "0";
        blue <= "0";
        addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
        if ((v_en = '1') and (v_en_reg = '0')) then
            addr_s <= (others => '0');
        end if;
        if (v_en = '1' and h_en = '1') then
            if (line_count <= 514) then
                if (c_hs <= 655) then
                    rd <= '1';
                    red <= data_in;
                    green <= data_in;
                    blue <= data_in;
                    addr_s <= addr_s + "000000000000000001";
                else
                    rd <= '0';
                    red <= "0";
                    green <= "1";
                    blue <= "0";
                end if;
            else
                rd <= '0';
                red <= "1";
                green <= "1";
                blue <= "1";
            end if;
            if (addr_s = "111011111111111111") then
                addr_s <= (others => '0');
            end if;
        else
            red <= "0";
            green <= "0";
            blue <= "0";
        end if;
    end if;
    -- end if;
    --end if;
end process;

end behv;
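
The binary literal used for the address wrap-around in pixelgen is easier to read in decimal. The following sketch, which is an illustration and not part of the design, converts the compare value copied from the listing and reports how many memory locations the counter visits. Interpreting that count as lines of 640 pixels is only one possible reading and is an assumption made here.

```python
WRAP_LITERAL = "111011111111111111"   # compare value copied from the listing
ADDR_BITS = 18                        # width of addr / addr_s in the listing


def wrap_info(literal: str = WRAP_LITERAL, bits: int = ADDR_BITS) -> dict:
    """Decode the wrap-around literal of the address counter."""
    last_addr = int(literal, 2)
    return {
        "fits_in_bus": len(literal) == bits and last_addr < 2 ** bits,
        "last_address": last_addr,
        "locations": last_addr + 1,
        # Assumption: rows of 640 pixels; the listing does not state this.
        "lines_of_640": (last_addr + 1) / 640,
    }


if __name__ == "__main__":
    print(wrap_info())
```

The counter therefore runs from 0 to 245759, that is 245760 locations, which would correspond to 384 lines of 640 one-bit pixels under the stated assumption.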
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
Year          Gates       Source
1987          9,000       Xilinx
1992          600,000     Naval Surface Warfare Department
Early 2000s   Millions

Market size
Year              Market size
1985              First commercial FPGA technology invented by Xilinx
1987              $14 million
~1993             >$385 million
2005              $1.9 billion
2010 (estimate)   $2.75 billion

12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
• IC costs are rising aggressively
• ASIC complexity has increased development time and costs
• R&D resources and headcount are decreasing
• Revenue losses for slow time-to-market are increasing
• Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design
starts (see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the state where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
Fig. 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
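
The logic block described above can be modelled in a few lines. The sketch below is only an illustration: the class name, the bit ordering of the LUT index and the method names are assumptions, not taken from any vendor's documentation. It shows how 16 configuration bits define an arbitrary 4-input function and how the single output is either the raw LUT value or the value captured by the flip-flop.

```python
class LogicBlock:
    """Illustrative model: a 4-input LUT followed by an optional D flip-flop."""

    def __init__(self, truth_table: int, registered: bool = False):
        if not 0 <= truth_table < 2 ** 16:
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.truth_table = truth_table   # bit i is the output for input index i
        self.registered = registered     # selects registered or unregistered output
        self.q = 0                       # flip-flop state

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        """Look up the configured output (assumed ordering: a is the LSB)."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.truth_table >> index) & 1

    def clock(self, a: int, b: int, c: int, d: int) -> None:
        """Rising clock edge: the flip-flop captures the current LUT output."""
        self.q = self.lut(a, b, c, d)

    def output(self, a: int, b: int, c: int, d: int) -> int:
        """The block's single output."""
        return self.q if self.registered else self.lut(a, b, c, d)


AND4 = 0x8000   # only input index 15 (all ones) gives 1
XOR4 = 0x6996   # 1 whenever an odd number of inputs are 1
```

For example, `LogicBlock(AND4).output(1, 1, 1, 1)` gives 1, while a block built with `registered=True` keeps its previous output until `clock()` is called.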
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
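
The planar (domain-based) rule described above can be written as a short function. This is a sketch under stated assumptions: the side names and the function name are hypothetical, and tracks are numbered from 0 purely for convenience. It returns the three wires that a wire entering the switch box can be connected to.

```python
SIDES = ("north", "east", "south", "west")   # assumed labels for the four channels


def planar_switch_targets(side: str, track: int, channel_width: int) -> list:
    """Wires reachable from (side, track) in a planar switch box.

    A wire on a given track can only reach the same track number on the
    three other sides, through one programmable switch each.
    """
    if side not in SIDES:
        raise ValueError(f"unknown side: {side}")
    if not 0 <= track < channel_width:
        raise ValueError("track is outside the channel width")
    return [(other, track) for other in SIDES if other != side]


if __name__ == "__main__":
    print(planar_switch_targets("west", 1, 4))
```

Because the track number never changes inside the box, a signal that starts on one track stays in that routing "domain" along its whole path, which is what the topology's name refers to.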
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar
license), and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
• SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
• Antifuse - One-time programmable. CMOS.
• EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
• EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
• Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
• Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA
for hardware verification and early software development.
Hardware verification and early software and firmware co-design have become
mainstream, and prototyping SoC and ASIC designs on an FPGA has become a
good way to do both.
Reasons why Prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear
due to today's array of operating systems, applications and hardware, to be
found early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig. VGA CONTROLLER
GATE LEVEL
Fig. SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. "A VGA display controller" by Eduardo Sanchez
3. Hinner, "VGA Interface and video signal documents"
Pixels of image
A display device also has pixels. The display resolution (640x480) is a
standard display resolution.
Pixel
This example shows an image with a portion greatly enlarged, in which the
individual pixels are rendered as little squares and can easily be seen.
4.2 A photograph of sub-pixel display elements on a laptop's LCD screen
In digital imaging, a pixel (or picture element [1]) is a single point in a raster
image. The pixel is the smallest addressable screen element; it is the smallest
unit of a picture which can be controlled. Each pixel has its own address. The
address of a pixel corresponds to its coordinates. Pixels are normally arranged
in a 2-dimensional grid, and are often represented using dots or squares. Each
pixel is a sample of an original image, where more samples typically provide
more accurate representations of the original. The intensity of each pixel is
variable. In color image systems, a color is typically represented by three or
four component intensities such as red, green, and blue, or cyan, magenta,
yellow, and black.
Color refreshing rate
Color refreshing rate = 60 Hz
The color refreshing rate represents the number of frames that are transmitted
per second.
The minimum number of frames that can be transmitted per second by the VGA
controller is 60.
5 Hardware Description
5.1 Block Diagram
5.1 VGA Synchronization Unit
The Synchronization Unit provides the horizontal and vertical
synchronization signals - HSYNC and VSYNC - that are required to correctly
display a picture frame within the confines of a monitor's display area.
These synchronization signals are used as control inputs by the
monitor's horizontal and vertical deflection circuits. These circuits deflect the
electrons emitted by the three primary color electron guns (Red, Green, Blue)
left to right and from top to bottom, respectively. HSYNC provides the start
and stop times for the horizontal deflection circuit, so that a line of pixels is
correctly drawn across the screen display. VSYNC provides the start and stop
times for the vertical deflection circuit, so that the lines of a frame are correctly
drawn from the top to the bottom of the screen display.
The resolution for the display is defined by the level on the
RESOLUTION input. If High, the 640x480 resolution is used (VGA). If Low,
the 800x600 resolution (SVGA) is used.
Although the resolution determines the area of a monitor's screen
within which an image can be displayed, the full extents of the chosen
resolution do not have to be used. The actual extents of the image display area
on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs. These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame, respectively.
5.2 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to the
top, bottom, left and right of the viewable frame area are blanked, by putting
black pixels at the required line-pixel positions. This has the effect of centering
the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel requires to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this very reason. This is the line display
enable signal - en. The signal is checked on each rising edge of the external
clock signal (CLK) and is set as follows:
If (HCOUNT >= PixTotal) or (VCOUNT >= LinTotal) then
en = 0 (pixel requires to be blanked - set color to be black)
Else
en = 1 (pixel is a viewable pixel - generate RGB color accordingly)
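
The rule above translates directly into a small function. The Python below is a behavioural sketch only, not the controller's implementation; the function name is an assumption, while the parameter names follow the counters and totals used in the text.

```python
def line_enable(hcount: int, vcount: int, pix_total: int, lin_total: int) -> int:
    """Return the enable signal en for the pixel at (hcount, vcount).

    en = 0 means the pixel must be blanked (black); en = 1 means it is a
    viewable pixel and its RGB color should be generated.
    """
    if hcount >= pix_total or vcount >= lin_total:
        return 0
    return 1


if __name__ == "__main__":
    # Full 640x480 frame: the last viewable pixel and the first blanked ones.
    print(line_enable(639, 479, 640, 480))   # viewable
    print(line_enable(640, 0, 640, 480))     # blanked: past the end of the line
    print(line_enable(0, 480, 640, 480))     # blanked: past the last line
```

Passing smaller values for `pix_total` and `lin_total` models the effect of DISPSIZE_H and DISPSIZE_V: everything outside the reduced area is blanked.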
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode (640x480, 60 Hz, non-interlaced) the horizontal
timings are:
Parameter                   Value      Unit
Clock frequency             25.175     MHz
Horizontal pixels           640
Horizontal sync polarity    Negative
Total time for each line    31.77      µs
Front porch (A)             0.94       µs
Sync pulse length (B)       3.77       µs
Back porch (C)              1.89       µs
Active video (D)            25.17      µs
(Total horizontal sync time 6.60 µs)
6.1 The vertical timings are:
Parameter                   Value      Unit
Vertical lines              480
Vertical sync polarity      Negative
Vertical frequency          59.94      Hz
Front porch (E)             0.35       ms
Sync pulse length (F)       0.06       ms
Back porch (G)              1.02       ms
Active video (H)            15.25      ms
(Total vertical sync time 1.43 ms)
640 x 400 at 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 at 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 at 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
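
The 633-pixel figure follows from multiplying a duration by the pixel clock. The sketch below reproduces that arithmetic with the values from the tables above; the function name is an assumption, and the last call uses an exact 25 MHz clock, which is assumed here only for comparison with a divided 50 MHz board clock.

```python
def clocks_in(duration_us: float, clock_mhz: float) -> float:
    """Number of clock periods in a duration (microseconds x MHz)."""
    return duration_us * clock_mhz


if __name__ == "__main__":
    # Active video time at the pixel frequency quoted in the text.
    print(int(clocks_in(25.17, 25.174)))     # whole pixels in the active period
    # Full line time at the nominal 25.175 MHz clock.
    print(round(clocks_in(31.77, 25.175)))   # clocks per line, rounded
    # Same line time at an exact 25 MHz clock (assumption for comparison).
    print(clocks_in(31.77, 25.0))
```

The first value comes out at about 633.6, which truncates to the 633 pixels mentioned above, and the full line is very close to 800 clocks at 25.175 MHz. At exactly 25 MHz the same line time corresponds to about 794 clocks.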
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
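
Both conditions have the same shape, so one function can describe them. The following sketch is illustrative only: the function name is an assumption, and the numeric values in the example (640 viewable pixels, a blanking interval of 16 and a sync length of 96 clocks) are hypothetical figures chosen to show the window, not values stated in this document.

```python
def sync_level(count: int, total: int, blanking: int, sync_len: int) -> int:
    """Level of an active-low sync signal for a given counter value.

    The signal is low while total + blanking <= count < total + blanking + sync_len
    and high (inactive) everywhere else. Use HCOUNT, PixTotal and TLSync for
    HSYNC, or VCOUNT, LinTotal and TFSync for VSYNC.
    """
    start = total + blanking
    return 0 if start <= count < start + sync_len else 1


if __name__ == "__main__":
    # Hypothetical horizontal example: PixTotal=640, BlankingLeft=16, TLSync=96.
    for hcount in (655, 656, 751, 752):
        print(hcount, sync_level(hcount, 640, 16, 96))
```

With these example numbers the pulse is low for counter values 656 to 751 inclusive, that is for exactly `sync_len` counts.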
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
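
The 19-bit figure can be confirmed from the ranges above. The short sketch below, with an assumed function name, computes the last address and the number of bits needed to hold it for a given frame size.

```python
def address_counter_bits(pix_total: int, lin_total: int) -> tuple:
    """Return (last address, bits needed) for a frame of the given size."""
    last_address = pix_total * lin_total - 1
    return last_address, last_address.bit_length()


if __name__ == "__main__":
    print(address_counter_bits(640, 480))
    print(address_counter_bits(800, 600))
```

Both resolutions need 19 bits: 307199 is larger than 2^18 - 1 = 262143, and 479999 is still below 2^19 = 524288.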
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum 60 Hz refresh rate:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 ≈ 54 ns (roughly 50 ns)
Max time for 1 pixel = 40 ns
2) To send information for each and every pixel of a frame by means of the
RGB signals.
3) After one row of information has been completely displayed, the column
must be reset and the row incremented; this is done by the Hsync signal.
4) After one frame of information has been completed, a new frame has to be
started; for this, both row and column are reset, which is done by the Vsync
signal.
=> To give the information for 1 pixel we use 1 clock (for controlling all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display 1 pixel of information on each and
every clock, we can attain the minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
Ex:
Clock frequency      15 MHz (decreases)        25 MHz                  30 MHz (increases)
Time period          More                      Effectively utilized    Less
                                               frequency
No. of frames/sec    Less (it is unable to                             More
                     display 60 frames/sec)
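
The numbers in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative: the function names are assumptions, and the figure of 800 x 525 clocks per frame (visible pixels plus blanking) is the commonly quoted total for 640x480 at 60 Hz, assumed here because this section does not state it.

```python
def pixel_time_ns(refresh_hz: float, pixels_per_frame: int) -> float:
    """Time available per pixel, in nanoseconds."""
    return 1e9 / (refresh_hz * pixels_per_frame)


def clock_mhz_for_period(period_ns: float) -> float:
    """Clock frequency in MHz for a given clock period in nanoseconds."""
    return 1e3 / period_ns


def frames_per_second(clock_hz: float, clocks_per_frame: int) -> float:
    """Frame rate obtained when one clock is spent per pixel position."""
    return clock_hz / clocks_per_frame


if __name__ == "__main__":
    print(pixel_time_ns(60, 640 * 480))         # visible pixels only
    print(clock_mhz_for_period(40.0))           # 40 ns period
    # Assumed total of 800 x 525 clocks per frame, blanking included.
    for mhz in (15, 25, 30):
        print(mhz, frames_per_second(mhz * 1e6, 800 * 525))
```

Counting only the 307200 visible pixels gives a little over 54 ns per pixel. Under the assumed frame total, a 40 ns (25 MHz) clock gives just under 60 frames per second, a 15 MHz clock falls well short of 60, and a 30 MHz clock exceeds it.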
Signals used for image display-
R (Red)
15
G (Green)
B (Blue)
HSYNC (Horizontal Synchronous signal)
VSYNC (Vertical Synchronous signal)
All the above signals are used to control the image data to be correctly
displayed on the VGA display monitor
7.2 Timings for front porch, sync pulse and back porch
               Horizontal sync    Vertical sync
Front porch    0.94 µs            0.35 ms
Sync pulse     3.77 µs            0.063 ms
Back porch     1.89 µs            1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA Pin description

Name | Type | Polarity | Bus size | Description

Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows:
    640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz
    800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz
RST | I | High | - | Global system reset.

VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used: 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used: 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame and is therefore used to control the vertical extents of the visible display area.

Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal en generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows:
    Black & White mode: 8 1-bit pixels
    16 Colors mode: 2 4-bit pixels
    64 Colors mode: 1 6-bit pixel (DATA[5..0])
    Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.

VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R0, R1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G0, G1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B0, B1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode: one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode: one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode: one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
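As a worked instance of the memory formula, the figures below assume a full 640x480 picture (PixTotal = 640, LinTotal = 480); other DISPSIZE_H and DISPSIZE_V settings scale the result accordingly.

```latex
\begin{align*}
\text{Black \& White (8 pixels per byte):} \quad & \frac{640 \times 480}{8} = 38\,400\ \text{bytes} \\
\text{16 Colors (2 pixels per byte):}      \quad & \frac{640 \times 480}{2} = 153\,600\ \text{bytes} \\
\text{64 Colors (1 pixel per byte):}       \quad & \frac{640 \times 480}{1} = 307\,200\ \text{bytes}
\end{align*}
```

Even the largest of these cases stays below 2^19 = 524288, so it fits within the 19-bit ADDR_PIXEL address range.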
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = '0' & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
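The three addressing rules amount to dividing the pixel index by the number of pixels per byte. A minimal VHDL sketch of that idea follows; the entity name pixel_addr and its port list are illustrative assumptions, not the Controller's actual internals, and the right shift is written with numeric_std's shift_right so that the result is always 19 bits wide.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical helper block: derives the picture-memory byte address
-- from the running pixel index hvcnt and the palette selection CMOD.
entity pixel_addr is
  port (
    cmod       : in  std_logic_vector(1 downto 0);   -- 00 = B&W, 01 = 16 Colors, 10 = 64 Colors
    hvcnt      : in  std_logic_vector(18 downto 0);  -- pixel index from the Synchronization Unit
    addr_pixel : out std_logic_vector(18 downto 0)   -- byte address in picture memory
  );
end entity pixel_addr;

architecture dataflow of pixel_addr is
  signal cnt : unsigned(18 downto 0);
begin
  cnt <= unsigned(hvcnt);

  with cmod select
    addr_pixel <= std_logic_vector(shift_right(cnt, 3)) when "00",    -- 8 pixels per byte
                  std_logic_vector(shift_right(cnt, 1)) when "01",    -- 2 pixels per byte
                  std_logic_vector(cnt)                 when others;  -- 1 pixel per byte
end architecture dataflow;
```

For example, with hvcnt = 17 this sketch yields address 2 in Black & White mode, address 8 in 16 Colors mode and address 17 in 64 Colors mode.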
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or
   ((VCOUNT > (LinTotal + 1)) and (VCOUNT < Frame Period)) then
  enex = 0
Else
  enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
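The enex condition translates almost directly into a concurrent VHDL assignment. The sketch below is an illustration only: the entity name line_enable, the use of natural-typed counter ports and the default generic values (800 clocks per line and 525 lines per frame, the usual totals for 640x480) are assumptions, not values taken from the Controller.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical block generating the line display enable (enex) and RD.
entity line_enable is
  generic (
    PIX_TOTAL    : natural := 640;  -- viewable pixels per line (DISPSIZE_H)
    LIN_TOTAL    : natural := 480;  -- viewable lines per frame (DISPSIZE_V)
    LINE_PERIOD  : natural := 800;  -- assumed total clocks per line
    FRAME_PERIOD : natural := 525   -- assumed total lines per frame
  );
  port (
    hcount : in  natural range 0 to 1023;
    vcount : in  natural range 0 to 1023;
    enex   : out std_logic;
    rd     : out std_logic
  );
end entity line_enable;

architecture dataflow of line_enable is
  signal enex_i : std_logic;
begin
  -- Low while either counter is inside its blanking interval, High otherwise
  enex_i <= '0' when ((hcount > PIX_TOTAL + 1) and (hcount < LINE_PERIOD)) or
                     ((vcount > LIN_TOTAL + 1) and (vcount < FRAME_PERIOD))
            else '1';

  enex <= enex_i;
  rd   <= enex_i;  -- memory reads are enabled only while enex is High
end architecture dataflow;
```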
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal (received from the Synchronization Unit)
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal (received from the Synchronization Unit) is a
'0'.
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
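The load-and-shift behavior of PIXREG in the three modes can be sketched as a single clocked process. This is a simplified illustration rather than the Controller's actual implementation: the entity name pixel_register and its ports are hypothetical, the reset is assumed to be asynchronous and active High, and gating by the RD signal is left out for brevity.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical model of the Pixel register (PIXREG).
entity pixel_register is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                     -- assumed active High
    cmod      : in  std_logic_vector(1 downto 0);  -- 00 = B&W, 01 = 16 Colors, 10 = 64 Colors
    hvcnt_low : in  std_logic_vector(2 downto 0);  -- lowest three bits of hvcnt
    data      : in  std_logic_vector(7 downto 0);  -- byte read from picture memory
    pixreg    : out std_logic_vector(7 downto 0)   -- active pixel sits in the low-order bits
  );
end entity pixel_register;

architecture rtl of pixel_register is
  signal reg : std_logic_vector(7 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      reg <= (others => '0');
    elsif rising_edge(clk) then
      case cmod is
        when "00" =>                          -- Black & White: 8 pixels per byte
          if hvcnt_low = "000" then
            reg <= data;                      -- load the next byte
          else
            reg <= '0' & reg(7 downto 1);     -- shift right by one bit
          end if;
        when "01" =>                          -- 16 Colors: 2 pixels per byte
          if hvcnt_low(0) = '0' then
            reg <= data;                      -- load the next byte
          else
            reg <= "0000" & reg(7 downto 4);  -- shift right by four bits
          end if;
        when others =>                        -- 64 Colors: 1 pixel per byte
          reg <= data;                        -- load on every clock
      end case;
    end if;
  end process;

  pixreg <= reg;
end architecture rtl;
```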
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
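The count of 64 colors follows directly from three channels with four levels each. The per-level voltages below are an illustration only: they assume the four DAC levels are equally spaced between 0 V and 0.7 V, which the text does not state.

```latex
\begin{align*}
V_k &= \frac{k}{3} \times 0.7\ \text{V}, \qquad k \in \{0, 1, 2, 3\} \\
    &\Rightarrow\ 0\ \text{V},\ \approx 0.233\ \text{V},\ \approx 0.467\ \text{V},\ 0.7\ \text{V} \\
N_{\text{colors}} &= 4_{R} \times 4_{G} \times 4_{B} = 4^{3} = 64
\end{align*}
```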
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This section
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM      RAM Blocks   Total RAM   Total RAM   Total RAM
            Columns  per Column   Blocks      Bits        Kbits
XC3S100E    1        4            4           73,728      72
XC3S250E    2        6            12          221,184     216
XC3S500E    2        10           20          368,640     360
XC3S1200E   2        14           28          516,096     504
XC3S1600E   2        18           36          663,552     648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
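As an alternative to CORE Generator, a synchronous memory can also be described behaviorally in VHDL. The sketch below is a generic single-port template, not Xilinx-supplied code: the entity name picture_ram, the generics and the port names are assumptions, and whether a given synthesis tool maps it onto block RAM depends on the tool and its settings. Both the write and the read happen on the clock edge and only when the enable is active, which matches the synchronous port behavior described above.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical single-port synchronous RAM with clock enable and write enable.
entity picture_ram is
  generic (
    ADDR_WIDTH : natural := 10;  -- 1024 locations (illustrative size)
    DATA_WIDTH : natural := 8
  );
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                                  -- clock enable
    we   : in  std_logic;                                  -- write enable
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity picture_ram;

architecture rtl of picture_ram is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read; on a simultaneous write the old contents
        -- are returned (read-first behavior).
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```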
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL
In 1987: standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE
standard 1076-1993
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware
Defines e.g. two numeric types: signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library
Entities
A black box with an interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
An entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component, but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from the port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
– Behavior
– Structure
– Dataflow
Architectures can describe a design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and architecture body together
The concept of a default configuration is a bit messy in VHDL '87
– The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral
architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2. Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units
A package consists of a declaration part and an optional body part
A package declaration can contain
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
A package body consists of
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
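The split between a package declaration and a package body can be illustrated with a small example. Everything in it is hypothetical: the package name vga_pkg, the constants, the subtype rgb_t and the function to_rgb are invented for illustration, and the white/black mapping inside the function is an assumption rather than the Controller's documented palette.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Package declaration: the shared, visible part.
package vga_pkg is
  -- constant declarations
  constant H_VISIBLE : natural := 640;
  constant V_VISIBLE : natural := 480;
  -- subtype declaration
  subtype rgb_t is std_logic_vector(5 downto 0);
  -- function declaration (the body lives in the package body)
  function to_rgb(pixel : std_logic) return rgb_t;
end package vga_pkg;

-- Package body: needed here only because the package declares a function.
package body vga_pkg is
  function to_rgb(pixel : std_logic) return rgb_t is
  begin
    if pixel = '1' then
      return "111111";  -- assumed code for white
    else
      return "000000";  -- black
    end if;
  end function to_rgb;
end package body vga_pkg;
```

A design unit compiled into the same library would make these declarations visible with `use work.vga_pkg.all;`.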
Libraries
A collection of VHDL design units (a database)
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural -> Dataflow -> Behavioral (increasing level of abstraction)
Structural VHDL description
The circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in
the system
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower-level description
– or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no 3rd-party models for registers => you must write the behavioral
description of the registers
The behavioral description can be provided in the form of
subprograms (functions or procedures)
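Since VHDL has no built-in register, a behavioral description along the following lines is the usual way to obtain one. This is a minimal sketch: the entity name reg8, the 8-bit width and the asynchronous active-High reset are illustrative choices, not requirements.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical 8-bit register described behaviorally.
entity reg8 is
  port (
    clk : in  std_logic;
    rst : in  std_logic;                     -- asynchronous reset, active High
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg8;

architecture behavioral of reg8 is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      q <= d;  -- the stored value changes only on a rising clock edge
    end if;
  end process;
end architecture behavioral;
```

A dataflow description can then instantiate such registers and describe the combinational logic between them with concurrent signal assignments.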
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time
The representation might include e.g. state diagrams, timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns)
If no actual delay is used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool
There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a Unix process
The process keyword in VHDL
A process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
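The two ways of waking a process can be compared side by side. In this sketch (the entity and signal names are made up for illustration), both processes describe the same AND gate: the first uses a sensitivity list, the second an explicit wait statement at the end of the process body, which is what a sensitivity list is equivalent to.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical demo entity with two equivalent process styles.
entity and_gate_demo is
  port (
    a, b : in  std_logic;
    y1   : out std_logic;
    y2   : out std_logic
  );
end entity and_gate_demo;

architecture behavioral of and_gate_demo is
begin
  -- Style 1: sensitivity list, no wait statement
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Style 2: no sensitivity list, explicit wait statement
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;  -- suspend until a or b changes
  end process with_wait;
end architecture behavioral;
```

Both processes run concurrently with each other, while the statements inside each one execute sequentially.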
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- only needed for the commented-out to_unsigned conversion below
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;  -- 50 MHz board clock
    rstn   : in  std_logic;  -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;

  -- line_count and c_hs are declared "out" to match the sync_unit entity
  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer);
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the 50 MHz input clock by two to obtain the 25 MHz pixel clock
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- Horizontal counter: counts pixel clocks 0 to 793 within a line
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- Vertical line counter: advances on each rising edge of the
  -- horizontal enable and wraps after line 515
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- Horizontal sync pulse (active low during counts 0 to 95)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical sync pulse (active low during lines 0 and 1); the front
  -- and back porches are the remaining lines outside the enable region
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- Registered copy of the vertical enable, used for edge detection
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= (others => '0');
    elsif (clk = '1' and clk'event) then
      -- Rising edge of the vertical enable: restart at the first pixel
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Picture area: show the memory bit on all three channels
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + 1;
          else
            -- Right of the picture area: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture area: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address at the end of the stored picture
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: drive black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic (see Soft
processors below).
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at "run time", and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them), but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as the ability of logic blocks to implement
decoders or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
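The logic block described above can be modelled in a few lines of Python. This is only an illustrative sketch: the class name LogicBlock, the ordering of the truth-table index bits, and the "registered" flag are assumptions made for the example and do not correspond to any vendor primitive.

```python
class LogicBlock:
    """Behavioural model: one 4-input LUT followed by an optional flip-flop."""

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 truth-table entries")
        self.truth_table = [int(bool(bit)) for bit in truth_table]
        self.registered = registered  # selects registered or unregistered output
        self.q = 0                    # flip-flop state, cleared at start

    def lut(self, a, b, c, d):
        # ASSUMPTION: input a is the least significant index bit.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.truth_table[index]

    def clock(self, a, b, c, d):
        """Model a rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        return self.q if self.registered else self.lut(a, b, c, d)


# A 4-input AND gate: only the last truth-table entry is 1.
and4 = LogicBlock([0] * 15 + [1])
and4_reg = LogicBlock([0] * 15 + [1], registered=True)
```

The unregistered block follows its inputs immediately, while the registered block only changes its output after clock() has been called.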
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
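The planar rule (a wire may only reach the same track number on the other three sides) can be expressed as a small function. The side names and the function name below are hypothetical labels chosen for this sketch.

```python
SIDES = ("north", "east", "south", "west")


def planar_switch_targets(side, track):
    """Return the wires reachable from (side, track) in a planar switch box.

    In the planar (domain-based) topology a wire connects only to wires
    with the same track number in the three other channel segments.
    """
    if side not in SIDES:
        raise ValueError(f"unknown side: {side}")
    return [(other, track) for other in SIDES if other != side]


# A wire entering from the west on track 2 has three possible targets.
targets = planar_switch_targets("west", 2)
```

Each wire therefore has exactly three programmable switches, matching the description in the text.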
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
• SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
• Antifuse - One-time programmable. CMOS.
• EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
• EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
• Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
• Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream. Prototyping SoC and ASIC designs on an FPGA
has become a good method to do this.
Reasons why prototyping is important
1 Running a SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time in the area of software development and testing at the
software-hardware integration stage. This allows many unforeseen software bugs,
which appear due to today's array of operating systems, applications, and
hardware, to be found.
3 Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
• Motion tracking systems
• Image processing systems
• Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1 http://en.wikipedia.org/wiki/Video_Graphics_Array
2 Eduardo Sanchez, "A VGA display controller"
3 Hinner, "VGA Interface and video signal documents"
Pixel
4.2 A photograph of sub-pixel display elements on a laptop's LCD screen
In digital imaging, a pixel (or picture element [1]) is a single point in a
raster image. The pixel is the smallest addressable screen element; it is the
smallest unit of a picture which can be controlled. Each pixel has its own
address. The address of a pixel corresponds to its coordinates. Pixels are
normally arranged in a 2-dimensional grid, and are often represented using dots
or squares. Each pixel is a sample of an original image, where more samples
typically provide more accurate representations of the original. The intensity
of each pixel is variable. In color image systems, a color is typically
represented by three or four component intensities such as red, green, and
blue, or cyan, magenta, yellow, and black.
Color refreshing rate
Color refreshing rate = 60 Hz
The color refreshing rate represents the number of frames that are transmitted
per second.
The minimum number of frames that the VGA controller must transmit per second
is 60.
5 Hardware Description
5.1 Block Diagram
5.1 VGA Synchronization Unit
The Synchronization Unit provides the horizontal and vertical
synchronization signals, HSYNC and VSYNC, that are required to correctly
display a picture frame within the confines of a monitor's display area.
These synchronization signals are used as control inputs by the
monitor's horizontal and vertical deflection circuits. These circuits deflect
the electrons emitted by the three primary color electron guns (Red, Green,
Blue) left to right and from top to bottom, respectively. HSYNC provides the
start and stop times for the horizontal deflection circuit, so that a line of
pixels is correctly drawn across the screen display. VSYNC provides the start
and stop times for the vertical deflection circuit, so that the lines of a frame
are correctly drawn from the top to the bottom of the screen display.
The resolution for the display is defined by the level on the
RESOLUTION input. If High, the 640x480 resolution (VGA) is used. If Low,
the 800x600 resolution (SVGA) is used.
Although the resolution determines the area of a monitor's screen
within which an image can be displayed, the full extents of the chosen
resolution do not have to be used. The actual extents of the image display area
on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs. These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame, respectively.
5.2 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to
the top, bottom, left, and right of the viewable frame area are blanked by
putting black pixels at the required line-pixel positions. This has the effect
of centering the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel requires to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this very reason. This is the line
display enable signal, en. The signal is checked on each rising edge of the
external clock signal (CLK) and is set as follows:
If (HCOUNT ≥ PixTotal) or (VCOUNT ≥ LinTotal) then
en = 0 (pixel requires to be blanked: set color to be black)
Else
en = 1 (pixel is a viewable pixel: generate RGB color accordingly)
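The blanking rule above translates directly into a small Python function. This is a behavioural sketch, not the Controller's HDL; the function names are assumptions made for the example, and the black code 0 is taken from the reset value described for the RGB register.

```python
def display_enable(hcount, vcount, pix_total, lin_total):
    """Return the line display enable signal en for one clock edge.

    en = 0 when the pixel lies outside the viewable area (blank it),
    en = 1 when the pixel is viewable.
    """
    if hcount >= pix_total or vcount >= lin_total:
        return 0
    return 1


def pixel_colour(en, rgb):
    """Force the colour to black (all zeros) whenever en is low."""
    return rgb if en else 0


# Full 640x480 frame: the last viewable pixel and the first blanked one.
last_visible = display_enable(639, 479, 640, 480)
first_blanked = display_enable(640, 479, 640, 480)
```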
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode, 640x480, 60 Hz, non-interlaced, the horizontal
timings are:
Parameter                  Value      Unit
Clock frequency            25.175     MHz
Horizontal pixels          640
Horizontal sync polarity   Negative
Total time for each line   31.77      µs
Front porch (A)            0.94       µs
Sync pulse length (B)      3.77       µs
Back porch (C)             1.89       µs
Active video (D)           25.17      µs
(Total horizontal sync time 6.60 µs)
6.1 The vertical timings are:
Parameter                  Value      Unit
Vertical lines             480
Vertical sync polarity     Negative
Vertical frequency         59.94      Hz
Front porch (E)            0.35       ms
Sync pulse length (F)      0.06       ms
Back porch (G)             1.02       ms
Active video (H)           15.25      ms
(Total vertical sync time 1.43 ms)
640 x 400 @ 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 @ 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 @ 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
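The relationship between the tabulated times and pixel counts can be checked with a short calculation. The sketch below simply multiplies each duration by the pixel clock (microseconds times MHz gives clock cycles); the dictionary and function names are chosen for illustration only.

```python
PIXEL_CLOCK_MHZ = 25.175

# Horizontal timings from the table above, in microseconds.
H_TIMINGS_US = {
    "front_porch": 0.94,
    "sync_pulse": 3.77,
    "back_porch": 1.89,
    "active_video": 25.17,
}


def clocks(duration_us, clock_mhz=PIXEL_CLOCK_MHZ):
    """Convert a duration in microseconds into pixel-clock cycles."""
    return duration_us * clock_mhz


h_clocks = {name: clocks(t) for name, t in H_TIMINGS_US.items()}
line_time_us = sum(H_TIMINGS_US.values())

# The example from the text: 25.17 us at 25.174 MHz.
active_pixels = int(clocks(25.17, 25.174))
```

Because the tabulated times are rounded to two decimal places, the computed cycle counts are not whole numbers, which is the source of the 633-versus-640 discrepancy noted in the text.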
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT ≥ (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT ≥ (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
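Both sync signals follow the same pattern: the output is low only while the counter lies inside a window that starts after the viewable area plus a blanking offset. A minimal Python sketch of that comparison follows; the function name and the numeric values used in the example (16 and 96) are hypothetical and are not taken from the Controller's documentation.

```python
def sync_level(count, visible, blanking, sync_width):
    """Return the active-low sync level for a counter value.

    The signal is 0 while
        visible + blanking <= count < visible + blanking + sync_width
    and 1 (inactive) everywhere else, including after reset.
    """
    start = visible + blanking
    return 0 if start <= count < start + sync_width else 1


# Illustrative horizontal example: PixTotal = 640, with an assumed
# BlankingLeft of 16 clocks and an assumed TLSync of 96 clocks.
hsync_trace = [sync_level(h, 640, 16, 96) for h in range(800)]
```

The same function models VSYNC when it is called with VCOUNT, LinTotal and TFSync instead.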
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
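The 19-bit figure follows from the largest value the counter must hold. A quick check in Python, using a helper name chosen for this example:

```python
def counter_bits(pix_total, lin_total):
    """Number of bits needed to count from 0 to (pix_total * lin_total) - 1."""
    max_value = pix_total * lin_total - 1
    return max(1, max_value.bit_length())


vga_bits = counter_bits(640, 480)    # maximum value 307199
svga_bits = counter_bits(800, 600)   # maximum value 479999
```

Both maxima lie between 2^18 = 262144 and 2^19 = 524288, so 19 bits cover either resolution.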
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the
first pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors, or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum 60 Hz refresh rate:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 ≈ 54 ns
Max time for 1 pixel = 40 ns (the blanking intervals of each line and frame
also consume clock cycles)
2) To send information for each and every pixel of a frame by means of
RGB signals.
3) After completely displaying one row of information, the column must be
reset and the row has to be incremented; this is done by the Hsync signal.
4) After completing one frame of information, a new frame has to be started;
for this, the row and column are reset. This is done by the Vsync signal.
=> For giving the information for 1 pixel we use 1 clock (for controlling all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for 1 pixel on each
and every clock, we can attain a minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
Ex:
• 15 MHz: the time period is more, so the number of frames/sec is less
(it is unable to display 60 frames/sec)
• 25 MHz: the effectively utilized frequency
• 30 MHz: the time period is less, so the number of frames/sec is more
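The clock budget can be worked through numerically. The sketch below uses only the visible 640x480 pixels and the 60 Hz target from the text; the function names are illustrative assumptions.

```python
def pixel_time_ns(refresh_hz, pixels_per_frame):
    """Time available per pixel, in nanoseconds, ignoring blanking."""
    return 1e9 / (refresh_hz * pixels_per_frame)


def clock_mhz(period_ns):
    """Clock frequency in MHz for a given clock period in nanoseconds."""
    return 1e3 / period_ns


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a given frequency in MHz."""
    return 1e3 / freq_mhz


visible_budget = pixel_time_ns(60, 640 * 480)   # about 54.25 ns
chosen_clock = clock_mhz(40)                    # 40 ns period -> 25 MHz
too_slow = period_ns(15) > visible_budget       # a 15 MHz clock misses the budget
```

A 15 MHz clock has a period of about 66.7 ns, which already exceeds the per-pixel budget for the visible area alone, so it cannot sustain 60 frames per second.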
Signals used for image display:
• R (Red)
• G (Green)
• B (Blue)
• HSYNC (Horizontal Synchronization signal)
• VSYNC (Vertical Synchronization signal)
All the above signals are used to control the image data so that it is
correctly displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse, and back porch
              Horizontal sync   Vertical sync
Front porch   0.94 µs           0.35 ms
Sync pulse    3.77 µs           0.063 ms
Back porch    1.89 µs           1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA Pin description
Name         Type  Polarity   Bus size  Description
Global Control Signals
CLK          I     Rising     -         Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST          I     High       -         Global system reset.
VGA Input Settings Signals
RESOLUTION   I     High/Low   -         This input selects the screen resolution to be used: 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD         I     -          2         This input selects the color palette to be used: 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H   I     -          10        This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V   I     -          10        This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD           O     High       -         This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal en, generated by the Synchronization Unit of the Controller.
DATA         I     -          8         Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows. Black & White mode: 8 1-bit pixels. 16 Colors mode: 2 4-bit pixels. 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: in 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL   O     -          19        Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC        O     Falling    -         Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC        O     Falling    -         Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R0, R1       O     High/Low   -         Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G0, G1       O     High/Low   -         Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B0, B1       O     High/Low   -         Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
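As a worked example of this formula, the following Python sketch computes the picture size in bytes for each palette. The mode keys ("bw", "16", "64") and the function name are labels invented for the example; rounding up to a whole byte is an assumption for frame sizes that do not divide evenly.

```python
# Pixels packed into each byte of picture memory, per color palette.
PIXELS_PER_BYTE = {"bw": 8, "16": 2, "64": 1}


def picture_bytes(pix_total, lin_total, mode):
    """Bytes of picture memory needed for one frame."""
    pixels = pix_total * lin_total
    per_byte = PIXELS_PER_BYTE[mode]
    return -(-pixels // per_byte)  # ceiling division


vga_sizes = {mode: picture_bytes(640, 480, mode) for mode in PIXELS_PER_BYTE}
svga_sizes = {mode: picture_bytes(800, 600, mode) for mode in PIXELS_PER_BYTE}
```

For a full 640x480 frame this gives 38400 bytes in Black & White mode, 153600 bytes in 16 Colors mode, and 307200 bytes in 64 Colors mode.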
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
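In software terms, each mode's address is the pixel counter shifted right by the base-2 logarithm of the number of pixels per byte. The Python sketch below models that; the mode keys and function name are assumptions for illustration, and the 19-bit mask reflects the counter width stated in the text.

```python
# Right-shift amount per palette: 8, 2 and 1 pixels per byte respectively.
ADDRESS_SHIFT = {"bw": 3, "16": 1, "64": 0}


def addr_pixel(hvcnt, mode):
    """Byte address in picture memory for the pixel numbered hvcnt."""
    hvcnt &= 0x7FFFF  # keep the 19 counter bits
    return hvcnt >> ADDRESS_SHIFT[mode]


# Pixels 0..7 share byte 0 in Black & White mode; pixel 8 starts byte 1.
bw_addresses = [addr_pixel(n, "bw") for n in range(9)]
```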
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex.
This internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
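The enex condition can be written as a small Python predicate. Line Period and Frame Period are passed in as parameters because their values are not given here; the figures 800 and 525 used in the example call are placeholders chosen purely for illustration.

```python
def external_enable(hcount, vcount, pix_total, lin_total,
                    line_period, frame_period):
    """Return enex: 0 disables reads from picture memory (RD = 0)."""
    h_blank = pix_total + 1 < hcount < line_period
    v_blank = lin_total + 1 < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1


# Placeholder periods (assumed values, not from the Controller's data sheet).
LINE_PERIOD = 800
FRAME_PERIOD = 525

reading = external_enable(0, 0, 640, 480, LINE_PERIOD, FRAME_PERIOD)
blocked = external_enable(642, 0, 640, 480, LINE_PERIOD, FRAME_PERIOD)
```

Note that the comparison is strict, so reads remain enabled for HCOUNT values up to and including PixTotal + 1.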
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit
on each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is
"0".
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge
of CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
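The shift-register behaviour of PIXREG can be illustrated by unpacking one byte into the sequence of pixels that pass through the active (low-order) position. This is a behavioural sketch in Python; the mode keys and the function name are assumptions made for the example.

```python
# (bits per pixel, pixels per byte) for each color palette.
PIXEL_FORMAT = {"bw": (1, 8), "16": (4, 2), "64": (6, 1)}


def unpack_pixels(byte, mode):
    """Pixels in the order they reach the active position of PIXREG."""
    width, count = PIXEL_FORMAT[mode]
    mask = (1 << width) - 1
    reg = byte & 0xFF
    pixels = []
    for _ in range(count):
        pixels.append(reg & mask)  # active pixel sits in the low-order bits
        reg >>= width              # shift the next pixel into position
    return pixels


bw_pixels = unpack_pixels(0b10110001, "bw")
nibble_pixels = unpack_pixels(0xA5, "16")
full_pixel = unpack_pixels(0xFF, "64")
```

In 64 Colors mode the mask discards bits 7 and 6, matching the note that those bits are unused.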
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
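The count of 64 colors is simply 4 levels per gun raised to the power of 3 guns. The Python sketch below enumerates them and maps each 2-bit code to a voltage. The even spacing of the four levels between 0 V and 0.7 V is an assumption of this example; the actual levels depend on the resistor values of the DAC on the board.

```python
from itertools import product

V_MAX = 0.7  # volts, maximum brightness


def dac_level(code):
    """Analog level for a 2-bit code, assuming four evenly spaced steps."""
    if not 0 <= code <= 3:
        raise ValueError("a 2-bit code must be in the range 0..3")
    return V_MAX * code / 3


# Every (R, G, B) combination of 2-bit codes.
ALL_COLOURS = list(product(range(4), repeat=3))
levels = [dac_level(code) for code in range(4)]
```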
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices

Device      RAM       RAM Blocks   Total RAM   Total RAM   Total RAM
            Columns   per Column   Blocks      Bits        Kbits
XC3S100E    1         4            4           73,728      72
XC3S250E    2         6            12          221,184     216
XC3S500E    2         10           20          368,640     360
XC3S1200E   2         14           28          516,096     504
XC3S1600E   2         18           36          663,552     648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical: the two ports are
interchangeable and both support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM can also be used as
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Multiple block RAMs can be
cascaded to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
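As an illustration of the synchronous, clock-enabled port described above, a single-port memory can be written behaviorally as follows. This is a sketch under assumptions: the entity name bram_sp, its generics and its read-first behavior are chosen for the example, and whether a given synthesis tool maps this description onto a block RAM depends on the tool and its settings. The default 1024 x 16 organization uses the 16K data bits of one block.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port synchronous RAM, 1024 x 16 by default.
entity bram_sp is
  generic (
    ADDR_WIDTH : natural := 10;
    DATA_WIDTH : natural := 16
  );
  port (
    clk  : in  std_logic;
    en   : in  std_logic;   -- clock enable for the port
    we   : in  std_logic;   -- write enable
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity bram_sp;

architecture rtl of bram_sp is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read: dout changes only on an enabled clock edge.
        -- The old contents are returned during a write (read-first).
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```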
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
It was funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987 it was standardized => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes the entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from the port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link an entity declaration and an architecture body together.
The concept of a default configuration is a bit messy in VHDL '87
- the last architecture analyzed is linked to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Are always optional.
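To make the role of a configuration concrete, the following sketch gives the Comparator entity two architectures and selects one of them explicitly. The second architecture (Comparator2) and the configuration name Comparator_cfg are hypothetical names added for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Dataflow-style architecture, as in the example above.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- Behavioral-style architecture describing the same function.
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end architecture Comparator2;

-- The configuration binds the entity to Comparator2 explicitly,
-- instead of relying on the last architecture analyzed.
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end configuration Comparator_cfg;
```

Simulating Comparator_cfg rather than Comparator selects the behavioral version without touching either architecture's source.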
Packages
Packages contain information common to many design units.
1 Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2 Package body
- is not always needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
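A small package with a declaration part and a body might look like the sketch below. The package name vga_timing_pkg, the constant names and the helper function in_window are hypothetical; the numeric values are the counter limits used in the synchronization block code later in this report.

```vhdl
-- Hypothetical package collecting horizontal timing constants.
package vga_timing_pkg is
  constant H_COUNT_MAX    : natural := 793;  -- last value of the horizontal counter
  constant HSYNC_END      : natural := 95;   -- hsync is low for counts 0..95
  constant H_ACTIVE_FIRST : natural := 143;  -- first count of the enabled region
  constant H_ACTIVE_LAST  : natural := 778;  -- last count of the enabled region

  -- Returns true when first <= value <= last.
  function in_window (value, first, last : integer) return boolean;
end package vga_timing_pkg;

-- The body is needed here because the package declares a function.
package body vga_timing_pkg is
  function in_window (value, first, last : integer) return boolean is
  begin
    return (value >= first) and (value <= last);
  end function in_window;
end package body vga_timing_pkg;
```

A design unit would make these names visible with "use work.vga_timing_pkg.all;", assuming the package is compiled into the work library.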
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural  -------
Dataflow    -------  Higher level of abstraction
Behavioral  -------
Structural VHDL description
The circuit is described in terms of its components.
This can range from a low-level description (e.g. transistor-level description)
to a high-level description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral
description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
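Since VHDL has no built-in register, a dataflow description typically relies on a small behavioral model such as the one sketched here. The entities reg8 and masked_reg, their port names and the active-low reset are assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Behavioral description of an 8-bit register with load enable.
entity reg8 is
  port (clk, rstn, load : in  std_logic;
        d               : in  std_logic_vector(7 downto 0);
        q               : out std_logic_vector(7 downto 0));
end entity reg8;

architecture behavioral of reg8 is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      if load = '1' then
        q <= d;
      end if;
    end if;
  end process;
end architecture behavioral;

library ieee;
use ieee.std_logic_1164.all;

-- Dataflow description: combinational logic feeding the register.
entity masked_reg is
  port (clk, rstn, load : in  std_logic;
        a, mask         : in  std_logic_vector(7 downto 0);
        q               : out std_logic_vector(7 downto 0));
end entity masked_reg;

architecture dataflow of masked_reg is
  signal d : std_logic_vector(7 downto 0);
begin
  d <= a and mask;  -- concurrent assignment describes the combinational part
  r : entity work.reg8
    port map (clk => clk, rstn => rstn, load => load, d => d, q => q);
end architecture dataflow;
```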
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the process up.
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
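The two ways of suspending a process can be compared side by side. In this sketch (the entity and label names are made up for the example), both processes describe the same AND gate: a sensitivity list behaves like a "wait on" statement placed at the end of the process.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_gate_two_ways is
  port (a, b           : in  std_logic;
        y_list, y_wait : out std_logic);
end entity and_gate_two_ways;

architecture demo of and_gate_two_ways is
begin
  -- Version 1: sensitivity list, no wait statement.
  with_list : process (a, b)
  begin
    y_list <= a and b;
  end process with_list;

  -- Version 2: no sensitivity list, explicit wait statement.
  with_wait : process
  begin
    y_wait <= a and b;
    wait on a, b;   -- suspend until a or b changes
  end process with_wait;
end architecture demo;
```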
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
  clk50m : in  std_logic;
  rstn   : in  std_logic;   -- active-low reset
  hsync  : out std_logic;
  vsync  : out std_logic;
  red    : out std_logic_vector(0 downto 0);
  green  : out std_logic_vector(0 downto 0);
  blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
  -- image memory, one bit per pixel
  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic);
  END component;
  component sync_unit port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;
    ---horizontal synch pulse(row)
    vsync      : out std_logic;
    --vertical sync pulse(frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : inout integer;
    c_hs       : inout integer);
  end component;
  component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
  end component;
  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);
begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));
  -- divide clk50m by two to obtain the clock used by the other blocks
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
    --end if;
  end process;
  hsync <= hsync_s;
  vsync <= vsync_s;
  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );
  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;  ---horizontal synch pulse(row)
  vsync      : out std_logic;  --vertical sync pulse(frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;
  c_hs       : out integer );
end sync_unit;
architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs <= c_hs_s;
  line_count <= line_count_s;
  h_en <= h_en_s;
  v_en <= v_en_s;
  -- hsync counter: counts 0..793 on every clock
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;
  --vsync counter
  --vertical line counter: advances on each rising edge of h_en_s
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
    --end if;
  end process;
  --hsync pulse
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
    --end if;
  end process;
  process(clk, rstn) ----Horizontal Enable (Row Information Will be Sent in this period)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;
  ---vsync pulse
  process(clk, rstn) -- Vertical Enable (Data is Valid in this Region)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;
  process(clk, rstn) --Vertical Sync, Front Porch, Back Porch
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
    --end if;
  end process;
  -- v_en delayed by one clock, used to detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;
end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;
  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd <= '0';
      red <= "0";
      green <= "0";
      blue <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- rising edge of v_en: restart at the first memory address
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- image region: read the next pixel from memory
            rd <= '1';
            red <= data_in;
            green <= data_in;
            blue <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            rd <= '0';
            red <= "0";
            green <= "1";
            blue <= "0";
          end if;
        else
          rd <= '0';
          red <= "1";
          green <= "1";
          blue <= "1";
        end if;
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- blanking: drive black
        red <= "0";
        green <= "0";
        blue <= "0";
      end if;
    end if;
    -- end if;
    --end if;
  end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: first commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the field
to fix bugs, and lower non-recurring engineering costs. Vendors can also take a
middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after the design
has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio.
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8 hours
is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
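The logic block just described (a 4-input LUT, a flip-flop, and a single output that is either registered or unregistered) can be modelled behaviorally as follows. This is only an illustrative model, not a vendor primitive: the entity name logic_block and the generics LUT_INIT and REGISTERED are assumptions standing in for the configuration bits of a real device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic FPGA logic block.
entity logic_block is
  generic (
    -- Truth table of the LUT: bit n is the output for input value n.
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";  -- 4-input AND
    -- Selects the registered (true) or unregistered (false) output.
    REGISTERED : boolean := false
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The LUT is a 16 x 1 memory addressed by the four inputs.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- The flip-flop samples the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer controlled by the configuration.
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

With the default LUT_INIT only bit 15 is set, so the output is '1' only when all four inputs are '1'; other 16-bit patterns give any other function of four inputs.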
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher-level
functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license) and
other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs, together with early software and
firmware co-design, has become mainstream. Prototyping SoC and ASIC
designs on an FPGA has become a good method for doing this.
Reasons why Prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. The alternative is for designers to rely only on software
simulations to verify that their hardware design is sound, where simulation speed
and modeling accuracy limitations hinder development.
2. Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This helps expose many unforeseen software bugs that appear due
to today's array of operating systems, applications and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC clients,
bringing in interest early. This speeds up the overall development cycle and
allows for more enhancement or improvement of the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, when implemented on an FPGA, the Controller takes a processor-generated
picture (pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
5 Hardware Description
5.1 Block Diagram
5.1 VGA Synchronization Unit
The Synchronization Unit provides the horizontal and vertical
synchronization signals, HSYNC and VSYNC, that are required to correctly
display a picture frame within the confines of a monitor's display area.
These synchronization signals are used as control inputs by the
monitor's horizontal and vertical deflection circuits. These circuits deflect the
electrons emitted by the three primary color electron guns (Red, Green, Blue)
left to right and from top to bottom, respectively. HSYNC provides the start
and stop times for the horizontal deflection circuit, so that a line of pixels is
correctly drawn across the screen display. VSYNC provides the start and stop
times for the vertical deflection circuit, so that the lines of a frame are correctly
drawn from the top to the bottom of the screen display.
The resolution for the display is defined by the level on the
RESOLUTION input. If High, the 640x480 resolution (VGA) is used. If Low,
the 800x600 resolution (SVGA) is used.
Although the resolution determines the area of a monitor's screen
within which an image can be displayed, the full extents of the chosen
resolution do not have to be used. The actual extents of the image display area
on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs. These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame, respectively.
5.2 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to the
top, bottom, left and right of the viewable frame area are blanked by putting
black pixels at the required line-pixel positions. This has the effect of centering
the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel needs to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this purpose: the line display
enable signal, en. The signal is checked on each rising edge of the external
clock signal (CLK) and is set as follows:
If (HCOUNT >= PixTotal) or (VCOUNT >= LinTotal) then
    en = 0 (pixel needs to be blanked; set color to black)
Else
    en = 1 (pixel is a viewable pixel; generate RGB color accordingly)
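The blanking rule above can be sketched as a small behavioral model in Python. This is only an illustration of the stated condition, not the Controller's implementation; the function name line_display_enable and its argument names are hypothetical.

```python
def line_display_enable(hcount: int, vcount: int, pix_total: int, lin_total: int) -> int:
    """Return the en level: 1 for a viewable pixel, 0 for a blanked one."""
    if hcount >= pix_total or vcount >= lin_total:
        return 0  # outside the viewable area: color is forced to black
    return 1      # inside the viewable area: RGB color is generated


# Example with the full 640x480 area in use (PixTotal = 640, LinTotal = 480).
for hcount, vcount in [(0, 0), (639, 479), (640, 0), (0, 480)]:
    print(hcount, vcount, line_display_enable(hcount, vcount, 640, 480))
```

The last viewable position is (639, 479); any count at or beyond PixTotal or LinTotal is blanked.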
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode, 640x480, 60 Hz, non-interlaced, the horizontal
timings are:
Parameter                  Value      Unit
Clock frequency            25.175     MHz
Horizontal pixels          640
Horizontal sync polarity   Negative
Total time for each line   31.77      µs
Front porch (A)            0.94       µs
Sync pulse length (B)      3.77       µs
Back porch (C)             1.89       µs
Active video (D)           25.17      µs
(Total horizontal sync time: 6.60 µs)
6.1 The vertical timings are:
Parameter                  Value      Unit
Vertical lines             480
Vertical sync polarity     Negative
Vertical frequency         59.94      Hz
Front porch (E)            0.35       ms
Sync pulse length (F)      0.06       ms
Back porch (G)             1.02       ms
Active video (H)           15.25      ms
(Total vertical sync time: 1.43 ms)
640 x 400 at 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 at 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 at 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
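As a rough check of the horizontal timings, the durations can be converted into pixel-clock periods. The sketch below assumes the decimal values 0.94, 3.77, 1.89 and 25.17 µs and a 25.175 MHz clock; the names H_TIMINGS_US and to_pixel_clocks are made up for the illustration. Because the table values are rounded, the results are not whole numbers, which is the same effect the remark about 633 pixels describes.

```python
PIXEL_CLOCK_MHZ = 25.175  # clock frequency from the horizontal timing table

# Horizontal segment durations in microseconds, from the timing table.
H_TIMINGS_US = {
    "front_porch": 0.94,
    "sync_pulse": 3.77,
    "back_porch": 1.89,
    "active_video": 25.17,
}


def to_pixel_clocks(duration_us: float, clock_mhz: float = PIXEL_CLOCK_MHZ) -> float:
    """Number of pixel-clock periods in duration_us (microseconds x MHz)."""
    return duration_us * clock_mhz


total_us = sum(H_TIMINGS_US.values())

for name, duration in H_TIMINGS_US.items():
    print(f"{name:13s} {duration:6.2f} us = {to_pixel_clocks(duration):7.2f} clocks")
print(f"{'line total':13s} {total_us:6.2f} us = {to_pixel_clocks(total_us):7.2f} clocks")
```

The four segments add up to the 31.77 µs line time, which corresponds to about 800 clock periods per line.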
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter, and is driven low when:
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter, and is driven low when:
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
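Both sync signals follow the same pattern: the output is low only while the counter is inside a window that starts after the visible area plus a blanking interval. A minimal Python sketch of that comparison is given below. The function name sync_level is hypothetical, and the example values (16 clocks of blanking, a 96-clock sync width) are illustrative assumptions, not values taken from the text.

```python
def sync_level(count: int, visible: int, blanking: int, sync_width: int) -> int:
    """Active-low sync: 0 while count lies inside the sync window, else 1.

    For HSYNC: count = HCOUNT, visible = PixTotal, sync_width = TLSync.
    For VSYNC: count = VCOUNT, visible = LinTotal, sync_width = TFSync.
    """
    start = visible + blanking
    return 0 if start <= count < start + sync_width else 1


# Illustrative (assumed) horizontal values: 640 visible pixels,
# 16 clocks of blanking before the pulse, pulse 96 clocks wide.
for hcount in (0, 655, 656, 751, 752):
    print(hcount, sync_level(hcount, 640, 16, 96))
```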
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
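The 19-bit figure follows directly from the largest address the counter must hold. The short Python sketch below works this out for both resolutions; the helper name address_counter_bits is hypothetical.

```python
def address_counter_bits(pix_total: int, lin_total: int) -> int:
    """Bits needed to count from 0 to (pix_total x lin_total) - 1."""
    last_address = pix_total * lin_total - 1
    return max(1, last_address.bit_length())


for pix_total, lin_total in [(640, 480), (800, 600)]:
    last = pix_total * lin_total - 1
    bits = address_counter_bits(pix_total, lin_total)
    print(f"{pix_total}x{lin_total}: 0 to {last}, {bits} bits")
```

Both 307199 and 479999 lie between 2^18 and 2^19, so a single 19-bit counter covers either resolution.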
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (Black &
White, 16 Colors or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum 60 Hz refresh rate:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 = approx. 54 ns
Allowing for the blanking intervals in each line and frame, max time for 1 pixel = 40 ns
2) To send information for each and every pixel of a frame by means of
RGB signals.
3) After one row of information has been completely displayed, the column must be
reset and the row incremented; this is done by the Hsync signal.
4) After one frame of information has been completed, a new frame has to be started;
for this, both row and column are reset; this is done by the Vsync signal.
=> For giving the information for 1 pixel we use 1 clock (for controlling all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display one pixel of information on each and every
clock, we can attain a minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
Ex:
                     15 MHz       25 MHz                              30 MHz
Frequency            Decreases    (effectively utilized frequency)    Increases
Time period          More                                             Less
No. of frames/sec    Less                                             More
(At 15 MHz the controller is unable to display 60 frames/sec)
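The clock-rate argument above can be checked numerically. The sketch below is only an estimate under a stated assumption: the totals of 800 clocks per line and 525 lines per frame (visible area plus blanking) are the commonly quoted figures for 640x480 at 60 Hz and are not taken from this text. The function names are hypothetical.

```python
def max_pixel_period_ns(clocks_per_line: int, lines_per_frame: int, refresh_hz: float) -> float:
    """Longest clock period (ns) that still reaches refresh_hz."""
    return 1e9 / (clocks_per_line * lines_per_frame * refresh_hz)


def refresh_rate_hz(clock_hz: float, clocks_per_line: int, lines_per_frame: int) -> float:
    """Frames per second when one pixel position is clocked per cycle."""
    return clock_hz / (clocks_per_line * lines_per_frame)


# Visible pixels only: 640 x 480 = 307200 pixels per frame.
print(f"visible only : {max_pixel_period_ns(640, 480, 60):.2f} ns per pixel")

# Assumed totals including blanking: 800 clocks per line, 525 lines per frame.
print(f"with blanking: {max_pixel_period_ns(800, 525, 60):.2f} ns per pixel")

for clock_mhz in (15, 25, 30):
    rate = refresh_rate_hz(clock_mhz * 1e6, 800, 525)
    print(f"{clock_mhz} MHz -> {rate:.1f} frames/sec")
```

Under these assumptions a 15 MHz clock falls well short of 60 frames/sec, while 25 MHz lands close to it and 30 MHz exceeds it.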
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal synchronization signal)
VSYNC (Vertical synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse and back porch
               Horizontal sync    Vertical sync
Front porch    0.94 µs            0.35 ms
Sync pulse     3.77 µs            0.063 ms
Back porch     1.89 µs            1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA Pin description
Name | Type | Polarity | Bus size | Description
Global Control Signals
CLK | I | Rising | | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST | I | High | | Global system reset.
VGA Input Settings Signals
RESOLUTION | I | High/Low | | This input selects the screen resolution to be used. 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | | 2 | This input selects the color palette to be used. 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD | O | High | | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal, en, generated by the Synchronization Unit of the Controller.
DATA | I | | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette, as follows. Black & White mode: 8 x 1-bit pixels. 16 Colors mode: 2 x 4-bit pixels. 64 Colors mode: 1 x 6-bit pixel (DATA[5..0]). Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC | O | Falling | | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R0, R1 | O | High/Low | | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G0, G1 | O | High/Low | | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B0, B1 | O | High/Low | | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
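As a worked example of the memory formula, the Python sketch below computes the picture size in bytes for each color palette. The mode labels ("bw", "16_colors", "64_colors") and the function name are chosen for the illustration only; rounding up to a whole byte is an assumption for frame sizes that do not divide evenly.

```python
# Pixels packed into one byte of picture memory, per color palette.
PIXELS_PER_BYTE = {"bw": 8, "16_colors": 2, "64_colors": 1}


def picture_memory_bytes(pix_total: int, lin_total: int, mode: str) -> int:
    """Bytes needed to store one picture, rounded up to a whole byte."""
    pixels = pix_total * lin_total
    per_byte = PIXELS_PER_BYTE[mode]
    return -(-pixels // per_byte)  # ceiling division


for mode in PIXELS_PER_BYTE:
    print(f"640x480, {mode:9s}: {picture_memory_bytes(640, 480, mode)} bytes")
```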
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, i.e. right-shifting the contents by
three:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, i.e. right-shifting the contents by
one:
ADDR_PIXEL = '0' & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
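The three addressing rules differ only in how far the 19-bit pixel counter is shifted right. A minimal Python sketch of that mapping follows; the function name addr_pixel and the mode labels are hypothetical, and the 0x7FFFF mask simply models the 19-bit width of hvcnt.

```python
# Right shift applied to hvcnt in each color mode (log2 of pixels per byte).
ADDRESS_SHIFT = {"bw": 3, "16_colors": 1, "64_colors": 0}


def addr_pixel(hvcnt: int, mode: str) -> int:
    """Byte address in picture memory for the pixel counted by hvcnt."""
    return (hvcnt & 0x7FFFF) >> ADDRESS_SHIFT[mode]


# Pixels 0..7 share byte 0 in Black & White mode; pixel 8 starts byte 1.
print([addr_pixel(n, "bw") for n in range(10)])
# Pixels are paired in 16 Colors mode.
print([addr_pixel(n, "16_colors") for n in range(6)])
```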
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
    enex = 0
Else
    enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
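The enex condition can be written out as a small Python model. This is a sketch of the stated rule only; the function name is hypothetical, and the Line Period and Frame Period values used in the example (800 and 525) are assumed for illustration rather than taken from the text.

```python
def external_enable(hcount: int, vcount: int, pix_total: int, lin_total: int,
                    line_period: int, frame_period: int) -> int:
    """Return enex: 0 while reads from picture memory are disabled, else 1."""
    h_blank = pix_total + 1 < hcount < line_period
    v_blank = lin_total + 1 < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1


# Assumed example: 640x480 visible, 800 clocks per line, 525 lines per frame.
for hcount, vcount in [(0, 0), (641, 0), (642, 0), (0, 482)]:
    print(hcount, vcount, external_enable(hcount, vcount, 640, 480, 800, 525))
```

Note that, as defined, enex stays High for two counts past the visible area, so RD remains active slightly longer than the en signal.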
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is
'0'.
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
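The load-and-shift behavior of the Pixel register amounts to unpacking each memory byte starting from its lowest bits. The Python sketch below shows the resulting pixel order for each mode; the function name unpack_pixels and the mode labels are hypothetical, and the model ignores clocking.

```python
def unpack_pixels(data_byte: int, mode: str) -> list:
    """Pixels held in one memory byte, in the order they become active."""
    if mode == "bw":
        # Active pixel is bit 0; the register shifts right by one bit per clock.
        return [(data_byte >> shift) & 0x1 for shift in range(8)]
    if mode == "16_colors":
        # Active pixel is the low nibble; one 4-bit shift exposes the second.
        return [data_byte & 0xF, (data_byte >> 4) & 0xF]
    if mode == "64_colors":
        # One 6-bit pixel per byte; bits 7 and 6 are not used.
        return [data_byte & 0x3F]
    raise ValueError(f"unknown mode: {mode}")


print(unpack_pixels(0b10110001, "bw"))
print(unpack_pixels(0xA5, "16_colors"))
print(unpack_pixels(0xFF, "64_colors"))
```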
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1: Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
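The count of 64 colors and the four analog levels can be reproduced with a few lines of Python. The text only gives the end points (0 V and 0.7 V), so the evenly spaced intermediate levels below are an assumption about the DAC, and the function names are hypothetical.

```python
V_MAX = 0.7  # volts at maximum brightness, as stated in the text


def dac_level_volts(code: int) -> float:
    """Analog level for a 2-bit code, assuming four evenly spaced steps."""
    if not 0 <= code <= 3:
        raise ValueError("a 2-bit code must be in the range 0..3")
    return V_MAX * code / 3


def color_count(bits_per_channel: int = 2) -> int:
    """Number of distinct colors from three channels of the given width."""
    return (2 ** bits_per_channel) ** 3


for code in range(4):
    print(f"code {code:02b} -> {dac_level_volts(code):.3f} V")
print("colors:", color_count())
```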
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2: Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
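To relate these figures to the display, the Python sketch below estimates how many block RAMs a picture would occupy and which Spartan-3E devices have enough blocks. It is a rough estimate under stated assumptions: the whole picture is held in block RAM, only the 16K data bits of each block are used, and any constraints from the chosen aspect ratio are ignored. The helper names are hypothetical; the block counts are those listed in Table 2.

```python
BLOCK_RAM_BITS = 18432           # total bits per block, including parity bits
DATA_BITS_PER_BLOCK = 16 * 1024  # bits per block allocated to data storage

# Total RAM blocks per device, as listed in Table 2.
SPARTAN_3E_BLOCKS = {
    "XC3S100E": 4,
    "XC3S250E": 12,
    "XC3S500E": 20,
    "XC3S1200E": 28,
    "XC3S1600E": 36,
}


def blocks_needed(pix_total: int, lin_total: int, bits_per_pixel: int) -> int:
    """Block RAMs needed to hold one picture in the data bits only."""
    frame_bits = pix_total * lin_total * bits_per_pixel
    return -(-frame_bits // DATA_BITS_PER_BLOCK)  # ceiling division


def devices_that_fit(pix_total: int, lin_total: int, bits_per_pixel: int) -> list:
    """Devices from Table 2 with at least the required number of blocks."""
    needed = blocks_needed(pix_total, lin_total, bits_per_pixel)
    return [name for name, blocks in SPARTAN_3E_BLOCKS.items() if blocks >= needed]


print(blocks_needed(640, 480, 1))
print(devices_that_fit(640, 480, 1))
```

Multiplying each block count by 18,432 reproduces the Total RAM Bits column of Table 2.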
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986, IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE. IEEE 1164 contains
definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.
entity Comparator is
    port (A, B : in std_logic_vector(7 downto 0);
          EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
    EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and architecture body together.
The concept of default configuration is a bit messy in VHDL '87:
- the last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Are always optional.
Packages
Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries
Collection of VHDL design units (database):
1. Packages
   package declaration
   package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the HW.
It is possible to describe a circuit in a number of ways:
Structural -> Dataflow -> Behavioral (increasing level of abstraction)
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no 3rd party models for registers => you must write the behavioral
description of registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the efficiency
of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
Process keyword in VHDL
The process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the process up.
General format:
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
    clk50m : in std_logic;
    rstn   : in std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
component memory
port (
    addr  : IN std_logic_VECTOR(17 downto 0);
    clk   : IN std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN std_logic;
    sinit : IN std_logic);
END component;
component sync_unit port(
    clk        : in std_logic;
    rstn       : in std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : inout integer;
    c_hs       : inout integer);
end component;
component pixelgen port(
    clk        : in std_logic;
    rstn       : in std_logic;
    hsync      : in std_logic;
    vsync      : in std_logic;
    h_en       : in std_logic;
    v_en       : in std_logic;
    v_en_reg   : in std_logic;
    data_in    : in std_logic_vector(0 downto 0);
    line_count : in integer;
    c_hs       : in integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr, 18));
-- Divide the 50 MHz input clock by two to obtain the pixel clock (clk).
process(clk50m, rstn)
begin
    if (rstn = '0') then
        clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
        clk <= not clk;
    end if;
    --end if;
end process;
hsync <= hsync_s;
vsync <= vsync_s;
-- Synchronization unit: generates sync pulses, enables and counters.
sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
);
-- Pixel generator: produces RGB outputs and the memory read address.
pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
);
-- Picture memory: 1-bit wide, 18-bit address.
mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
);
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin
c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;
-- hsync counter
process(clk, rstn)
begin
    if (rstn = '0') then
        c_hs_s <= 0;
    else
        if (clk = '1' and clk'event) then
            if (c_hs_s = 793) then
                c_hs_s <= 0;
            else
                c_hs_s <= c_hs_s + 1;
            end if;
        end if;
    end if;
end process;
-- vsync counter
-- vertical line counter
process(h_en_s, rstn)
begin
    if (rstn = '0') then
        line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
        line_count_s <= line_count_s + 1;
        if (line_count_s = 515) then
            line_count_s <= 0;
        end if;
    end if;
    --end if;
end process;
-- hsync pulse
process(clk, rstn)
begin
    if (rstn = '0') then
        hsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (c_hs_s <= 95) then
            hsync <= '0';
        else
            hsync <= '1';
        end if;
    end if;
    --end if;
end process;
-- Horizontal Enable (row information will be sent in this period)
process(clk, rstn)
begin
    if (rstn = '0') then
        h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
            h_en_s <= '1';
        else
            h_en_s <= '0';
        end if;
    end if;
    --end if;
end process;
-- vsync pulse
process(clk, rstn) -- Vertical Enable (data is valid in this region)
begin
    if (rstn = '0') then
        v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((line_count_s >= 35 and line_count_s <= 515)) then
            v_en_s <= '1';
        else
            v_en_s <= '0';
        end if;
    end if;
    --end if;
end process;
process(clk, rstn) -- Vertical Sync, Front Porch, Back Porch
begin
    if (rstn = '0') then
        vsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (line_count_s < 2) then
            vsync <= '0';
        else
            vsync <= '1';
        end if;
    end if;
    --end if;
end process;
process(clk)
begin
    if (clk = '1' and clk'event) then
        v_en_reg <= v_en_s;
    end if;
end process;
end arch_sync_unit;
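The counter limits in the synchronization block fix the line and frame timing. The Python sketch below only restates the constants that appear in the listing and derives durations from them; the 25 MHz pixel clock is an assumption (clk50m divided by two), and all variable names are illustrative.

```python
# Constants copied from the sync_unit listing.
H_COUNT_MAX = 793                  # c_hs_s wraps to 0 after reaching 793
HSYNC_LOW_MAX = 95                 # hsync is driven low while c_hs_s <= 95
H_EN_FIRST, H_EN_LAST = 143, 778   # window in which h_en_s is set
LINE_COUNT_MAX = 515               # line_count_s wraps to 0 after reaching 515
V_EN_FIRST, V_EN_LAST = 35, 515    # window in which v_en_s is set

# Assumption: clk is clk50m divided by two.
PIXEL_CLK_HZ = 25_000_000

clocks_per_line = H_COUNT_MAX + 1
hsync_clocks = HSYNC_LOW_MAX + 1
h_en_clocks = H_EN_LAST - H_EN_FIRST + 1
lines_per_frame = LINE_COUNT_MAX + 1
v_en_lines = V_EN_LAST - V_EN_FIRST + 1

line_time_us = clocks_per_line / PIXEL_CLK_HZ * 1e6
hsync_time_us = hsync_clocks / PIXEL_CLK_HZ * 1e6
frame_rate_hz = PIXEL_CLK_HZ / (clocks_per_line * lines_per_frame)

print(clocks_per_line, hsync_clocks, h_en_clocks, lines_per_frame, v_en_lines)
print(round(line_time_us, 2), round(hsync_time_us, 2), round(frame_rate_hz, 2))
```

With these constants a line lasts 794 clocks and a frame 516 lines, so under the assumed clock the frame rate comes out slightly above 60 Hz.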
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;
process(clk, rstn)
begin
    if (rstn = '0') then
        rd <= '0';
        red <= "0";
        green <= "0";
        blue <= "0";
        addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
        if ((v_en = '1') and (v_en_reg = '0')) then
            addr_s <= (others => '0');
        end if;
        if (v_en = '1' and h_en = '1') then
            if (line_count <= 514) then
                if (c_hs <= 655) then
                    rd <= '1';
                    red <= data_in;
                    green <= data_in;
                    blue <= data_in;
                    addr_s <= addr_s + "000000000000000001";
                else
                    rd <= '0';
                    red <= "0";
                    green <= "1";
                    blue <= "0";
                end if;
            else
                rd <= '0';
                red <= "1";
                green <= "1";
                blue <= "1";
            end if;
            if (addr_s = "111011111111111111") then
                addr_s <= (others => '0');
            end if;
        else
            red <= "0";
            green <= "0";
            blue <= "0";
        end if;
    end if;
    -- end if;
    --end if;
end process;
end behv;
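The pixel generator compares addr_s against an 18-bit binary literal before wrapping the address to zero. The short Python check below decodes that literal. The reading that it marks the last address of a 512 x 480 image is an inference from the number itself, not something the listing states, and the names used are illustrative.

```python
ADDR_WIDTH = 18                          # addr_s is std_logic_vector(17 downto 0)
WRAP_LITERAL = "111011111111111111"      # value compared against addr_s in the listing

wrap_value = int(WRAP_LITERAL, 2)
max_address = 2 ** ADDR_WIDTH - 1

# Assumption: the picture memory holds 512 pixels on each of 480 lines.
ASSUMED_PIXELS_PER_LINE = 512
ASSUMED_LINES = 480
assumed_frame_size = ASSUMED_PIXELS_PER_LINE * ASSUMED_LINES

print(wrap_value, max_address, assumed_frame_size)
```

The decoded value is one less than 512 x 480, which is consistent with the address counter covering a frame of that size and fitting in 18 bits.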
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM)
and programmable logic devices (PLDs). PROMs and PLDs both had the option
of being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative approach to using hard-macro processors is to make use of
"soft" processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be reprogrammed at run time, and
this is leading to the idea of reconfigurable computing or reconfigurable
systems: CPUs that reconfigure themselves to suit the task at hand. The
Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable
soft processor implemented on FPGAs. However, it does not support dynamic
reconfiguration at runtime, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
• IC costs are rising aggressively
• ASIC complexity has increased development time and costs
• R&D resources and headcount are decreasing
• Revenue losses for slow time-to-market are increasing
• Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative to ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage
of more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
[Figure 12.1: FPGA structure]
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
[Figure: Typical logic block]
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
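The logic block described here (a 4-input LUT whose single output is taken either directly or through a flip-flop) can be illustrated with a small behavioural model. This is a toy Python sketch, not vendor behaviour; the class name LogicBlock, its methods, and the input-to-index bit ordering are assumptions chosen for illustration.

```python
class LogicBlock:
    """Toy model of a 4-input LUT followed by an optional flip-flop."""

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 truth-table entries")
        self.truth_table = list(truth_table)
        self.registered = registered
        self.flip_flop = 0            # register contents, cleared at start

    def lut(self, a, b, c, d):
        """Combinational lookup: the four inputs form the table index."""
        index = (a << 3) | (b << 2) | (c << 1) | d
        return self.truth_table[index]

    def tick(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.flip_flop = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """The single block output: registered or unregistered LUT value."""
        return self.flip_flop if self.registered else self.lut(a, b, c, d)


# Configure the LUT as a 4-input XOR (parity of the index bits).
xor4 = [bin(i).count("1") % 2 for i in range(16)]

comb = LogicBlock(xor4)
reg = LogicBlock(xor4, registered=True)
before = reg.output(1, 0, 0, 0)   # register still holds its initial value
reg.tick(1, 0, 0, 0)
after = reg.output(0, 0, 0, 0)    # register now holds the captured LUT value
print(comb.output(1, 0, 0, 0), before, after)
```

The same truth table gives different visible behaviour depending on the registered flag, which is the choice between the registered and unregistered output described above.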
For this example architecture, the locations of the FPGA logic block pins are
shown below.
[Figure: Logic Block Pin Locations]
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
[Figure: Switch box topology]
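The planar (domain-based) switch box rule can be written down directly: a wire entering on track t may only be switched onto track t of the three other sides. The Python sketch below enumerates those connections; the side names and the function planar_switch_box are illustrative assumptions.

```python
SIDES = ("north", "east", "south", "west")

def planar_switch_box(channel_width):
    """Return {(side, track): [(other_side, track), ...]} for a planar switch box."""
    connections = {}
    for side in SIDES:
        for track in range(channel_width):
            # A wire keeps its track number and may reach the three other sides.
            connections[(side, track)] = [
                (other, track) for other in SIDES if other != side
            ]
    return connections

box = planar_switch_box(channel_width=4)
print(box[("north", 1)])
```

Each wire end has exactly three programmable switches, and no switch ever changes the track number, which is what keeps the routing domains separate.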
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA has
become a good method to do this.
Reasons why Prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This exposes many unforeseen software bugs that appear due
to today's array of operating systems, applications, and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
[Figure: Gate level, VGA CONTROLLER]
[Figure: Gate level, SYNC UNIT]
[Figure: TECHNOLOGY SCHEMATIC]
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, by implementing the VGA Controller on an FPGA, a processor-generated
(pixelated) picture is taken from memory space, and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
The Synchronization Unit provides the horizontal and vertical
synchronization signals, HSYNC and VSYNC, that are required to correctly
display a picture frame within the confines of a monitor's display area.
These synchronization signals are used as control inputs by the
monitor's horizontal and vertical deflection circuits. These circuits deflect the
electrons emitted by the three primary color electron guns (Red, Green, Blue)
left to right and from top to bottom, respectively. HSYNC provides the start
and stop times for the horizontal deflection circuit, so that a line of pixels is
correctly drawn across the screen display. VSYNC provides the start and stop
times for the vertical deflection circuit, so that the lines of a frame are
correctly drawn from the top to the bottom of the screen display.
The resolution for the display is defined by the level on the
RESOLUTION input. If High, the 640x480 resolution (VGA) is used. If Low,
the 800x600 resolution (SVGA) is used.
Although the resolution determines the area of a monitor's screen
within which an image can be displayed, the full extents of the chosen
resolution do not have to be used. The actual extents of the image display area
on the screen can be controlled by the use of the DISPSIZE_H and
DISPSIZE_V inputs. These inputs determine the total number of pixels to be
used in a line and the total number of lines in a frame, respectively.
5.2 Blank pixel generation
The total number of viewable pixels in a line and viewable lines in a
frame is determined by the display resolution chosen through the
RESOLUTION input (1 = 640x480, 0 = 800x600) and the values received on
the DISPSIZE_H and DISPSIZE_V buses. Whether the full extent of the
chosen display resolution is used or not, the areas of the monitor screen to the
top, bottom, left, and right of the viewable frame area are blanked by putting
black pixels at the required line-pixel positions. This has the effect of centering
the image on the screen.
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel needs to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this very reason. This is the line display
enable signal, en. The signal is checked on each rising edge of the external
clock signal (CLK) and is set as follows:
If (HCOUNT >= PixTotal) or (VCOUNT >= LinTotal) then
    en = 0 (pixel needs to be blanked: set color to black)
Else
    en = 1 (pixel is a viewable pixel: generate RGB color accordingly)
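The blanking rule above translates directly into a small function. The Python sketch below mirrors the stated condition; the function name display_enable and the lowercase parameter names are illustrative stand-ins for HCOUNT, VCOUNT, PixTotal, and LinTotal.

```python
def display_enable(hcount, vcount, pix_total, lin_total):
    """Return 0 when the pixel must be blanked, 1 when it is viewable."""
    if hcount >= pix_total or vcount >= lin_total:
        return 0
    return 1

# Example for a full 640x480 display area.
corners = [
    display_enable(0, 0, 640, 480),      # first viewable pixel
    display_enable(639, 479, 640, 480),  # last viewable pixel
    display_enable(640, 0, 640, 480),    # just past the end of a line
    display_enable(0, 480, 640, 480),    # just past the last line
]
print(corners)
```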
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode (640x480, 60 Hz, non-interlaced), the horizontal
timings are:
Parameter                   Value      Unit
Clock frequency             25.175     MHz
Horizontal pixels           640
Horizontal sync polarity    Negative
Total time for each line    31.77      µs
Front porch (A)             0.94       µs
Sync pulse length (B)       3.77       µs
Back porch (C)              1.89       µs
Active video (D)            25.17      µs
(Total horizontal sync time: 6.60 µs)
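To relate the horizontal timings to counter values, each duration can be multiplied by the pixel clock. The Python sketch below does this, reading the table values with their decimal points as 25.175 MHz and 0.94, 3.77, 1.89, and 25.17 µs; the variable names are illustrative.

```python
PIXEL_CLOCK_MHZ = 25.175

horizontal_us = {
    "front_porch": 0.94,
    "sync_pulse": 3.77,
    "back_porch": 1.89,
    "active_video": 25.17,
}

# MHz multiplied by microseconds gives a plain count of clock periods.
horizontal_clocks = {
    name: duration * PIXEL_CLOCK_MHZ for name, duration in horizontal_us.items()
}
line_time_us = sum(horizontal_us.values())
clocks_per_line = line_time_us * PIXEL_CLOCK_MHZ
blanking_us = line_time_us - horizontal_us["active_video"]

for name, clocks in horizontal_clocks.items():
    print(f"{name}: {clocks:.1f} clocks")
print(f"line: {line_time_us:.2f} us, about {round(clocks_per_line)} clocks")
print(f"blanking: {blanking_us:.2f} us")
```

The four intervals add up to the 31.77 µs line time, and the blanking portion matches the 6.60 µs total quoted under the table.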
6.1 The vertical timings are:
Parameter                   Value      Unit
Vertical lines              480
Vertical sync polarity      Negative
Vertical frequency          59.94      Hz
Front porch (E)             0.35       ms
Sync pulse length (F)       0.06       ms
Back porch (G)              1.02       ms
Active video (H)            15.25      ms
(Total vertical sync time: 1.43 ms)
640 x 400 @ 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 @ 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 @ 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
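The vertical timings and the 633-pixel remark can be checked with a few lines of arithmetic. The Python sketch below divides each vertical interval by the 31.77 µs line time and repeats the active-video calculation; the values are read from the tables with their decimal points restored, and the variable names are illustrative.

```python
LINE_TIME_MS = 31.77e-3     # total time for each line, from the horizontal timings

vertical_ms = {
    "front_porch": 0.35,
    "sync_pulse": 0.06,
    "back_porch": 1.02,
    "active_video": 15.25,
}

# Each vertical interval expressed as a number of scan lines.
vertical_lines = {name: t / LINE_TIME_MS for name, t in vertical_ms.items()}
frame_time_ms = sum(vertical_ms.values())
lines_per_frame = frame_time_ms / LINE_TIME_MS
refresh_hz = 1000 / frame_time_ms

# Whole pixels that fit in 25.17 us of active video at 25.174 MHz.
active_pixels = int(25.17 * 25.174)

print(round(vertical_lines["active_video"]), round(lines_per_frame))
print(round(refresh_hz, 2), active_pixels)
```

The rounded figures give 480 active lines out of about 525 per frame and a refresh rate close to the tabulated 59.94 Hz, while the truncated active-video product gives the 633 pixels mentioned in the text.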
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
[Figure: Vertical (Frame) Period]
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
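Both sync signals follow the same pattern: the output is low inside a window that starts after the viewable area plus a blanking interval and lasts for the sync length. The Python sketch below captures that shared rule. The function name sync_level is illustrative, and the numbers in the example (640, 16, 96) are hypothetical values chosen only to exercise the function, not values given in this document.

```python
def sync_level(count, total, blanking, sync_len):
    """Return 0 (active low) inside the sync window, otherwise 1."""
    start = total + blanking
    return 0 if start <= count < start + sync_len else 1

# Hypothetical example: PixTotal = 640, BlankingLeft = 16, TLSync = 96.
levels = {count: sync_level(count, 640, 16, 96) for count in (655, 656, 751, 752)}
print(levels)
```

The same function serves for VSYNC by passing VCOUNT, LinTotal, the vertical blanking value, and TFSync.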
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply:
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
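The 19-bit figure follows from the largest address in each resolution. A quick Python check, with the helper name counter_bits chosen for illustration:

```python
def counter_bits(pix_total, lin_total):
    """Return (largest address, number of bits needed to hold it)."""
    max_value = pix_total * lin_total - 1
    return max_value, max_value.bit_length()

vga = counter_bits(640, 480)
svga = counter_bits(800, 600)
print(vga, svga)
```

Both maxima lie between 2^18 and 2^19, so 19 bits are enough for either resolution and 18 bits would not be.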
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
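The increment and roll-over rules for the address counter can be traced on a tiny frame. The Python sketch below is one possible reading of the description, not the Controller's actual implementation; the generator name frame_addresses and the miniature dimensions (4 viewable pixels, 3 viewable lines, 6 clocks per line, 5 lines per frame) are assumptions made to keep the trace short.

```python
def frame_addresses(pix_total, lin_total, clocks_per_line, lines_per_frame):
    """Yield (hcount, vcount, address) for every clock of one frame."""
    address = 0
    for vcount in range(lines_per_frame):
        for hcount in range(clocks_per_line):
            if vcount >= lin_total:
                address = 0          # rolled over once the viewable lines are done
            yield hcount, vcount, address
            if vcount < lin_total and hcount < pix_total:
                address += 1         # advance only while inside the viewable line

trace = list(frame_addresses(4, 3, 6, 5))
viewable = [a for h, v, a in trace if h < 4 and v < 3]
print(viewable)
```

The viewable addresses run consecutively from one line into the next, and the counter is back at zero by the end of the frame.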
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors, or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum 60 Hz refresh rate:
For 1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 = approx. 54 ns
Allowing for the blanking intervals as well, max time for 1 pixel = 40 ns
2) To send information for each and every pixel of a frame by means of
RGB signals.
3) After completely displaying 1 row of information, the column must be
reset and the row has to be incremented; this is done by the Hsync signal.
4) After completing 1 frame of information, a new frame has to be started;
for this, the row and column are reset, which is done by the Vsync signal.
=> For giving information for 1 pixel we use 1 clock (for controlling all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display 1 pixel of information on each and
every clock, we can attain the minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
Ex:
Clock frequency      15 MHz              25 MHz                            30 MHz
Time period          More                Effectively utilized frequency    Less
No. of frames/sec    Less (decreases)                                      More (increases)
(At 15 MHz the controller is unable to display 60 frames/sec.)
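The effect of the clock frequency on the refresh rate can be put into numbers. The Python sketch below assumes a frame of 800 clocks per line and 525 lines (the visible 640 x 480 area plus blanking); these totals are an assumption based on standard VGA timing and are not stated in this section, and the function name is illustrative.

```python
CLOCKS_PER_LINE = 800    # assumption: 640 visible pixels plus horizontal blanking
LINES_PER_FRAME = 525    # assumption: 480 visible lines plus vertical blanking

def refresh_rate_hz(pixel_clock_hz):
    """Frames per second when one pixel position is produced per clock."""
    return pixel_clock_hz / (CLOCKS_PER_LINE * LINES_PER_FRAME)

rates = {mhz: refresh_rate_hz(mhz * 1_000_000) for mhz in (15, 25, 30)}
for mhz, rate in rates.items():
    print(f"{mhz} MHz -> {rate:.2f} frames/sec")
```

Under these assumptions a 15 MHz clock falls well short of 60 frames per second, 25 MHz lands close to 60, and 30 MHz exceeds it.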
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal Synchronization signal)
VSYNC (Vertical Synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse, and back porch
               Horizontal sync    Vertical sync
Front porch    0.94 µs            0.35 ms
Sync pulse     3.77 µs            0.063 ms
Back porch     1.89 µs            1.02 ms
8 FUNCTIONAL DESCRIPTION81 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the below table
Table: VGA pin description
Name | Type | Polarity | Bus size | Description

Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows:
    640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz
    800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz
RST | I | High | - | Global system reset.

VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used. 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used. 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.

Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal, en, generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette, as follows:
    Black & White mode: 8 1-bit pixels
    16 Colors mode: 2 4-bit pixels
    64 Colors mode: 1 6-bit pixel (DATA[5..0])
    Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.

VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1, R0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1, G0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1, B0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
  (DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
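As a worked example of this formula, the following Python sketch computes the memory needed for a 640x480 picture in each color mode. The mode names used as dictionary keys and the function name `picture_bytes` are made up for the illustration; the result is rounded up to a whole byte.

```python
import math

# Pixels packed into one byte for each CMOD palette setting.
PIXELS_PER_BYTE = {"bw": 8, "16color": 2, "64color": 1}

def picture_bytes(pix_total, lin_total, mode):
    """Bytes of picture memory for pix_total x lin_total viewable pixels."""
    return math.ceil(pix_total * lin_total / PIXELS_PER_BYTE[mode])

for mode in PIXELS_PER_BYTE:
    print(mode, picture_bytes(640, 480, mode), "bytes")
```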
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, i.e. right shifting the contents
by three. ADDR_PIXEL is 19 bits wide, so three leading zeros are prepended:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, i.e. right shifting the contents
by one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
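The three addressing rules amount to a right shift of the 19-bit pixel counter by 3, 1 or 0 bits. A minimal Python model, with made-up mode names and a made-up function name `addr_pixel`, could look like this:

```python
ADDR_BITS = 19  # width of hvcnt and ADDR_PIXEL

# Right shift applied to hvcnt in each color mode.
SHIFT = {"bw": 3, "16color": 1, "64color": 0}

def addr_pixel(hvcnt, mode):
    """Byte address in picture memory holding the pixel numbered hvcnt."""
    hvcnt &= (1 << ADDR_BITS) - 1   # keep only the 19 counter bits
    return hvcnt >> SHIFT[mode]

# Pixels 0..7 share byte 0 in Black & White mode; pixel 8 starts byte 1.
print([addr_pixel(n, "bw") for n in range(10)])
```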
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex.
This internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
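The enex condition can be written directly as a Python function. This is only an illustrative model: the argument names mirror the symbols in the definition above, and the line period of 800 and frame period of 525 used in the example calls are assumed values, not figures taken from this report.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Return 1 while pixel data may be read, 0 during blanking."""
    h_blank = (pix_total + 1) < hcount < line_period
    v_blank = (lin_total + 1) < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1

# Assumed example geometry: 640x480 visible, 800 clocks/line, 525 lines/frame.
print(enex(10, 10, 640, 480, 800, 525))    # inside the visible area
print(enex(700, 10, 640, 480, 800, 525))   # horizontal blanking
print(enex(10, 500, 640, 480, 800, 525))   # vertical blanking
```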
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit
on each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge
of CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
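The load-and-shift behavior of PIXREG in the three modes can be imitated in a few lines of Python. The sketch below is a simplified software model, not the controller's implementation: it ignores blanking and exact clock-edge timing, and the generator name `pixels` is made up for the illustration.

```python
def pixels(byte_stream, bits_per_pixel):
    """Yield the active pixel value on each clock, lowest bits first.

    bits_per_pixel is 1 (Black & White), 4 (16 Colors) or 6 (64 Colors).
    """
    per_byte = 8 // bits_per_pixel        # 8, 2 or 1 pixels per byte
    mask = (1 << bits_per_pixel) - 1      # selects the active pixel bits
    pixreg = 0
    for byte in byte_stream:
        for slot in range(per_byte):
            # Load a new byte in the first slot, otherwise shift right.
            pixreg = byte if slot == 0 else pixreg >> bits_per_pixel
            yield pixreg & mask

print(list(pixels([0b10110001], 1)))  # eight 1-bit pixels, bit 0 first
print(list(pixels([0xA5], 4)))        # low nibble first, then high nibble
print(list(pixels([0xFF], 6)))        # bits 7 and 6 are ignored
```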
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 9.1.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
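The count of 64 colors follows from three channels with four levels each. The Python sketch below illustrates this; the even spacing of the two intermediate levels between 0 V and 0.7 V is an assumption made for the example, since only the end points are given above, and the function name `dac_level` is made up.

```python
V_MAX = 0.7  # volts, maximum brightness

def dac_level(code):
    """Analog level for a 2-bit code (0..3), assuming evenly spaced steps."""
    return V_MAX * code / 3

levels = [dac_level(code) for code in range(4)]
n_colors = len(levels) ** 3   # independent choices for R, G and B

print([round(v, 3) for v in levels])
print(n_colors)
```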
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2: Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
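As an illustration of how these figures constrain a frame buffer, the Python sketch below estimates how many block RAMs a picture would occupy if only the 16K data bits of each block are used, and which devices from Table 2 have enough blocks. It is a rough sizing aid under that assumption and ignores aspect-ratio and cascading details; the function name `blocks_needed` is made up.

```python
import math

DATA_BITS_PER_BLOCK = 16 * 1024   # data bits per block RAM (parity excluded)

# Total RAM blocks per device, from Table 2.
TOTAL_BLOCKS = {
    "XC3S100E": 4, "XC3S250E": 12, "XC3S500E": 20,
    "XC3S1200E": 28, "XC3S1600E": 36,
}

def blocks_needed(width, height, bits_per_pixel):
    """Block RAMs needed to hold a width x height picture."""
    return math.ceil(width * height * bits_per_pixel / DATA_BITS_PER_BLOCK)

need = blocks_needed(640, 480, 1)   # 1-bit (Black & White) picture
fits = [dev for dev, n in TOTAL_BLOCKS.items() if n >= need]
print(need, fits)
```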
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987: standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines e.g. two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes the entity.
E.g.:
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports: provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions:
In: the value of a port can be read inside the component, but cannot be
assigned. Multiple reads of the port are allowed.
Out: assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
In out: bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: an out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures: every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.:
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations: link an entity declaration and an architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- the last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have e.g. a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
Packages
Packages contain information common to many design units.
1 Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2 Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries
Collection of VHDL design units (database):
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -------
Dataflow   ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the
behavioral description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include e.g. state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- numeric_std is only needed by the commented-out to_unsigned line below.
use IEEE.numeric_std.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port(clk50m : in  std_logic;   -- 50 MHz board clock
       rstn   : in  std_logic;   -- active-low reset
       hsync  : out std_logic;
       vsync  : out std_logic;
       red    : out std_logic_vector(0 downto 0);
       green  : out std_logic_vector(0 downto 0);
       blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;

  -- Port modes match the sync_unit entity declaration (out, not inout).
  component sync_unit
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer);
  end component;

  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the 50 MHz input clock by two to obtain the 25 MHz pixel clock.
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks 0..793 within a line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vsync counter
  -- vertical line counter: advances on each rising edge of h_en_s
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse: low for counts 0..95
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information will be sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical sync pulse (front porch / back porch): low for lines 0 and 1
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- v_en delayed by one clock; used to detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
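The counter thresholds in the synchronization block can be sanity-checked with a short Python model. This is a sketch that evaluates the comparison logic only (the one-clock register delay on each output does not change the pulse widths); the function names simply mirror the signal names in the listing.

```python
H_TOTAL = 794   # c_hs counts 0..793
V_TOTAL = 516   # line_count counts 0..515

def hsync(c_hs):
    return 0 if c_hs <= 95 else 1

def h_en(c_hs):
    return 1 if 143 <= c_hs <= 778 else 0

def vsync(line_count):
    return 0 if line_count < 2 else 1

def v_en(line_count):
    return 1 if 35 <= line_count <= 515 else 0

hsync_low = sum(1 for c in range(H_TOTAL) if hsync(c) == 0)
h_active = sum(h_en(c) for c in range(H_TOTAL))
vsync_low = sum(1 for n in range(V_TOTAL) if vsync(n) == 0)
v_active = sum(v_en(n) for n in range(V_TOTAL))
print(hsync_low, h_active, vsync_low, v_active)
```

With these thresholds the horizontal sync pulse lasts 96 clocks and the vertical sync pulse 2 lines, while the enable windows cover 636 clocks per line and 481 lines per frame.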
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= (others => '0');
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart the address at the top of the frame.
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Picture area: read memory and show the 1-bit pixel.
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the picture area: green.
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture area: white.
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address at the end of the stored picture.
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: drive black.
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
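A quick Python check of the address constants in the pixel generator shows how much picture memory the 18-bit address actually covers. This is only arithmetic on the values in the listing; the variable names are made up for the illustration.

```python
ADDR_BITS = 18
WRAP_ADDR = int("111011111111111111", 2)   # address at which addr_s wraps

stored_pixels = WRAP_ADDR + 1       # 1-bit pixels held before wrapping
addressable = 2 ** ADDR_BITS        # locations an 18-bit address can reach
full_frame = 640 * 480              # pixels in a full 640x480 frame

print(stored_pixels, addressable, full_frame)
print(stored_pixels == 640 * 384)
```

The wrap address corresponds to 245,760 one-bit pixels (the equivalent of 640 x 384), which fits within the 262,144 locations of an 18-bit address but is smaller than a full 640x480 frame of 307,200 pixels.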
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
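One common way for a logic block to realize an arbitrary combinational function is a lookup table (LUT), which the history below mentions for the XC2064. The Python sketch models a 3-input LUT as an 8-entry truth table; "configuring" the block then just means loading a different table. The helper name `make_lut` is made up for the illustration.

```python
def make_lut(truth_table):
    """Return a function that looks up its output in truth_table.

    The inputs form the table index, first input as the most significant bit.
    """
    def lut(*inputs):
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return truth_table[index]
    return lut

# The same 3-input structure configured as two different gates.
and3 = make_lut([0, 0, 0, 0, 0, 0, 0, 1])
xor3 = make_lut([0, 1, 1, 0, 1, 0, 0, 1])

print(and3(1, 1, 1), and3(1, 0, 1))
print(xor3(1, 0, 0), xor3(1, 1, 0))
```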
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic (see "Soft
processors" below).
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: first commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as the ability of logic blocks to implement
decoders or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
123 Applications
Applications of FPGAs include digital signal processing software-defined
radio aerospace and defense systems ASIC prototyping medical imaging
computer vision speech recognition cryptography bioinformatics computer
hardware emulation radio astronomy and a growing range of other areas
FPGAs originally began as competitors to CPLDs and competed in a similar
space that of glue logic for PCBs As their size capabilities and speed
increased they began to take over larger and larger functions to the state where
some are now marketed as full systems on chips (SoC) Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA considered here is an array (island-style) FPGA. It consists of an
array of logic blocks and routing channels. Two I/O pads fit into the height of
one row or the width of one column, as shown below. All the routing channels
have the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
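The logic block described above can be sketched as a small behavioral model. This is only an illustration in Python, not part of the design: the class name `LogicBlock`, the `registered` flag and the input-to-index ordering are assumptions made for the example.

```python
class LogicBlock:
    """Behavioral sketch of a 4-input LUT followed by an optional flip-flop."""

    def __init__(self, truth_table, registered=False):
        # A 4-input LUT stores one output bit for each of the 16 input codes.
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs exactly 16 entries")
        self.truth_table = list(truth_table)
        self.registered = registered  # select registered or unregistered output
        self.q = 0                    # flip-flop state

    def lut(self, a, b, c, d):
        # Assumed ordering: input a is the least significant index bit.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.truth_table[index]

    def clock(self, a, b, c, d):
        # Rising clock edge: the flip-flop captures the current LUT output.
        self.q = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        return self.q if self.registered else self.lut(a, b, c, d)


# Configure the LUT as a 4-input AND gate: only input code 15 produces 1.
and4 = LogicBlock([0] * 15 + [1])
reg_and4 = LogicBlock([0] * 15 + [1], registered=True)
```

With `registered=True` the output only changes after `clock()` is called, which mirrors the choice between the registered and unregistered LUT output.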
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
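The planar topology can be expressed in a few lines. The following Python sketch is illustrative only; the side names and the function `planar_connections` are hypothetical and simply encode the rule that a wire keeps its track number when it crosses a switch box.

```python
SIDES = ("north", "east", "south", "west")


def planar_connections(side, track, channel_width):
    """Return the (side, track) wires reachable from a wire entering a
    planar switch box on `side` at `track`."""
    if side not in SIDES:
        raise ValueError("unknown side: %r" % (side,))
    if not 0 <= track < channel_width:
        raise ValueError("track outside the channel width")
    # Three programmable switches: one to the same track on each other side.
    return [(other, track) for other in SIDES if other != side]


print(planar_connections("west", 1, 4))
```

A wire therefore never changes track inside a switch box, which is what makes the topology "domain-based".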
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA
for hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on
an FPGA has become a good method to do this.
Reasons why Prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear
due to today's array of operating systems, applications, and hardware, to be
found early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement to the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, when implemented on an FPGA, the Controller takes a processor-generated
picture (pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller"
3. Hinner, "VGA Interface and video signal documents"
The color generated for a pixel in the Pixel Unit depends on whether
the particular pixel needs to be blanked or not. The Synchronization Unit
provides a signal to the Pixel Unit for this very reason. This is the line display
enable signal, en. The signal is checked on each rising edge of the external
clock signal (CLK) and is set as follows:
If (HCOUNT >= PixTotal) or (VCOUNT >= LinTotal) then
en = 0 (pixel requires to be blanked - set color to be black)
Else
en = 1 (pixel is a viewable pixel - generate RGB color accordingly)
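The rule for the line display enable signal can be written as a small function. This is a Python sketch of the comparison only (the real signal is evaluated in hardware on each rising edge of CLK); the function and parameter names are chosen here for illustration.

```python
def line_display_enable(hcount, vcount, pix_total, lin_total):
    """Return en: 0 when the pixel must be blanked, 1 when it is viewable."""
    if hcount >= pix_total or vcount >= lin_total:
        return 0  # outside the visible area: colour is forced to black
    return 1      # inside the visible area: RGB is generated from pixel data


# For a 640x480 visible area, the last viewable pixel is (639, 479).
print(line_display_enable(639, 479, 640, 480))
print(line_display_enable(640, 479, 640, 480))
```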
6 SIGNALS IN VGA CONTROLLER
For the most common VGA mode, 640x480 at 60 Hz non-interlaced, the horizontal
timings are:
Parameter                     Value      Unit
Clock frequency               25.175     MHz
Horizontal pixels             640
Horizontal sync polarity      Negative
Total time for each line      31.77      µs
Front porch (A)               0.94       µs
Sync pulse length (B)         3.77       µs
Back porch (C)                1.89       µs
Active video (D)              25.17      µs
(Total horizontal sync time 6.60 µs)
6.1 The vertical timings are
Parameter                     Value      Unit
Vertical lines                480
Vertical sync polarity        Negative
Vertical frequency            59.94      Hz
Front porch (E)               0.35       ms
Sync pulse length (F)         0.06       ms
Back porch (G)                1.02       ms
Active video (H)              15.25      ms
(Total vertical sync time 1.43 ms)
640 x 400 at 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 at 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 at 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
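The figures quoted above can be checked with a little arithmetic, since a time in microseconds multiplied by a frequency in MHz gives a number of clock cycles. The Python sketch below only reuses the values from the timing tables; the variable names are illustrative.

```python
PIXEL_CLOCK_MHZ = 25.175   # clock frequency from the horizontal timing table
LINE_TIME_US = 31.77       # total time for each line
ACTIVE_VIDEO_US = 25.17    # active video time (D)

# Clock cycles in one complete line (active video plus blanking).
clocks_per_line = round(LINE_TIME_US * PIXEL_CLOCK_MHZ)

# Whole pixels that fit into the quoted active video time at 25.174 MHz.
active_pixels = int(ACTIVE_VIDEO_US * 25.174)

# The three blanking components add up to the total horizontal sync time.
horizontal_sync_us = 0.94 + 3.77 + 1.89

print(clocks_per_line, active_pixels, round(horizontal_sync_us, 2))
```

The rounded timing values therefore account for about 800 clocks per line but only 633 whole pixels of active video, which is the discrepancy noted in the text.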
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
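The two conditions above define a window of counter values in which HSYNC is low. A minimal Python sketch of that window follows; the function name is hypothetical, and the example values for BlankingLeft (24 clocks) and TLSync (95 clocks) are assumptions obtained by converting the 0.94 µs and 3.77 µs table entries at 25.175 MHz, not values stated by the design.

```python
def hsync_level(hcount, pix_total, blanking_left, tl_sync):
    """Return the HSYNC level (0 = active low pulse, 1 = inactive)."""
    start = pix_total + blanking_left  # first count with HSYNC low
    end = start + tl_sync              # first count with HSYNC high again
    return 0 if start <= hcount < end else 1


# Assumed example parameters: 640 visible pixels, 24-clock front porch,
# 95-clock sync pulse.
for count in (639, 664, 758, 759):
    print(count, hsync_level(count, 640, 24, 95))
```

The same comparison, applied to VCOUNT with LinTotal and TFSync, describes the VSYNC pulse.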
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
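The 19-bit width follows directly from the largest value the counter must hold. As a quick check, the following Python sketch (with an illustrative function name) computes the number of bits needed for each resolution.

```python
def counter_bits(pix_total, lin_total):
    """Bits needed to count from 0 to (pix_total x lin_total) - 1."""
    max_value = pix_total * lin_total - 1
    return max_value.bit_length()


print(counter_bits(640, 480))  # largest value 307199
print(counter_bits(800, 600))  # largest value 479999
```

Both maximum values lie between 2^18 = 262144 and 2^19 = 524288, so 19 bits are sufficient for either resolution.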
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors, or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum 60 Hz refresh rate:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 ≈ 54 ns
Allowing for the blanking intervals, max time for 1 pixel = 40 ns
2) To send information for each and every pixel of a frame by means of the
RGB signals.
3) After completely displaying one row of information, the column must be
reset and the row incremented; this is done by the Hsync signal.
4) After completing one frame of information, a new frame has to be started;
for this, both row and column are reset, which is done by the Vsync signal.
=> For giving information for 1 pixel we use 1 clock (for controlling all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So, if we use a 25 MHz clock and display one pixel of information on each and
every clock, then we can attain the minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
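The pixel time budget can be worked through numerically. The Python sketch below is only a check of the arithmetic; the totals of 800 clocks per line and 525 lines per frame are assumptions (they are consistent with the 31.77 µs line time and the 15.25 ms + 1.43 ms frame time in the timing tables, but are not stated explicitly in this report).

```python
VISIBLE_PIXELS = 640 * 480        # 307200 viewable pixels per frame
REFRESH_HZ = 60

# Time per pixel if only the visible pixels had to fit into 1/60 s.
visible_only_ns = 1e9 / (REFRESH_HZ * VISIBLE_PIXELS)

# Period of a 25 MHz pixel clock.
clock_period_ns = 1e9 / 25e6

# Assumed totals including blanking: 800 clocks per line, 525 lines per frame.
CLOCKS_PER_LINE = 800
LINES_PER_FRAME = 525
refresh_hz = 25.175e6 / (CLOCKS_PER_LINE * LINES_PER_FRAME)

print(round(visible_only_ns, 2), clock_period_ns, round(refresh_hz, 2))
```

Under these assumptions the visible pixels alone would allow about 54 ns each, while the blanking intervals reduce the budget to roughly 40 ns, and the 25.175 MHz clock from the timing table reproduces the 59.94 Hz vertical frequency.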
Ex:
Clock frequency      15 MHz                                    25 MHz                            30 MHz
Time period          More                                      40 ns                             Less
No. of frames/sec    Less (decreases)                          60                                More (increases)
                     (It is unable to display 60 frames/sec)   (more effectively utilized frequency)
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal Synchronization signal)
VSYNC (Vertical Synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse, and back porch
               Horizontal sync    Vertical sync
Front porch    0.94 µs            0.35 ms
Sync pulse     3.77 µs            0.063 ms
Back porch     1.89 µs            1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA Pin description
Name | Type | Polarity | Bus size | Description
Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz gives Refresh = 60 Hz; CLK = 30 MHz gives Refresh = 72 Hz. 800x600: CLK = 40 MHz gives Refresh = 60 Hz; CLK = 50 MHz gives Refresh = 72 Hz.
RST | I | High | - | Global system reset.
VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used. 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used. 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal, en, generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows. Black & White mode: 8 1-bit pixels. 16 Colors mode: 2 4-bit pixels. 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode - one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode - one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode - one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
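The three packing schemes can be illustrated by unpacking a byte in software. This Python sketch is not the Controller's logic; the function name is hypothetical, the CMOD codes follow the pin description (00, 01, 10), and the low-order-first pixel order is an assumption that matches the shift-right behaviour described for the Pixel register.

```python
def unpack_pixels(byte, cmod):
    """Split one byte of picture memory into its pixels, low-order first."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("picture data is one byte wide")
    if cmod == 0b00:   # Black & White: eight 1-bit pixels
        return [(byte >> i) & 0x1 for i in range(8)]
    if cmod == 0b01:   # 16 Colors: two 4-bit pixels
        return [byte & 0xF, (byte >> 4) & 0xF]
    if cmod == 0b10:   # 64 Colors: one 6-bit pixel, bits 7 and 6 unused
        return [byte & 0x3F]
    raise ValueError("unsupported CMOD value")


print(unpack_pixels(0xB1, 0b00))
print(unpack_pixels(0xA5, 0b01))
print(unpack_pixels(0xFF, 0b10))
```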
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
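As a worked example of this formula, the following Python sketch computes the picture memory in bytes for each color palette. The function name and the dictionary keys are illustrative; rounding up to a whole byte is an assumption for frame sizes that do not divide evenly.

```python
PIXELS_PER_BYTE = {"Black & White": 8, "16 Colors": 2, "64 Colors": 1}


def picture_memory_bytes(pix_total, lin_total, pixels_per_byte):
    """Bytes needed to store one frame, rounded up to a whole byte."""
    total_pixels = pix_total * lin_total
    return -(-total_pixels // pixels_per_byte)  # ceiling division


for mode, per_byte in PIXELS_PER_BYTE.items():
    print(mode, picture_memory_bytes(640, 480, per_byte))
```

For a full 640x480 frame this gives 38400, 153600, and 307200 bytes respectively.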
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
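The three addressing rules amount to a right shift of the pixel counter by 3, 1, or 0 bits. The Python sketch below models this on integers; the function name is hypothetical and the CMOD codes follow the pin description.

```python
def addr_pixel(hvcnt, cmod):
    """Byte address in picture memory for the pixel counted by hvcnt."""
    hvcnt &= 0x7FFFF       # hvcnt is a 19-bit value
    if cmod == 0b00:       # Black & White: 8 pixels per byte
        return hvcnt >> 3
    if cmod == 0b01:       # 16 Colors: 2 pixels per byte
        return hvcnt >> 1
    if cmod == 0b10:       # 64 Colors: 1 pixel per byte
        return hvcnt
    raise ValueError("unsupported CMOD value")


# Pixel number 17 lives in byte 2, byte 8, or byte 17 depending on the mode.
print(addr_pixel(17, 0b00), addr_pixel(17, 0b01), addr_pixel(17, 0b10))
```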
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal+1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal+1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
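The definition of enex can be restated as a small function. This Python sketch is illustrative only; the function name is hypothetical, and the example Line Period (800) and Frame Period (525) are assumed values, not taken from this report.

```python
def external_enable(hcount, vcount, pix_total, lin_total,
                    line_period, frame_period):
    """Return enex: 0 disables reads from picture memory (RD = 0)."""
    h_blank = (pix_total + 1) < hcount < line_period
    v_blank = (lin_total + 1) < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1


# Assumed example: 640x480 visible, 800 clocks per line, 525 lines per frame.
print(external_enable(641, 0, 640, 480, 800, 525))
print(external_enable(642, 0, 640, 480, 800, 525))
```

Note that, unlike en, this signal stays High for one count beyond PixTotal and LinTotal, as the strict comparison against PixTotal+1 and LinTotal+1 shows.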
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
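The count of 64 colors and the four analog levels can be reproduced with a short calculation. The Python sketch below assumes an ideal, linear 2-bit DAC with evenly spaced steps between 0 V and 0.7 V; the actual resistor network on the board may give slightly different intermediate levels.

```python
def dac_voltage(code, full_scale=0.7):
    """Output voltage of an assumed ideal linear 2-bit DAC."""
    if not 0 <= code <= 3:
        raise ValueError("a 2-bit code is in the range 0..3")
    return full_scale * code / 3


levels = [round(dac_voltage(code), 3) for code in range(4)]
color_count = len(levels) ** 3  # four levels for each of R, G and B

print(levels, color_count)
```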
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large, on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device       RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E     1             4                       4                  73,728           72
XC3S250E     2             6                       12                 221,184          216
XC3S500E     2             10                      20                 368,640          360
XC3S1200E    2             14                      28                 516,096          504
XC3S1600E    2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
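The block sizes above make it easy to estimate how many block RAMs a picture buffer would occupy. The Python sketch below is a rough estimate only: the function names are illustrative, only the 16K data bits of each block are counted, and the aspect-ratio constraints of the real block RAM configurations are ignored.

```python
BLOCK_BITS = 18432              # total bits per block RAM
DATA_BITS_PER_BLOCK = 16 * 1024  # bits available for data storage


def total_ram_bits(blocks):
    """Total block RAM bits in a device with the given number of blocks."""
    return blocks * BLOCK_BITS


def blocks_needed(pix_total, lin_total, bits_per_pixel):
    """Block RAMs needed for one frame, counting data bits only."""
    frame_bits = pix_total * lin_total * bits_per_pixel
    return -(-frame_bits // DATA_BITS_PER_BLOCK)  # ceiling division


print(total_ram_bits(20))           # device with 20 block RAMs
print(blocks_needed(640, 480, 1))   # Black & White frame
print(blocks_needed(640, 480, 4))   # 16 Colors frame
```

By this estimate, a 640x480 Black & White frame needs 19 blocks and would fit in a device with 20 block RAMs, whereas a 16 Colors frame of the same size would need 75 blocks and so require external memory.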
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the U.S. Department of Defense.
The first publicly available version was released in 1985.
In 1986, IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own, non-standard types => VHDL was becoming nonstandard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with interface definition
Defines the inputs/outputs of a component (define pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
entity Comparator is
port (A, B : in std_logic_vector(7 downto 0);
EQ : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, direction, and a type.
An entity may have NO port declaration.
Port directions
In: A value of a port can be read inside the component, but cannot be assigned.
Multiple reads of a port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe design using:
- Behavior
- Structure
- Dataflow
Architectures can describe design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
Configuration declaration links architecture to entity.
E.g.
architecture Comparator1 of Comparator is
begin
EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations: Link an entity declaration and an architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Configurations can be used to change simulation behavior without re-analyzing
the VHDL source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
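As an illustration of how a configuration selects one of several architectures, here is a minimal sketch built on the Comparator entity from the example above. The second architecture (Comparator2) and the configuration name Comparator_cfg are hypothetical names introduced only for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- Dataflow architecture, as in the example above
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- A second, behavioral architecture for the same entity (hypothetical)
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end Comparator2;

-- Configuration declaration: binds architecture Comparator2 to the entity
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end Comparator_cfg;
```

Simulating Comparator_cfg instead of the entity itself makes the choice of architecture explicit rather than dependent on the order of analysis.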
Packages: Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
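A minimal sketch of a package declaration with its body is shown here. The package name vga_pkg and everything declared in it are hypothetical and serve only to illustrate the parts listed above (a constant, a deferred constant, a subtype and a function).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package vga_pkg is
  -- constant declaration
  constant H_VISIBLE : integer := 640;
  -- deferred constant: its value is given in the package body
  constant V_VISIBLE : integer;
  -- subtype declaration
  subtype pixel_t is std_logic_vector(0 downto 0);
  -- function declaration
  function to_pixel (b : std_logic) return pixel_t;
end vga_pkg;

package body vga_pkg is
  -- full declaration of the deferred constant
  constant V_VISIBLE : integer := 480;

  -- function body
  function to_pixel (b : std_logic) return pixel_t is
    variable p : pixel_t;
  begin
    p(0) := b;
    return p;
  end to_pixel;
end vga_pkg;
```

A design unit would make these declarations visible with "use work.vga_pkg.all;", assuming the package has been analyzed into the work library.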
Libraries: Collection of VHDL design units (database)
1. Packages
   - package declaration
   - package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the HW.
It is possible to describe a circuit in a number of ways, listed here from
lower to higher level of abstraction:
- Structural
- Dataflow
- Behavioral
Structural VHDL description
The circuit is described in terms of its components.
It ranges from a low-level description (e.g. transistor-level description) to
a high-level description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the
behavioral description of the registers yourself.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
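Since VHDL has no built-in register, a behavioral description such as the following is typically written. This is a minimal sketch of an 8-bit register with an active-low asynchronous reset; the entity name reg8 and its port names are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg8 is
  port (clk  : in  std_logic;
        rstn : in  std_logic;                     -- active-low asynchronous reset
        d    : in  std_logic_vector(7 downto 0);
        q    : out std_logic_vector(7 downto 0));
end reg8;

architecture behv of reg8 is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');       -- reset clears the register
    elsif rising_edge(clk) then
      q <= d;                     -- capture the input on each rising clock edge
    end if;
  end process;
end behv;
```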
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
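The two permitted forms can be compared side by side. The sketch below describes the same AND function twice, once with a sensitivity list and once with a wait statement; the entity name and_gate and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_gate is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end and_gate;

architecture behv of and_gate is
begin
  -- Form 1: sensitivity list; the process wakes up on any event on a or b
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: wait statement; equivalent to the sensitivity list above
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end behv;
```

Both processes behave identically in simulation: a sensitivity list is equivalent to a single "wait on" statement at the end of the process.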
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  -- picture memory: one bit per pixel, 18-bit address
  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic);
  END component;

  -- port modes match the sync_unit entity declaration
  component sync_unit
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer);
  end component;

  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- divide the 50 MHz input clock by two to obtain the pixel clock
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- note: rstn is active low; check the sinit polarity expected by the memory core
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- horizontal counter: counts pixel clocks 0 to 793 within each line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    elsif (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end process;

  -- vertical line counter: advances on each rising edge of h_en_s, 0 to 515
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse: low while the horizontal counter is 0 to 95
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- horizontal enable (row information is sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35) and (line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse: low during the first two lines of the frame
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- registered copy of v_en, used to detect the start of the visible frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- rising edge of v_en: restart addressing at the top of the frame
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- inside the picture: read memory and drive all three colors
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + 1;  -- "+" from STD_LOGIC_UNSIGNED
          else
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- wrap the address at the end of the picture memory
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- blanking interval: drive black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was inducted
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be reprogrammed at run time, and this
is leading to the idea of reconfigurable computing or reconfigurable systems:
CPUs that reconfigure themselves to suit the task at hand. The Mitrion
Virtual Processor from Mitrionics is an example of a reconfigurable soft
processor implemented on FPGAs. However, it does not support dynamic
reconfiguration at runtime, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has bolstered development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design
starts.
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them) but also far more
complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the state
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or
the width of one column, as shown below. All the routing channels have the
same width (number of wires).
Fig 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
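The logic block described above can be modeled behaviorally in VHDL. The following is a simplified sketch, not a vendor primitive: the entity name logic_block, the generics INIT and USE_FF, and the port names are all hypothetical. The LUT is a 16-entry truth table indexed by the four inputs, and a generic selects between the registered and the unregistered output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    -- 16-entry truth table; x"8000" sets only entry 15, i.e. a 4-input AND
    INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- true: output comes from the flip-flop; false: directly from the LUT
    USE_FF : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end logic_block;

architecture behv of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- the four inputs select one bit of the truth table
  lut_out <= INIT(to_integer(unsigned(i)));

  -- optional register on the LUT output
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- single output: registered or unregistered LUT output
  o <= ff_out when USE_FF else lut_out;
end behv;
```

Changing INIT changes the implemented function without changing the structure, which is the essence of how a LUT-based logic block is configured.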
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch
box. In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar
license), and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
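As an illustration of the first simulation stage, here is a minimal test bench sketch for the Comparator entity used earlier in this report. The test bench name Comparator_tb, the stimulus values and the 10 ns settling time are assumptions made for this example; the Comparator entity and architecture are repeated so that the listing is self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

library ieee;
use ieee.std_logic_1164.all;

-- A test bench has no ports: it generates stimulus and checks results itself
entity Comparator_tb is
end Comparator_tb;

architecture test of Comparator_tb is
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  -- device under test
  dut : entity work.Comparator(Comparator1)
    port map (A => A, B => B, EQ => EQ);

  stimulus : process
  begin
    A <= x"3C"; B <= x"3C";
    wait for 10 ns;
    assert EQ = '1' report "equal inputs should give EQ = 1" severity error;

    A <= x"3C"; B <= x"3D";
    wait for 10 ns;
    assert EQ = '0' report "different inputs should give EQ = 0" severity error;

    wait;  -- suspend forever: end of stimulus
  end process stimulus;
end test;
```

The same test bench can be reused after synthesis and after place-and-route by binding it to the gate-level description instead of the RTL architecture.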
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping an SoC or ASIC design on an FPGA
for hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream. Prototyping SoC and ASIC designs on an FPGA
has become a good method to do this.
Reasons why prototyping is important:
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time in the area of software development and testing at the
software-hardware integration stage. This allows many unforeseen software bugs
that appear due to today's array of operating systems, applications and
hardware to be caught early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
picture (pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to correctly display the picture on a connected monitor.
6 SIGNALS IN VGA CONTROLLER
6.1 The vertical timings are:
Parameter                 Value      Unit
Vertical lines            480
Vertical sync polarity    Negative
Vertical frequency        59.94      Hz
Front porch (E)           0.35       ms
Sync pulse length (F)     0.06       ms
Back porch (G)            1.02       ms
Active video (H)          15.25      ms
(Total vertical sync time 1.43 ms)
640 x 400 @ 70 Hz is the video mode used for booting most x86 personal
computers.
640 x 480 @ 60 Hz is the default MS-Windows graphics mode with 16 colors.
It should be noted that the actual timings vary slightly. For example, for
640x480 @ 60 fps, a 25.17 µs active video time with a pixel frequency of
25.174 MHz gives 633 pixels rather than the expected 640 pixels.
6.2 Generation of the horizontal synchronization signal - HSYNC
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when:
HCOUNT ≥ (PixTotal + BlankingLeft)
and remains low while:
HCOUNT < (PixTotal + BlankingLeft + TLSync)
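The HSYNC conditions above can be sketched in VHDL as follows. This is an illustration under stated assumptions rather than the controller's actual source: the entity name hsync_gen, the active-high reset polarity, and the default values for BlankingLeft, TLSync and LineLength (typical 640x480 figures) are all assumed; only the names PixTotal, BlankingLeft, TLSync and HCOUNT come from the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity hsync_gen is
  generic (
    PixTotal     : integer := 640;  -- visible pixels per line
    BlankingLeft : integer := 16;   -- assumed blanking before the sync pulse
    TLSync       : integer := 96;   -- assumed sync pulse length in pixel clocks
    LineLength   : integer := 800   -- assumed total pixel clocks per line
  );
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;          -- assumed active-high reset
    hsync : out std_logic
  );
end hsync_gen;

architecture rtl of hsync_gen is
  signal hcount : integer range 0 to LineLength - 1 := 0;
begin
  process (clk, rst)
  begin
    if rst = '1' then
      hcount <= 0;
      hsync  <= '1';                -- High (inactive) after reset
    elsif rising_edge(clk) then
      -- horizontal counter wraps at the end of each line
      if hcount = LineLength - 1 then
        hcount <= 0;
      else
        hcount <= hcount + 1;
      end if;
      -- low while PixTotal + BlankingLeft <= HCOUNT < PixTotal + BlankingLeft + TLSync
      if hcount >= PixTotal + BlankingLeft and
         hcount <  PixTotal + BlankingLeft + TLSync then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;
end rtl;
```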
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when:
VCOUNT ≥ (LinTotal + BlankingLeft)
and remains low while:
VCOUNT < (LinTotal + BlankingLeft + TFSync)
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
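The 19-bit width follows directly from the largest address. A minimal check, with a helper name chosen only for this illustration:

```python
def counter_range(pix_total: int, lin_total: int) -> tuple[int, int]:
    """Return (largest address, bits needed to hold it)."""
    max_value = pix_total * lin_total - 1
    return max_value, max_value.bit_length()


vga = counter_range(640, 480)
svga = counter_range(800, 600)
print(vga, svga)
```

Both resolutions need more than 18 bits (2^18 = 262144) and fit within 19 bits (2^19 = 524288).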
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point it will be rolled over to zero.
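The counting rule can be modelled in a few lines. This is a behavioural sketch rather than the Controller's implementation; the tiny frame (4 x 3 visible pixels inside a 6 x 5 period) is an assumed example that keeps the trace short.

```python
def next_address(addr, hcount, vcount, pix_total, lin_total):
    """Model one CLK edge of the address counter."""
    if vcount >= lin_total:      # past the last viewable line: roll over
        return 0
    if hcount < pix_total:       # inside the viewable part of the line
        return addr + 1
    return addr                  # horizontal blanking: hold the value


PIX_TOTAL, LIN_TOTAL = 4, 3          # assumed visible area
LINE_PERIOD, FRAME_PERIOD = 6, 5     # assumed totals including blanking

addr, visited = 0, []
for vcount in range(FRAME_PERIOD):
    for hcount in range(LINE_PERIOD):
        if vcount < LIN_TOTAL and hcount < PIX_TOTAL:
            visited.append(addr)     # address used for this visible pixel
        addr = next_address(addr, hcount, vcount, PIX_TOTAL, LIN_TOTAL)

print(visited, addr)
```

The visible pixels receive consecutive addresses across line boundaries, and the counter is back at zero before the next frame starts.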
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (Black &
White, 16 Colors, or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a refresh rate of at least 60 Hz:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 ≈ 54 ns (roughly 50 ns)
Max time for 1 pixel = 40 ns (this leaves margin for the blanking intervals)
2) To send the information for each and every pixel of a frame by means of
the RGB signals.
3) After the information for one row has been displayed completely, the
column must be reset and the row incremented; this is done by the HSYNC signal.
4) After the information for one frame is complete, a new frame has to be
started; for this, both row and column are reset, which is done by the VSYNC
signal.
=> To give the information for 1 pixel we use 1 clock cycle (the clock
controls all operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for 1 pixel on each
and every clock, we can attain a refresh rate of at least 60 Hz.
Do not use a clock frequency lower than 25 MHz.
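Ex:
15 MHz: the time period is more, so the number of frames/sec is less
(it is unable to display 60 frames/sec)
25 MHz: effectively utilized frequency
30 MHz: the time period is less, so the number of frames/sec is more
As the clock frequency decreases the frame rate decreases, and as it
increases the frame rate increases.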
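The figures in this derivation can be reproduced with a short calculation. The sketch counts only the 307200 visible pixels and ignores blanking, so the frame rates it prints are upper bounds; the function name is illustrative.

```python
VISIBLE_PIXELS = 640 * 480          # 307200 pixels per frame
REFRESH_HZ = 60


def max_frame_rate(clock_hz: float, pixels: int = VISIBLE_PIXELS) -> float:
    """Upper bound on frames/sec at one pixel per clock (blanking ignored)."""
    return clock_hz / pixels


pixel_time_ns = 1e9 / (REFRESH_HZ * VISIBLE_PIXELS)   # time budget per pixel
clock_mhz = 1e3 / 40                                  # 40 ns period in MHz

print(f"pixel time budget: {pixel_time_ns:.2f} ns, clock: {clock_mhz} MHz")
for f_mhz in (15, 25, 30):
    print(f_mhz, "MHz ->", round(max_frame_rate(f_mhz * 1e6), 1), "frames/sec")
```

Even with blanking ignored, a 15 MHz clock stays below 60 frames/sec, while 25 MHz and 30 MHz exceed it.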
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (horizontal synchronization signal)
VSYNC (vertical synchronization signal)
All of the above signals are used to control the image data so that it is
correctly displayed on the VGA monitor.
7.2 Timings for front porch, sync pulse, and back porch
               Horizontal sync    Vertical sync
Front porch    0.94 µs            0.35 ms
Sync pulse     3.77 µs            0.063 ms
Back porch     1.89 µs            1.02 ms
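To use these horizontal timings in a counter-based design they have to be converted into clock counts. The sketch below assumes the 25 MHz pixel clock derived in section 7.1; the dictionary and function names are illustrative.

```python
def to_clocks(duration_us: float, clock_mhz: float = 25.0) -> float:
    """Convert a duration in microseconds into pixel-clock periods."""
    return duration_us * clock_mhz


horizontal_us = {"front porch": 0.94, "sync pulse": 3.77, "back porch": 1.89}
clocks = {name: to_clocks(t) for name, t in horizontal_us.items()}
blanking_clocks = sum(clocks.values())

for name, n in clocks.items():
    print(f"{name}: {n:.2f} clocks")
print(f"total horizontal blanking: {blanking_clocks:.2f} clocks")
```

A real design rounds each value to a whole number of clocks, which is one reason implementations differ by a pixel or two.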
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA pin description
Name | Type | Polarity | Bus size | Description
Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST | I | High | - | Global system reset.
VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used. 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used. 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal en, generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette, as follows. Black & White mode: 8 1-bit pixels. 16 Colors mode: 2 4-bit pixels. 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: in 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1, R0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1, G0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1, B0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
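Applying the formula to each palette at full resolution gives a feel for the memory sizes involved. The mode names used as dictionary keys are labels chosen for this sketch.

```python
PIXELS_PER_BYTE = {"Black & White": 8, "16 Colors": 2, "64 Colors": 1}


def picture_bytes(pix_total: int, lin_total: int, mode: str) -> int:
    """Bytes of picture memory for a frame in the given color mode."""
    return (pix_total * lin_total) // PIXELS_PER_BYTE[mode]


for mode in PIXELS_PER_BYTE:
    print(f"{mode}: 640x480 needs {picture_bytes(640, 480, mode)} bytes, "
          f"800x600 needs {picture_bytes(800, 600, mode)} bytes")
```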
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit (hvcnt), which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = '0' & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
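Dropping the low bits of the pixel counter is the same as an integer right shift, which the following sketch makes explicit. The mode keys and function name are illustrative.

```python
SHIFT_BY_MODE = {"bw": 3, "16": 1, "64": 0}   # log2(pixels per byte)


def addr_pixel(hvcnt: int, mode: str) -> int:
    """Byte address in picture memory that holds pixel number hvcnt."""
    return hvcnt >> SHIFT_BY_MODE[mode]


# Pixels 0..7 share byte 0 in Black & White mode; pixel 8 starts byte 1.
print([addr_pixel(n, "bw") for n in range(10)])
print([addr_pixel(n, "16") for n in range(6)])
```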
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
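The enex condition translates directly into a small function. The totals used in the example calls (a line period of 800 clocks and a frame period of 525 lines for a 640x480 display) are assumptions made for the illustration.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Return 0 while blanking suppresses memory reads, otherwise 1."""
    h_blank = (pix_total + 1) < hcount < line_period
    v_blank = (lin_total + 1) < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1


# Assumed totals for a 640x480 display.
ARGS = dict(pix_total=640, lin_total=480, line_period=800, frame_period=525)

print(enex(0, 0, **ARGS))      # visible area: reads enabled
print(enex(642, 0, **ARGS))    # horizontal blanking: reads disabled
print(enex(0, 482, **ARGS))    # vertical blanking: reads disabled
```

Note that the comparison uses PixTotal + 1 and LinTotal + 1, so the enable stays High for a short time past the last visible pixel and line.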
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal (received from the Synchronization Unit)
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal (received from the Synchronization Unit) is
'0'.
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
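The shift-and-take-the-low-bits behaviour of PIXREG in the three modes can be modelled as follows. This is a software sketch of the described behaviour, with mode keys chosen for the example.

```python
def unpack_pixels(byte: int, mode: str) -> list[int]:
    """Pixels in display order, as the right-shifting register emits them."""
    if mode == "bw":                       # 8 one-bit pixels, LSB first
        return [(byte >> i) & 0x1 for i in range(8)]
    if mode == "16":                       # low nibble first, then high
        return [byte & 0xF, (byte >> 4) & 0xF]
    if mode == "64":                       # bits 7 and 6 are not used
        return [byte & 0x3F]
    raise ValueError(f"unknown mode: {mode}")


print(unpack_pixels(0b10110001, "bw"))
print(unpack_pixels(0xA5, "16"))
print(unpack_pixels(0xFF, "64"))
```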
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
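A small sketch shows where the four levels and the 64 colors come from. Two things here are assumptions made for the illustration: that the four DAC levels are evenly spaced between 0 V and 0.7 V, and that the 6-bit code is laid out as red in bits 5..4, green in bits 3..2, and blue in bits 1..0.

```python
FULL_SCALE_V = 0.7


def dac_level(code: int) -> float:
    """Analog level for a 2-bit code, assuming evenly spaced steps."""
    return round(FULL_SCALE_V * code / 3, 3)


def split_rgb(rgb6: int) -> tuple[int, int, int]:
    """Split a 6-bit color code into 2-bit R, G, B (assumed bit layout)."""
    return (rgb6 >> 4) & 0x3, (rgb6 >> 2) & 0x3, rgb6 & 0x3


levels = [dac_level(c) for c in range(4)]
colors = {split_rgb(code) for code in range(64)}
print(levels, len(colors))
```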
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device       RAM       RAM Blocks    Total RAM    Total RAM    Total RAM
             Columns   per Column    Blocks       Bits         Kbits
XC3S100E     1         4             4            73,728       72
XC3S250E     2         6             12           221,184      216
XC3S500E     2         10            20           368,640      360
XC3S1200E    2         14            28           516,096      504
XC3S1600E    2         18            36           663,552      648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
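These figures make it easy to estimate how many block RAMs a frame buffer needs. The sketch below counts only the 16K data bits of each block and ignores the parity bits, which is a simplifying assumption; the function name is illustrative.

```python
import math

DATA_BITS_PER_BLOCK = 16 * 1024     # 16K data bits per block RAM


def blocks_needed(width: int, height: int, bits_per_pixel: int) -> int:
    """Block RAMs required to hold one frame (parity bits not used)."""
    frame_bits = width * height * bits_per_pixel
    return math.ceil(frame_bits / DATA_BITS_PER_BLOCK)


print(blocks_needed(640, 480, 1))   # 1-bit (Black & White) frame
print(blocks_needed(640, 480, 4))   # 4-bit (16 Colors) frame
```

A full 640x480 1-bit frame needs 19 blocks, so it fits in the 20 blocks of an XC3S500E, whereas a 4-bit frame exceeds even the 36 blocks of the largest device in the table.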
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
• Interoperability
• Technology independence
• Design reuse
• Several levels of abstraction
• Readability
• Standard language
• Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
• Design specification language
• Design entry language
• Design simulation language
• Design documentation language
• An alternative to schematics
10.1 Brief History
• VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
• Funded by the US Department of Defense.
• The first publicly available version was released in 1985.
• In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
• In 1987 standardization => IEEE 1076-1987.
• An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
• IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
• Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
• The IEEE 1164 standard was developed by the IEEE; it contains
definitions for a nine-valued data type, std_logic.
• IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware. It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
• A black box with an interface definition
• Defines the inputs/outputs of a component (defines pins)
• A way to represent modularity in VHDL
• Similar to a symbol in a schematic
• An entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
• Provide channels of communication between the component and its
environment
• Each port must have a name, a direction, and a type
• An entity may have NO port declaration
Port directions
• In: the value of a port can be read inside the component, but cannot be
assigned. Multiple reads of the port are allowed.
• Out: assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
• In out: bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
• Buffer: an out port with read capability. May have at most one assignment
(not recommended).
Architectures
• Every entity has at least one architecture
• One entity can have several architectures
• Architectures can describe a design using behavior, structure, or dataflow
• Architectures can describe a design on many levels: gate level, RTL (Register
Transfer Level), behavioral level
• A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
• Link the entity declaration and architecture body together
• The concept of a default configuration is a bit messy in VHDL '87:
the last architecture analyzed links to the entity
• Can be used to change simulation behavior without re-analyzing the VHDL
source
• Complex configuration declarations are ignored in synthesis
• Some entities can have, e.g., a gate-level architecture and a behavioral
architecture
• Are always optional
Packages
Packages contain information common to many design units.
1 Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2 Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries
Collection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -------
Dataflow ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
• The circuit is described in terms of its components
• From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
• For large circuits, a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
• The circuit is described in terms of how data moves through the system
• In the dataflow style you describe how information flows between registers in
the system
• The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely
• The behavior of the system over time is defined by registers
• There are no built-in registers in the VHDL language: either a lower-level
description or a behavioral description of sequential elements is needed
• The lower-level register descriptions must be created or obtained
• If there are no third-party models for registers => you must write the behavioral
description of the registers
• The behavioral description can be provided in the form of
subprograms (functions or procedures)
10.4 Behavioral VHDL Description
• The circuit is described in terms of its operation over time
• The representation might include, e.g., state diagrams, timing diagrams, and
algorithmic descriptions
• The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns)
• If no actual delay is used, the order of sequential operations is defined
• In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications
• The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool
• There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
• Basic simulation concept in VHDL
• A VHDL description can always be broken up into interconnected processes
• Quite similar to a Unix process
• "process" is a keyword in VHDL
• A process statement is a concurrent statement
• Statements inside process statements are sequential statements
• A process must contain either a sensitivity list or wait statement(s), but NOT both
• The sensitivity list or wait statement(s) contain the signals which wake the
process up
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- Top level: divides the 50 MHz board clock by two and connects the
-- synchronization unit, the pixel generator, and the picture memory.
entity vga_controller is port(
  clk50m : in  std_logic;
  rstn   : in  std_logic;   -- active-low reset
  hsync  : out std_logic;
  vsync  : out std_logic;
  red    : out std_logic_vector(0 downto 0);
  green  : out std_logic_vector(0 downto 0);
  blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic);
  END component;

  component sync_unit port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;    -- mode "out" to match the sync_unit entity
    c_hs       : out integer);   -- mode "out" to match the sync_unit entity
  end component;

  component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Clock divider: 50 MHz in, 25 MHz pixel clock out.
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;  -- horizontal sync pulse (row)
  vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;
  c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks 0..793 within a line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vsync counter
  -- vertical line counter: advances on each rising edge of h_en_s, 0..515
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse: low for the first 96 clocks of each line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information will be sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical sync pulse: low for the first two lines of each frame
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of v_en, used to detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
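As a sanity check on the counter limits used in the synchronization block, the resulting line time and frame rate can be estimated. The sketch assumes the 25 MHz pixel clock produced by the divide-by-two in the top level; the constant names are chosen for the example.

```python
CLK_HZ = 25_000_000        # assumed: 50 MHz board clock divided by two
CLOCKS_PER_LINE = 794      # c_hs_s counts 0..793
LINES_PER_FRAME = 516      # line_count_s counts 0..515
HSYNC_CLOCKS = 96          # hsync is low while c_hs_s <= 95

line_time_us = CLOCKS_PER_LINE / CLK_HZ * 1e6
hsync_us = HSYNC_CLOCKS / CLK_HZ * 1e6
frame_rate_hz = CLK_HZ / (CLOCKS_PER_LINE * LINES_PER_FRAME)

print(f"line time: {line_time_us:.2f} us")
print(f"hsync pulse: {hsync_us:.2f} us")
print(f"frame rate: {frame_rate_hz:.2f} Hz")
```

Under that assumption the frame rate comes out just above 60 Hz, and the sync pulse width is close to the 3.77 µs listed in section 7.2.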
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: start of a new frame, restart the address
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the stored picture: read memory and show the pixel
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the stored picture: fill with green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the stored picture: fill with white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last stored pixel
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: drive black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
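The wrap-around constant in the pixel generator can be related to the size of the stored picture. The column and line ranges below are one reading of the listing (it assumes h_en reaches the pixel generator one clock after the counter comparison, because it is registered in the synchronization unit), so treat them as an interpretation rather than a documented specification.

```python
WRAP_ADDRESS = int("111011111111111111", 2)   # constant from the listing

# Assumed reading: h_en is registered, so it is seen High for
# c_hs = 144..779; the listing additionally requires c_hs <= 655.
columns = range(144, 655 + 1)
# v_en is High for line_count = 35..515; the listing requires <= 514.
lines = range(35, 514 + 1)

pixels_per_frame = len(columns) * len(lines)
print(WRAP_ADDRESS, len(columns), len(lines), pixels_per_frame)
```

Under this reading the memory holds a 512 x 480 one-bit picture, and the address wraps exactly after its last pixel.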
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates (Xilinx)
1992: 600,000 (Naval Surface Warfare Department)
Early 2000s: millions
Market size
1985: first commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages of FPGAs include a shorter time to market, the ability to re-program
in the field to fix bugs, and lower non-recurring engineering costs. Vendors can
also take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
• IC costs are rising aggressively
• ASIC complexity has increased development time and costs
• R&D resources and headcount are decreasing
• Revenue losses for slow time-to-market are increasing
• Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio.
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4 to 8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
are managed separately from other signals.
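The logic block described above can be sketched as a small behavioral model. This is only an illustration in Python, not vendor code: the class name LogicBlock, the truth-table ordering (input a is the least significant index bit), and the `registered` flag are assumptions made for the example.

```python
class LogicBlock:
    """Behavioral model of a 4-input LUT followed by one D flip-flop.

    Hypothetical illustration: names and bit ordering are assumptions.
    """

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs exactly 16 entries")
        self.truth_table = list(truth_table)
        self.registered = registered  # selects registered or direct output
        self.q = 0                    # flip-flop state, cleared at start

    def lut(self, a, b, c, d):
        # The four inputs form an index into the 16-entry table.
        index = a | (b << 1) | (c << 2) | (d << 3)
        return self.truth_table[index]

    def clock(self, a, b, c, d):
        # On a rising clock edge the flip-flop captures the LUT output.
        self.q = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        # Single output: either the flip-flop or the LUT directly.
        return self.q if self.registered else self.lut(a, b, c, d)


# A 4-input AND gate: only index 15 (all inputs high) yields 1.
and4 = LogicBlock([0] * 15 + [1])
```

Any 4-input Boolean function can be expressed this way simply by changing the 16 table entries, which is what makes the LUT the configurable part of the block.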
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
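The planar (domain-based) rule can be stated in a few lines of code. The sketch below is a hypothetical Python illustration; the side names and the function planar_neighbors are assumptions chosen for the example, and tracks are numbered from zero here.

```python
SIDES = ("north", "east", "south", "west")


def planar_neighbors(side, track, channel_width):
    """Return the wires reachable from (side, track) in a planar switch box.

    A wire entering on one side can be switched only to the wire with the
    SAME track number on each of the other three sides.
    """
    if side not in SIDES:
        raise ValueError("unknown side: %r" % (side,))
    if not 0 <= track < channel_width:
        raise ValueError("track outside the channel width")
    return [(other, track) for other in SIDES if other != side]


# A wire entering from the west on track 2 of a 4-track channel:
reachable = planar_neighbors("west", 2, 4)
```

Each wire therefore has exactly three programmable switches, and a signal that starts on a given track stays on that track number for its whole route unless it passes through a logic block.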
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or similar license) and
other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
• SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
• Antifuse - One-time programmable. CMOS.
• EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
• EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
• Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
• Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of the hardware design, together with early software and
firmware co-design, has become mainstream. Prototyping SoC and ASIC
designs on an FPGA has become a good method of doing this.
Reasons why Prototyping is important
1 Running an SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows for much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications, and hardware, to be found early.
3 Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, beyond the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC clients,
bringing in interest early. This speeds up the overall development cycle and
allows for more enhancement or improvement of the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
Fig: VGA Controller (gate level)
Fig: Sync Unit (gate level)
Fig: Technology schematic
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, when implemented on an FPGA, the Controller takes a processor-generated
picture (pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
6 SIGNALS IN VGA CONTROLLER
The HSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated on each rising
edge of the external clock signal (CLK).
The state of the HSYNC signal depends on the value stored in the horizontal
counter and is driven low when
HCOUNT >= (PixTotal + BlankingLeft)
and remains low while
HCOUNT < (PixTotal + BlankingLeft + TLSync)
Vertical (Frame) Period
6.3 Generation of the vertical synchronization signal - VSYNC
The VSYNC signal is High (inactive) after an external reset signal
(RST) is received by the VGA Controller. The signal is updated after every
line of pixels is completed (i.e. on each rising edge of the HSYNC signal).
The state of the VSYNC signal depends on the value stored in the vertical
counter and is driven low when
VCOUNT >= (LinTotal + BlankingLeft)
and remains low while
VCOUNT < (LinTotal + BlankingLeft + TFSync)
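The HSYNC and VSYNC conditions share the same shape, so both can be modelled with one comparison. The following Python sketch is illustrative only; the function name sync_level and the numeric values in the example (640 visible pixels, a 16-count blanking offset, a 96-count sync width) are assumptions, not values taken from this report.

```python
def sync_level(count, visible, blanking, sync_width):
    """Return the sync output level for a counter value.

    The signal is active low: it is 0 while
        visible + blanking <= count < visible + blanking + sync_width
    and 1 (inactive) everywhere else.
    """
    start = visible + blanking
    return 0 if start <= count < start + sync_width else 1


def hsync(hcount, pix_total, blanking_left, tl_sync):
    # HCOUNT compared against PixTotal, BlankingLeft and TLSync.
    return sync_level(hcount, pix_total, blanking_left, tl_sync)


def vsync(vcount, lin_total, blanking_left, tf_sync):
    # VCOUNT compared against LinTotal, BlankingLeft and TFSync.
    return sync_level(vcount, lin_total, blanking_left, tf_sync)


# Example with assumed values: the pulse covers counts 656 to 751.
levels = [hsync(h, 640, 16, 96) for h in (655, 656, 751, 752)]
```

In hardware the same comparison would be evaluated on each rising clock edge (for HSYNC) or on each rising HSYNC edge (for VSYNC), with the output forced High on reset.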
Address Counter (MEMADDR)
The address counter is used to store the position of the next consecutive
pixel in the frame. Its value is passed to the Pixel Unit on the internal bus
signal hvcnt, which is then used to provide the ADDR_PIXEL signal to obtain
the next pixel from picture memory.
The counter is reset to zero when the VGA Controller receives an external
reset signal (RST). The size of the counter depends on the values chosen for
DISPSIZE_H and DISPSIZE_V, as the range is simply
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
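The 19-bit figure follows directly from the largest address in each resolution. A short Python check, with the helper name counter_bits chosen only for this illustration:

```python
def counter_bits(pix_total, lin_total):
    """Bits needed to hold addresses 0 .. (pix_total * lin_total) - 1."""
    max_address = pix_total * lin_total - 1
    return max_address.bit_length()


vga_bits = counter_bits(640, 480)    # largest address is 307199
svga_bits = counter_bits(800, 600)   # largest address is 479999
```

Both maxima lie between 2^18 = 262144 and 2^19 = 524288, so a single 19-bit counter serves either resolution.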
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point, it will be rolled over to zero.
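The counting rule can be exercised on a deliberately tiny frame. This is a simplified Python sketch under stated assumptions: the function scan_addresses and the line and frame periods passed to it are hypothetical, and the model ignores reset and clock-edge details.

```python
def scan_addresses(pix_total, lin_total, line_period, frame_period):
    """Return the sequence of pixel addresses produced during one frame.

    The address counter advances only while HCOUNT < pix_total and
    VCOUNT < lin_total, and rolls over to zero once VCOUNT >= lin_total.
    """
    memaddr = 0
    produced = []
    for vcount in range(frame_period):
        for hcount in range(line_period):
            if vcount >= lin_total:
                memaddr = 0               # past the last viewable line
            elif hcount < pix_total:
                produced.append(memaddr)  # address of the current pixel
                memaddr += 1
            # otherwise: horizontal blanking, counter holds its value
    return produced


# 4 viewable pixels x 3 viewable lines inside a 6 x 5 total raster.
addresses = scan_addresses(4, 3, 6, 5)
```

Because the counter holds its value during horizontal blanking, the first pixel of each line follows directly on from the last pixel of the line above, which is the consecutive addressing the text describes.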
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors, or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum refresh rate of 60 Hz:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 = approx. 54 ns (visible pixels only)
Max time for 1 pixel = 40 ns (allowing for the blanking intervals)
2) To send information for each and every pixel of a frame by means of
RGB signals.
3) After one row has been completely displayed, the column position must be
reset and the row incremented; this is done by the HSYNC signal.
4) After one frame has been completely displayed, a new frame has to be started,
so both row and column must be reset; this is done by the VSYNC signal.
=> To give the information for 1 pixel we use 1 clock (which controls all
operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display one pixel on each and every
clock, we can attain a refresh rate of about 60 Hz.
Do not use a clock frequency of less than 25 MHz.
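The arithmetic behind the clock choice can be checked with a few lines of Python. This is a sketch under one labelled assumption: the total raster of 800 x 525 clocks per frame (visible area plus blanking) is the commonly used 640x480 timing and is not stated in this report.

```python
VISIBLE_PIXELS = 640 * 480   # 307200 visible pixels per frame
TOTAL_PIXELS = 800 * 525     # ASSUMED total per frame, including blanking


def pixel_time_ns(refresh_hz, pixels_per_frame):
    """Time budget per pixel, in nanoseconds, at a given refresh rate."""
    return 1e9 / (refresh_hz * pixels_per_frame)


def clock_period_ns(clock_hz):
    """Period of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def refresh_rate_hz(clock_hz, pixels_per_frame):
    """Frames per second when one pixel slot is consumed per clock."""
    return clock_hz / pixels_per_frame


visible_budget = pixel_time_ns(60, VISIBLE_PIXELS)   # about 54 ns
period_25mhz = clock_period_ns(25e6)                 # 40 ns
rate_25mhz = refresh_rate_hz(25e6, TOTAL_PIXELS)     # a little under 60 Hz
rate_15mhz = refresh_rate_hz(15e6, TOTAL_PIXELS)     # far below 60 Hz
```

Under that assumption a 25 MHz clock gives a frame rate just under 60 Hz, while a 15 MHz clock falls well short, which is why lower clock frequencies are ruled out.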
Ex:
15 MHz: time period = more, no. of frames/sec = less (frame rate decreases; it
is unable to display 60 frames/sec)
25 MHz: effectively utilized frequency
30 MHz: time period = less, no. of frames/sec = more (frame rate increases)
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal synchronization signal)
VSYNC (Vertical synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse, back porch
              Horizontal sync   Vertical sync
Front porch   0.94 µs           0.35 ms
Sync pulse    3.77 µs           0.063 ms
Back porch    1.89 µs           1.02 ms
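As a consistency check, the porch and sync durations can be combined with the visible period to estimate the line period and the resulting frame rate. The Python sketch below assumes the timing values read as 0.94, 3.77 and 1.89 microseconds (horizontal) and 0.35, 0.063 and 1.02 milliseconds (vertical), a 25 MHz pixel clock, and a 640x480 visible area; the constant and function names are chosen only for this example.

```python
PIXEL_CLOCK_MHZ = 25.0
VISIBLE_PIXELS = 640
VISIBLE_LINES = 480

# Horizontal timings in microseconds, vertical timings in milliseconds.
H_FRONT_US, H_SYNC_US, H_BACK_US = 0.94, 3.77, 1.89
V_FRONT_MS, V_SYNC_MS, V_BACK_MS = 0.35, 0.063, 1.02


def line_period_us():
    """Visible time plus front porch, sync pulse and back porch."""
    visible_us = VISIBLE_PIXELS / PIXEL_CLOCK_MHZ
    return visible_us + H_FRONT_US + H_SYNC_US + H_BACK_US


def frame_period_ms():
    """Visible lines plus vertical front porch, sync and back porch."""
    visible_ms = VISIBLE_LINES * line_period_us() / 1000.0
    return visible_ms + V_FRONT_MS + V_SYNC_MS + V_BACK_MS


def refresh_rate_hz():
    return 1000.0 / frame_period_ms()
```

With these inputs the line period comes to about 32.2 microseconds and the frame period to about 16.9 milliseconds, i.e. a refresh rate slightly below 60 Hz, which agrees with the 25 MHz clock choice.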
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA Pin description
Name (Type, Polarity, Bus size): Description

Global Control Signals
CLK (I, Rising, -): Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows:
    640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz
    800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz
RST (I, High, -): Global system reset.

VGA Input Settings Signals
RESOLUTION (I, High/Low, -): This input selects the screen resolution to be used. 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD (I, -, 2): This input selects the color palette to be used. 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H (I, -, 10): This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V (I, -, 10): This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.

Data Memory Control Signals
RD (O, High, -): This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal en generated by the Synchronization Unit of the Controller.
DATA (I, -, 8): Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows:
    Black & White mode: 8 1-bit pixels
    16 Colors mode: 2 4-bit pixels
    64 Colors mode: 1 6-bit pixel (DATA[5..0])
    Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL (O, -, 19): Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.

VGA Monitor Control Signals
HSYNC (O, Falling, -): Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC (O, Falling, -): Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1, R0 (O, High/Low, -): Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1, G0 (O, High/Low, -): Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1, B0 (O, High/Low, -): Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode - one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode - one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode - one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
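The formula can be evaluated for each color palette. The Python sketch below is illustrative; the mode keys ("bw", "16", "64") and the function name picture_bytes are assumptions made for the example, and the result is rounded up to a whole byte.

```python
PIXELS_PER_BYTE = {"bw": 8, "16": 2, "64": 1}


def picture_bytes(pix_total, lin_total, mode):
    """Bytes of picture memory needed for one frame in the given mode."""
    pixels = pix_total * lin_total
    per_byte = PIXELS_PER_BYTE[mode]
    return -(-pixels // per_byte)   # ceiling division


bw_bytes = picture_bytes(640, 480, "bw")     # 38400 bytes
c16_bytes = picture_bytes(640, 480, "16")    # 153600 bytes
c64_bytes = picture_bytes(640, 480, "64")    # 307200 bytes
```

The eightfold difference between Black & White and 64 Colors mode is the main factor when deciding whether the picture can live in on-chip block RAM or needs external SRAM.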
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = 00 & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = 0 & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
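The three addressing rules reduce to a right shift whose size depends on the palette. A minimal Python sketch, assuming hypothetical mode keys "bw", "16" and "64" and treating hvcnt as a 19-bit unsigned value:

```python
ADDRESS_SHIFT = {"bw": 3, "16": 1, "64": 0}   # log2(pixels per byte)


def addr_pixel(hvcnt, mode):
    """Byte address in picture memory for the pixel counted by hvcnt."""
    hvcnt &= 0x7FFFF                 # keep the 19 counter bits
    return hvcnt >> ADDRESS_SHIFT[mode]


# Pixel number 17 lives in a different byte in each mode.
examples = {mode: addr_pixel(17, mode) for mode in ("bw", "16", "64")}
```

Shifting right by three, one or zero bits is the same as dividing the pixel count by 8, 2 or 1 pixels per byte, and the vacated upper bits of the address are filled with zeros.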
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
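The enex definition translates directly into a Boolean function. The Python sketch below follows the condition as written; the line period of 800 and frame period of 525 used in the example are assumed values for illustration only.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Line display enable: 0 during blanking, 1 otherwise."""
    h_blank = (pix_total + 1) < hcount < line_period
    v_blank = (lin_total + 1) < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1


def rd(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    # RD follows enex: memory reads are disabled while enex is Low.
    return enex(hcount, vcount, pix_total, lin_total,
                line_period, frame_period)


inside = enex(100, 100, 640, 480, 800, 525)     # visible area
h_blanked = enex(700, 100, 640, 480, 800, 525)  # horizontal blanking
v_blanked = enex(100, 500, 640, 480, 800, 525)  # vertical blanking
```

Note that the comparison uses PixTotal + 1 and LinTotal + 1 with a strict inequality, so the enable stays High for two counts beyond the last viewable pixel or line.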
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
0.
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
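The load-and-shift behaviour of the Pixel register can be illustrated by unpacking one data byte into the pixels it holds, in the order they reach the active position. This Python sketch is a simplified model; the function name unpack_pixels and the mode keys are assumptions made for the example.

```python
PIXEL_BITS = {"bw": 1, "16": 4, "64": 6}       # bits per pixel
PIXELS_IN_BYTE = {"bw": 8, "16": 2, "64": 1}   # pixels held by one byte


def unpack_pixels(data_byte, mode):
    """Return the pixels of one byte, first displayed pixel first.

    The active pixel is always taken from the low-order bits; after each
    pixel the register is shifted right by the pixel width.
    """
    width = PIXEL_BITS[mode]
    mask = (1 << width) - 1
    register = data_byte & 0xFF
    pixels = []
    for _ in range(PIXELS_IN_BYTE[mode]):
        pixels.append(register & mask)   # active pixel position
        register >>= width               # move the next pixel down
    return pixels


bw_pixels = unpack_pixels(0b10110001, "bw")
nibbles = unpack_pixels(0xA5, "16")
six_bit = unpack_pixels(0xFF, "64")
```

In 64 Colors mode the mask also discards bits 7 and 6 of the byte, matching the note that those bits are not used.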
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
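The level count and the 64-color figure can be reproduced numerically. The Python sketch below assumes an ideal linear 2-bit DAC with evenly spaced levels between 0 V and 0.7 V; the report does not give the two intermediate voltages, so the spacing and the function names are assumptions.

```python
from itertools import product

FULL_SCALE_V = 0.7   # maximum brightness
DAC_BITS = 2


def dac_level(code):
    """Output voltage of an ASSUMED ideal linear 2-bit DAC."""
    if not 0 <= code < (1 << DAC_BITS):
        raise ValueError("code must fit in 2 bits")
    return FULL_SCALE_V * code / ((1 << DAC_BITS) - 1)


def split_rgb(rgb6):
    """Split a 6-bit code into (R, G, B) 2-bit fields, red in the top bits.

    The field order is an assumption made for this illustration.
    """
    return (rgb6 >> 4) & 0x3, (rgb6 >> 2) & 0x3, rgb6 & 0x3


levels = [dac_level(code) for code in range(4)]
all_colors = set(product(range(4), repeat=3))   # every (R, G, B) triple
```

Four levels on each of three channels give 4 x 4 x 4 = 64 distinct combinations, which is where the 64 Colors palette comes from.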
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2: Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
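The totals in the block RAM table follow from the per-block size, and the same numbers indicate whether a frame buffer fits on chip. The Python sketch below uses the block counts from the table; the helper names and the use of only the 16K data bits per block for the fit check are assumptions of this example.

```python
BITS_PER_BLOCK = 18432        # total bits in one block RAM
DATA_BITS_PER_BLOCK = 16384   # 16K bits available for data storage

RAM_BLOCKS = {
    "XC3S100E": 4,
    "XC3S250E": 12,
    "XC3S500E": 20,
    "XC3S1200E": 28,
    "XC3S1600E": 36,
}


def total_ram_bits(device):
    return RAM_BLOCKS[device] * BITS_PER_BLOCK


def total_ram_kbits(device):
    return total_ram_bits(device) // 1024


def frame_fits(device, pix_total, lin_total, bits_per_pixel):
    """True if one frame fits in the device's block RAM data bits."""
    needed = pix_total * lin_total * bits_per_pixel
    return needed <= RAM_BLOCKS[device] * DATA_BITS_PER_BLOCK


bw_fits_500e = frame_fits("XC3S500E", 640, 480, 1)    # 307200 bits needed
c16_fits_500e = frame_fits("XC3S500E", 640, 480, 4)   # 1228800 bits needed
```

Under these assumptions a 640x480 Black & White frame fits in the block RAM of an XC3S500E, while the 16 Colors frame does not and would need external memory.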
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
• Interoperability
• Technology independence
• Design reuse
• Several levels of abstraction
• Readability
• Standard language
• Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
• Design specification language
• Design entry language
• Design simulation language
• Design documentation language
• An alternative to schematics
10.1 Brief History
• VHDL was developed in the early 1980s for managing design problems that involved large circuits and multiple teams of engineers.
• It was funded by the US Department of Defense.
• The first publicly available version was released in 1985.
• In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was presented with a proposal to standardize VHDL.
• In 1987, standardization => IEEE 1076-1987.
• An improved version of the language was released in 1994 => IEEE Standard 1076-1993.
10.2 Related Standards
• IEEE 1076 by itself doesn't support simulation conditions such as unknown and high-impedance.
• Soon after IEEE 1076-1987 was released, simulator companies began using their own non-standard types => VHDL was becoming non-standard.
• The IEEE 1164 standard was developed by the IEEE.
• IEEE 1164 contains definitions for a nine-valued data type, std_logic.
• IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate to actual hardware.
• It defines, e.g., two numeric types: signed and unsigned.
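To make the role of these standards concrete, here is a minimal sketch that uses std_logic_vector from IEEE 1164 and the signed and unsigned types from the IEEE 1076.3 package numeric_std. The entity name add_sketch and its ports are hypothetical, chosen only for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;  -- std_logic, std_logic_vector (IEEE 1164)
use IEEE.numeric_std.all;     -- signed, unsigned (IEEE 1076.3)

-- Hypothetical example: the same 8-bit inputs added as unsigned
-- numbers and as two's-complement signed numbers.
entity add_sketch is
  port (
    a, b  : in  std_logic_vector(7 downto 0);
    sum_u : out std_logic_vector(8 downto 0);
    sum_s : out std_logic_vector(8 downto 0)
  );
end entity add_sketch;

architecture rtl of add_sketch is
begin
  -- resize zero-extends unsigned operands and sign-extends signed ones,
  -- so the 9-bit results cannot overflow.
  sum_u <= std_logic_vector(resize(unsigned(a), 9) + resize(unsigned(b), 9));
  sum_s <= std_logic_vector(resize(signed(a), 9) + resize(signed(b), 9));
end architecture rtl;
```

The bit pattern on a and b is identical in both cases; only the type conversion decides how the bits are interpreted.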
VHDL Environment
Design Units
• Segments of VHDL code that can be compiled separately and stored in a library
Entities
• A black box with an interface definition
• Defines the inputs/outputs of a component (defines pins)
• A way to represent modularity in VHDL
• Similar to a symbol in a schematic
• An entity declaration describes an entity
E.g.
    entity Comparator is
      port (A, B : in  std_logic_vector(7 downto 0);
            EQ   : out std_logic);
    end Comparator;
Ports
• Provide channels of communication between the component and its environment
• Each port must have a name, a direction, and a type
• An entity may have NO port declaration
Port directions
• In: The value of the port can be read inside the component but cannot be assigned. Multiple reads of the port are allowed.
• Out: Assignments can be made to the port, but data from the port cannot be read (this restriction was lifted in VHDL-2008). Multiple assignments are allowed.
• Inout: Bi-directional; assignments can be made and data can be read. Multiple assignments are allowed.
• Buffer: An out port with read capability. May have at most one assignment (buffer ports are not recommended).
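The port-direction rules can be sketched as follows. Because an out port cannot be read back in VHDL '87/'93, a common alternative to a buffer port is an internal signal that is read and updated inside the architecture and then copied to the out port. The entity name count4_sketch and its ports are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical example: a 4-bit counter with an out port.
entity count4_sketch is
  port (
    clk   : in  std_logic;                     -- in: read only
    rstn  : in  std_logic;                     -- in: read only
    count : out std_logic_vector(3 downto 0)   -- out: assigned only
  );
end entity count4_sketch;

architecture rtl of count4_sketch is
  -- Internal copy that may be both read and assigned.
  signal count_i : unsigned(3 downto 0);
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      count_i <= (others => '0');
    elsif rising_edge(clk) then
      count_i <= count_i + 1;  -- reads count_i, not the out port
    end if;
  end process;

  count <= std_logic_vector(count_i);
end architecture rtl;
```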
Architectures
• Every entity has at least one architecture
• One entity can have several architectures
• Architectures can describe a design using
  - Behavior
  - Structure
  - Dataflow
• Architectures can describe a design on many levels
  - Gate level
  - RTL (Register Transfer Level)
  - Behavioral level
• A configuration declaration links an architecture to an entity
E.g.
    architecture Comparator1 of Comparator is
    begin
      EQ <= '1' when (A = B) else '0';
    end Comparator1;
Configurations
• Link an entity declaration and an architecture body together
• The concept of a default configuration is a bit messy in VHDL '87
  - The last architecture analyzed links to the entity
• Can be used to change simulation behavior without re-analyzing the VHDL source
• Complex configuration declarations are ignored in synthesis
• Some entities can have, e.g., a gate-level architecture and a behavioral architecture
• Are always optional
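A minimal sketch of a configuration declaration for the Comparator example is given below. The entity and architecture are repeated so that the block stands on its own; the configuration name Comparator_cfg is hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Hypothetical configuration: explicitly selects architecture
-- Comparator1 for entity Comparator instead of relying on the
-- default (last-analyzed) architecture.
configuration Comparator_cfg of Comparator is
  for Comparator1
  end for;
end Comparator_cfg;
```

If a second architecture (say, a gate-level one) were added later, only the name inside the for clause would need to change to simulate it.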
Packages
• Packages contain information common to many design units
1. Package declaration
  - constant declarations
  - type and subtype declarations
  - function and procedure declarations
  - global signal declarations
  - file declarations
  - component declarations
2. Package body
  - is not necessarily needed
  - function bodies
  - procedure bodies
• Packages are meant for encapsulating data which can be shared globally among several design units
• A package consists of a declaration part and an optional body part
• A package declaration can contain
  - type and subtype declarations
  - subprograms
  - constants
  - alias declarations
  - global signal declarations
  - file declarations
  - component declarations
• A package body consists of
  - subprogram declarations and bodies
  - type and subtype declarations
  - deferred constants
  - file declarations
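The split between package declaration and package body can be sketched as follows. The package name vga_pkg_sketch, the constant H_TOTAL, and the function parity are hypothetical and are not part of the controller code in this report; H_TOTAL simply reuses the 0 to 793 horizontal count range from the synchronization block as an example value.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Package declaration: what other design units can see.
package vga_pkg_sketch is
  constant H_TOTAL : integer := 794;  -- a horizontal counter running 0..793
  function parity (v : std_logic_vector) return std_logic;
end vga_pkg_sketch;

-- Package body: the implementation of the declared subprogram.
package body vga_pkg_sketch is
  function parity (v : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    -- XOR all bits together: '1' for an odd number of '1' bits.
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end parity;
end vga_pkg_sketch;
```

A design unit would make these names visible with "use work.vga_pkg_sketch.all;", assuming the package was compiled into the work library.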
Libraries
• Collection of VHDL design units (database)
1. Packages
  - package declaration
  - package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
• Usually a directory in a UNIX file system
• Can also be any other kind of database
Levels of Abstraction
• VHDL supports many possible styles of design description, which differ primarily in how closely they relate to the hardware
• It is possible to describe a circuit in a number of ways, listed here from the lowest to the highest level of abstraction:
  - Structural
  - Dataflow
  - Behavioral
Structural VHDL description
• The circuit is described in terms of its components
• Ranges from a low-level description (e.g. transistor-level description) to a high-level description (e.g. block diagram)
• For large circuits, a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
• The circuit is described in terms of how data moves through the system
• In the dataflow style you describe how information flows between registers in the system
• The combinational logic is described at a relatively high level; the placement and operation of registers is specified quite precisely
• The behavior of the system over time is defined by registers
• There are no built-in registers in the VHDL language
  - Either a lower-level description
  - or a behavioral description of sequential elements is needed
• The lower-level register descriptions must be created or obtained
• If there are no third-party models for registers => you must write the behavioral description of registers
• The behavioral description can be provided in the form of subprograms (functions or procedures)
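Since VHDL has no built-in register, a behavioral description of one has to be written. The sketch below shows one common way to do this with a clocked process; the entity name reg8_sketch, the active-low reset, and the enable input are assumptions made for the example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical example: 8-bit register with asynchronous active-low
-- reset and clock enable, described behaviorally.
entity reg8_sketch is
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;
    en   : in  std_logic;
    d    : in  std_logic_vector(7 downto 0);
    q    : out std_logic_vector(7 downto 0)
  );
end entity reg8_sketch;

architecture behav of reg8_sketch is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');      -- reset takes priority over the clock
    elsif rising_edge(clk) then
      if en = '1' then
        q <= d;                  -- capture d on the rising edge
      end if;                    -- otherwise q holds its value
    end if;
  end process;
end architecture behav;
```

A dataflow description would then instantiate registers like this one and describe the combinational logic between them with concurrent signal assignments.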
10.4 Behavioral VHDL Description
• The circuit is described in terms of its operation over time
• The representation might include, e.g., state diagrams, timing diagrams, and algorithmic descriptions
• The concept of time may be expressed precisely using delays (e.g. A <= B after 10 ns)
• If no actual delay is used, only the order of sequential operations is defined
• At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed timing specifications
• The actual timing results depend on the implementation technology and the efficiency of the synthesis tool
• There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
• The basic simulation concept in VHDL
• A VHDL description can always be broken up into interconnected processes
• Quite similar to a Unix process
• Declared with the process keyword in VHDL
• A process statement is a concurrent statement
• Statements inside a process statement are sequential statements
• A process must contain either a sensitivity list or wait statement(s), but NOT both
• The sensitivity list or wait statement(s) contain the signals which wake the process up
General format:
    process [(sensitivity_list)]
      process_declarative_part
    begin
      process_statements
      [wait_statement]
    end process;
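The two ways of waking a process can be sketched side by side. In the hypothetical entity and2_sketch below, both processes compute the same AND function: the first uses a sensitivity list, the second an equivalent wait statement at the end of the process body.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical example: the same logic written with a sensitivity
-- list and with a wait statement.
entity and2_sketch is
  port (
    a, b   : in  std_logic;
    y1, y2 : out std_logic
  );
end entity and2_sketch;

architecture behav of and2_sketch is
begin
  -- Woken whenever a or b changes (sensitivity list, no wait).
  p_sens : process (a, b)
  begin
    y1 <= a and b;
  end process p_sens;

  -- Same behavior: runs once, then suspends until a or b changes.
  p_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process p_wait;
end architecture behav;
```

Both processes are concurrent statements within the architecture, while the assignments inside each one execute sequentially.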
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- Note: numeric_std and STD_LOGIC_ARITH both declare "unsigned";
-- neither type is used directly in this design unit.
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
component memory
    port (
    addr  : IN  std_logic_VECTOR(17 downto 0);
    clk   : IN  std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN  std_logic;
    sinit : IN  std_logic);
END component;
component sync_unit port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;    -- mode "out" to match the sync_unit entity
    c_hs       : out integer);   -- mode "out" to match the sync_unit entity
end component;
component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr, 18));
-- Clock divider: toggles clk on every rising edge of clk50m,
-- giving a pixel clock at half the input frequency.
process(clk50m, rstn)
begin
    if (rstn = '0') then
        clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
        clk <= not clk;
    end if;
end process;
hsync <= hsync_s;
vsync <= vsync_s;
sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
    );
pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
    );
mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
    );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin
c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;
-- hsync counter: counts pixel clocks 0 to 793 within a line
process(clk, rstn)
begin
    if (rstn = '0') then
        c_hs_s <= 0;
    else
        if (clk = '1' and clk'event) then
            if (c_hs_s = 793) then
                c_hs_s <= 0;
            else
                c_hs_s <= c_hs_s + 1;
            end if;
        end if;
    end if;
end process;
-- vsync counter
-- vertical line counter: advances on each rising edge of h_en_s, 0 to 515
process(h_en_s, rstn)
begin
    if (rstn = '0') then
        line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
        line_count_s <= line_count_s + 1;
        if (line_count_s = 515) then
            line_count_s <= 0;
        end if;
    end if;
end process;
-- hsync pulse: low while c_hs_s is 0 to 95
process(clk, rstn)
begin
    if (rstn = '0') then
        hsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (c_hs_s <= 95) then
            hsync <= '0';
        else
            hsync <= '1';
        end if;
    end if;
end process;
-- Horizontal Enable (row information will be sent in this period)
process(clk, rstn)
begin
    if (rstn = '0') then
        h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
            h_en_s <= '1';
        else
            h_en_s <= '0';
        end if;
    end if;
end process;
-- Vertical Enable (data is valid in this region)
process(clk, rstn)
begin
    if (rstn = '0') then
        v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((line_count_s >= 35 and line_count_s <= 515)) then
            v_en_s <= '1';
        else
            v_en_s <= '0';
        end if;
    end if;
end process;
-- Vertical Sync (front porch, back porch): low while line_count_s is 0 or 1
process(clk, rstn)
begin
    if (rstn = '0') then
        vsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (line_count_s < 2) then
            vsync <= '0';
        else
            vsync <= '1';
        end if;
    end if;
end process;
-- v_en delayed by one clock, used to detect the rising edge of v_en
process(clk)
begin
    if (clk = '1' and clk'event) then
        v_en_reg <= v_en_s;
    end if;
end process;
end arch_sync_unit;
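As a rough check on the counter limits used in the synchronization block, the line and frame rates can be worked out from the code. This assumes that clk50m is a 50 MHz input (suggested by its name but not stated in the code), so that the divide-by-two process in the top-level design yields a 25 MHz clk. The horizontal counter runs from 0 to 793 (794 clocks per line) and the line counter from 0 to 515 (516 lines per frame):

```latex
\begin{align*}
f_{\text{line}}  &= \frac{f_{\text{clk}}}{794}
                  = \frac{25\,000\,000\ \text{Hz}}{794}
                  \approx 31.49\ \text{kHz} \\
f_{\text{frame}} &= \frac{f_{\text{clk}}}{794 \times 516}
                  = \frac{25\,000\,000\ \text{Hz}}{409\,704}
                  \approx 61.0\ \text{Hz}
\end{align*}
```

For comparison, the standard 640x480 VGA timing uses 800 clocks per line and 525 lines per frame at 25.175 MHz, which gives about 31.47 kHz and 59.94 Hz, so under the stated assumption the design runs slightly above the nominal rates.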
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
    );
end pixelgen;
architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;
process(clk, rstn)
begin
    if (rstn = '0') then
        rd     <= '0';
        red    <= "0";
        green  <= "0";
        blue   <= "0";
        addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
        -- Rising edge of v_en (start of the visible frame): restart the address
        if ((v_en = '1') and (v_en_reg = '0')) then
            addr_s <= (others => '0');
        end if;
        if (v_en = '1' and h_en = '1') then
            if (line_count <= 514) then
                if (c_hs <= 655) then
                    -- Picture area: read the next pixel from memory
                    rd     <= '1';
                    red    <= data_in;
                    green  <= data_in;
                    blue   <= data_in;
                    addr_s <= addr_s + "000000000000000001";
                else
                    -- Right of the picture area: green
                    rd    <= '0';
                    red   <= "0";
                    green <= "1";
                    blue  <= "0";
                end if;
            else
                -- Below the picture area: white
                rd    <= '0';
                red   <= "1";
                green <= "1";
                blue  <= "1";
            end if;
            -- Wrap the address at the end of the stored image
            if (addr_s = "111011111111111111") then
                addr_s <= (others => '0');
            end if;
        else
            -- Blanking: drive black
            red   <= "0";
            green <= "0";
            blue  <= "0";
        end if;
    end if;
end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can be configured by the customer or designer after manufacturing, hence the name "field-programmable". FPGAs are programmed using a logic circuit diagram or source code in a hardware description language (HDL) to specify how the chip will work. They can be used to implement any logical function that an application-specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping offers advantages for many applications.
FPGAs contain programmable logic components called "logic blocks" and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs both had the option of being programmed in batches in a factory or in the field (field-programmable); however, programmable logic was hard-wired between logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially viable field-programmable gate array in 1985, the XC2064. The XC2064 had programmable gates and programmable interconnects between gates, the beginnings of a new technology and market. The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More than 20 years later, Freeman was entered into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for programmable logic arrays, gates, and logic blocks are founded in patents awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment proposed by Steve Casselman to develop a computer that would implement 600,000 reprogrammable gates. Casselman was successful, and the system was awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the mid-1990s, when competitors sprouted up, eroding significant market share. By 1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication and in the volume of production. In the early 1990s, FPGAs were primarily used in telecommunications and networking. By the end of the decade, FPGAs had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic algorithm technology and FPGAs to create a sound recognition device. Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form a complete "system on a programmable chip". This work mirrors the architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems Group, which combined a reconfigurable CPU architecture on a single chip called the SB24. That work was done in 1982. Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more PowerPC processors embedded within the FPGA's logic fabric. The Atmel FPSLIC is another such device, which uses an AVR processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft" processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be reprogrammed at "run time", and this is leading to the idea of reconfigurable computing or reconfigurable systems: CPUs that reconfigure themselves to suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable soft processor implemented on FPGAs. However, it does not support dynamic reconfiguration at runtime, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge. Software-configurable microprocessors such as the Stretch S5000 adopt a hybrid approach by providing an array of processor cores and FPGA-like programmable cores on the same chip.
Gates
• 1987: 9,000 gates, Xilinx
• 1992: 600,000, Naval Surface Warfare Department
• Early 2000s: Millions
Market size
• 1985: First commercial FPGA technology invented by Xilinx
• 1987: $14 million
• ~1993: >$385 million
• 2005: $1.9 billion
• 2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally achieved less functionality than their fixed ASIC counterparts. A combination of volume, fabrication improvements, research and development, and the I/O capabilities of new supercomputers has largely closed the performance gap between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the field to fix bugs, and lower non-recurring engineering costs. Vendors can also take a middle road by developing their hardware on ordinary FPGAs but manufacturing their final version so that it can no longer be modified after the design has been committed.
Xilinx claims that several market and technology dynamics are changing the ASIC/FPGA paradigm:
• IC costs are rising aggressively
• ASIC complexity has increased development time and costs
• R&D resources and headcount are decreasing
• Revenue losses for slow time-to-market are increasing
• Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing number of higher-volume applications than they have historically been used for, to which the company attributes the growing number of FPGA design starts (see History).
The primary differences between CPLDs and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable sum-of-products logic arrays feeding a relatively small number of clocked registers. The result of this is less flexibility, with the advantage of more predictable timing delays and a higher logic-to-interconnect ratio. The FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible (in terms of the range of designs that are practical for implementation within them) but also far more complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most FPGAs of higher-level embedded functions (such as adders and multipliers) and embedded memories, as well as logic blocks that implement decoders or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar space, that of glue logic for PCBs. As their size, capabilities, and speed increased, they began to take over larger and larger functions, to the point where some are now marketed as full systems on chips (SoC). Particularly with the introduction of dedicated multipliers into FPGA architectures in the late 1990s, applications which had traditionally been the sole reserve of DSPs began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use of the massive parallelism offered by their architecture. One such area is code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing applications, where computational kernels such as FFT or convolution are performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for considerable computational throughput even at low MHz clock rates. The flexibility of the FPGA allows for even higher performance by trading off precision and range in the number format for an increased number of parallel arithmetic units. This has driven a new type of processing called reconfigurable computing, where time-intensive tasks are offloaded from software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by the complexity of FPGA design compared to conventional software and by the extremely long turn-around times of current design tools, where a wait of 4 to 8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications where the volume of production is small. For these low-volume applications, the premium that companies pay in hardware costs per unit for a programmable chip is more affordable than the development resources spent on creating an ASIC. Today, new cost and performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing channels have the same width (number of wires). Multiple I/O pads may fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources. While the number of CLBs and I/Os required is easily determined from the design, the number of routing tracks needed may vary considerably even among designs with the same amount of logic. (For example, a crossbar switch requires much more routing than a systolic array with the same gate count.) Since unused routing tracks increase the cost (and decrease the performance) of the part without providing any benefit, FPGA manufacturers try to provide just enough tracks so that most designs that will fit in terms of LUTs and I/Os can be routed. This is determined by estimates such as those derived from Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic blocks and routing channels. Two I/O pads fit into the height of one row or the width of one column, as shown below. All the routing channels have the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a flip-flop, as shown below. In recent years, manufacturers have started moving to 6-input LUTs in their high-performance parts, claiming increased performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered LUT output. The logic block has four inputs for the LUT and a clock input. Since clock signals (and often other high-fanout signals) are normally routed via special-purpose dedicated routing networks in commercial FPGAs, they and other signals are separately managed.
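The logic block just described can be modeled in a few lines of VHDL. This is only a behavioral sketch, not a vendor primitive: the entity name logic_block_sketch, the INIT generic holding the 16-entry truth table, and the USE_REG generic selecting the registered or unregistered output are all hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical model of a classic logic block: 4-input LUT + flip-flop.
entity logic_block_sketch is
  generic (
    -- Truth table: bit n is the output for input value n.
    -- x"8000" sets only bit 15, i.e. a 4-input AND.
    INIT    : std_logic_vector(15 downto 0) := x"8000";
    USE_REG : boolean := true  -- true: registered output
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    o   : out std_logic                      -- the single output
  );
end entity logic_block_sketch;

architecture rtl of logic_block_sketch is
  signal lut_out, reg_out : std_logic;
begin
  -- The LUT is a 16-entry table indexed by the inputs.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- The flip-flop samples the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      reg_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output.
  o <= reg_out when USE_REG else lut_out;
end architecture rtl;
```

Changing INIT changes the logic function without changing the structure, which is the essence of LUT-based programmability.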
For this example architecture, the locations of the FPGA logic block pins are shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin can connect to routing wires in both the channel to the right and the channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the channel adjacent to it. For example, an I/O pad at the top of the chip can connect to any of the W wires (where W is the channel width) in the horizontal channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment spans only one logic block before it terminates in a switch box. By turning on some of the programmable switches within a switch box, longer paths can be constructed. For higher-speed interconnect, some FPGA architectures use longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In this architecture, when a wire enters a switch box, there are three programmable switches that allow it to connect to three other wires in adjacent channel segments. The pattern, or topology, of switches used in this architecture is the planar or domain-based switch box topology. In this switch box topology, a wire in track number one connects only to wires in track number one in adjacent channel segments, wires in track number 2 connect only to other wires in track number 2, and so on. The figure below illustrates the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher-level functionality fixed into the silicon. Having these common functions embedded into the silicon reduces the area required and gives those functions increased speed compared to building them from primitives. Examples of these include multipliers, generic DSP blocks, embedded processors, high-speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon validation, post-silicon validation, and firmware development. This allows chip companies to validate their design before the chip is produced in the factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description language (HDL) description or a schematic design. The HDL form might be easier to work with when handling large structures, because it is possible to just specify them numerically rather than having to draw every piece by hand. On the other hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist is generated. The netlist can then be fitted to the actual FPGA architecture using a process called place-and-route, usually performed by the FPGA company's proprietary place-and-route software. The user will validate the map, place, and route results via timing analysis, simulation, and other verification methodologies. Once the design and validation process is complete, the binary file generated (also using the FPGA company's proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source files are fed to a software suite from the FPGA/CPLD vendor that, through different steps, will produce a file. This file is then transferred to the FPGA/CPLD via a serial interface (JTAG) or to an external memory device like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to reduce the complexity of designing in HDLs, which have been compared to the equivalent of assembly languages, there are moves to raise the abstraction level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of predefined complex functions and circuits that have been tested and optimized to speed up the design process. These predefined circuits are commonly called IP cores and are available from FPGA vendors and third-party IP suppliers (rarely free, and typically released under proprietary licenses). Other predefined circuits are available from developer communities such as OpenCores (typically free, and released under the GPL, BSD, or a similar license) and other sources.
In a typical design flow, an FPGA application developer will simulate the design at multiple stages throughout the design process. Initially, the RTL description in VHDL or Verilog is simulated by creating test benches to simulate the system and observe results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is translated to a gate-level description, where simulation is repeated to confirm that the synthesis proceeded without errors. Finally, the design is laid out in the FPGA, at which point propagation delays can be added and the simulation run again with these values back-annotated onto the netlist.
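The first simulation stage, an RTL test bench, can be sketched for the Comparator example from the VHDL chapter. The Comparator entity and architecture are repeated so that the block stands alone; the test bench name comparator_tb, the stimulus values, and the 10 ns settling time are assumptions made for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical test bench: no ports, drives the design under test
-- and checks its output with assertions.
entity comparator_tb is
end entity comparator_tb;

architecture bench of comparator_tb is
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  dut : entity work.Comparator(Comparator1)
    port map (A => A, B => B, EQ => EQ);

  stim : process
  begin
    A <= x"3C"; B <= x"3C";
    wait for 10 ns;
    assert EQ = '1'
      report "EQ should be '1' for equal inputs" severity error;

    B <= x"3D";
    wait for 10 ns;
    assert EQ = '0'
      report "EQ should be '0' for different inputs" severity error;

    wait;  -- stop the stimulus process
  end process stim;
end architecture bench;
```

The same test bench can in principle be rerun against the post-synthesis and post-layout netlists, which is what makes the staged simulation described above practical.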
Basic Process Technology Types
• SRAM - based on static memory technology. In-system programmable and re-programmable. Requires external boot devices. CMOS.
• Antifuse - One-time programmable. CMOS.
• EPROM - Erasable Programmable Read-Only Memory technology. Usually one-time programmable in production because of plastic packaging. Windowed devices can be erased with ultraviolet (UV) light. CMOS.
• EEPROM - Electrically Erasable Programmable Read-Only Memory technology. Can be erased, even in plastic packages. Some, but not all, EEPROM devices can be in-system programmed. CMOS.
• Flash - Flash-erase EPROM technology. Can be erased, even in plastic packages. Some, but not all, flash devices can be in-system programmed. Usually, a flash cell is smaller than an equivalent EEPROM cell and is therefore less expensive to manufacture. CMOS.
• Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry rivals. Together, they control over 80 percent of the market, with Xilinx alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera provides free Windows tools; the Solaris and Linux tools are only available via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel (antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power), Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld-focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC prototyping, is the method of prototyping SoC and ASIC designs on FPGAs for hardware verification and early software development.
Verification of hardware designs together with early software and firmware co-design has become mainstream. Prototyping SoC and ASIC designs on FPGAs has become a good method to do this.
Reasons why prototyping is important:
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it is functionally correct. This is compared to designers relying only on software simulations to verify that their hardware design is sound. Simulation speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back from the foundry to start on software tests. FPGA prototyping allows for much more time for software development and testing at the software-hardware integration stage. This allows many unforeseen software bugs, which appear due to today's array of operating systems, applications, and hardware, to be caught.
3. Prototyping also allows the developer to ensure that all IP technologies on the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC clients, bringing in interest early. This speeds up the overall development cycle and allows for more enhancement or improvement of the chip features than would otherwise have been possible.
65
13 SIMULATION RESULTS
Fig. Gate level: VGA controller
Fig. Gate level: sync unit
Fig. Technology schematic
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, by implementing the VGA Controller on an FPGA, a processor-generated
(pixelated) picture is taken from memory space and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
0 to (PixTotal x LinTotal) - 1
Taking the maximum number of pixels in a line and lines in a frame for each
of the supported display resolutions, the maximum ranges for the counter are:
• 640x480: 0 to 307199
• 800x600: 0 to 479999
The counter has 19-bit resolution.
While the value in the horizontal counter (HCOUNT) is less than the
total number of viewable pixels in a line (PixTotal, the integer value of
DISPSIZE_H), the counter is incremented on the rising edge of the external
clock signal (CLK). Pixel addressing within the frame is consecutive. When
the counter reaches the last pixel in a line, its incremented value is the first
pixel in the next line down.
The address counter will continue to be incremented until the value in the
vertical counter (VCOUNT) is greater than or equal to the total number of
viewable lines in a frame (LinTotal, the integer value of DISPSIZE_V). At this
point it will be rolled over to zero.
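The counting and roll-over behaviour described above can be sketched as follows. This is only an illustration, not the Controller's actual source: the entity name addr_counter_sketch and the inputs count_en and frame_done are assumptions that stand in for the HCOUNT and VCOUNT comparisons, and the reset is assumed to be synchronous.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity addr_counter_sketch is
  port (
    clk        : in  std_logic;
    rst        : in  std_logic;  -- active high, as for RST
    count_en   : in  std_logic;  -- assumed: '1' while HCOUNT < PixTotal
    frame_done : in  std_logic;  -- assumed: '1' once VCOUNT >= LinTotal
    addr       : out std_logic_vector(18 downto 0)
  );
end entity addr_counter_sketch;

architecture rtl of addr_counter_sketch is
  -- 19 bits cover 0 to 479999 (800x600) as well as 0 to 307199 (640x480)
  signal cnt : unsigned(18 downto 0) := (others => '0');
begin
  addr <= std_logic_vector(cnt);

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' or frame_done = '1' then
        cnt <= (others => '0');  -- roll over to the first pixel of the frame
      elsif count_en = '1' then
        cnt <= cnt + 1;          -- consecutive addressing across lines
      end if;
    end if;
  end process;
end architecture rtl;
```

Because the count is held rather than cleared during horizontal blanking, the first pixel of the next line simply continues from the last address of the previous one.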
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (Black &
White, 16 Colors or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum refresh rate of 60 Hz:
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 ≈ 54 ns
Allowing for the blanking intervals in each line and frame:
Max time for 1 pixel = 40 ns
2) To send information for each and every pixel of a frame by means of the
RGB signals.
3) After one row has been completely displayed, the column must be
reset and the row incremented; this is done by the Hsync signal.
4) After one frame has been completed, a new frame has to be started,
for which both row and column are reset; this is done by the Vsync signal.
=> To give the information for 1 pixel we use 1 clock cycle (the clock controls
all operations):
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for 1 pixel on each and
every clock, we can attain the minimum 60 Hz refresh rate.
Do not use a clock frequency of less than 25 MHz.
Ex:
Clock frequency     15 MHz (decreases)            25 MHz                           30 MHz (increases)
Time period         More                          -                                Less
No. of frames/sec   Less (it is unable to         Effectively utilized frequency   More
                    display 60 frames/sec)
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal synchronization signal)
VSYNC (Vertical synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse and back porch
              Horizontal sync   Vertical sync
Front porch   0.94 µs           0.35 ms
Sync pulse    3.77 µs           0.063 ms
Back porch    1.89 µs           1.02 ms
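As a rough cross-check, the horizontal intervals in the table can be converted into whole pixel-clock counts. The sketch below assumes the 40 ns (25 MHz) pixel clock derived in section 7.1; the package name vga_timing_sketch and the constant names are illustrative and do not come from the Controller's source.

```vhdl
package vga_timing_sketch is
  -- Assumed pixel clock period (25 MHz pixel clock).
  constant CLK_PERIOD : time := 40 ns;

  -- Dividing one time value by another yields an integer, truncated
  -- towards zero, so each interval is expressed in whole pixel clocks.
  constant H_FRONT_CLKS : natural := 940 ns  / CLK_PERIOD;  -- 0.94 us -> 23
  constant H_SYNC_CLKS  : natural := 3770 ns / CLK_PERIOD;  -- 3.77 us -> 94
  constant H_BACK_CLKS  : natural := 1890 ns / CLK_PERIOD;  -- 1.89 us -> 47

  -- Total horizontal blanking in pixel clocks.
  constant H_BLANK_CLKS : natural :=
    H_FRONT_CLKS + H_SYNC_CLKS + H_BACK_CLKS;               -- 164
end package vga_timing_sketch;
```

The vertical intervals could be handled the same way by dividing by the duration of one complete line instead of one clock period. Time-valued constants of this kind are mainly useful in simulation; a synthesizable design would normally hard-code the resulting integers.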
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA pin description
Name | Type | Polarity | Bus size | Description
Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST | I | High | - | Global system reset.
VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used: 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used: 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal, en, generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows. Black & White mode: 8 x 1-bit pixels. 16 Colors mode: 2 x 4-bit pixels. 64 Colors mode: 1 x 6-bit pixel (DATA[5..0]). Note: in 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R0, R1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G0, G1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B0, B1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
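The memory formula can be written as a small VHDL function, for example to size a picture RAM at elaboration time. The package and function names below (picture_memory_sketch, picture_bytes) are hypothetical, and rounding up a partly filled last byte is an added assumption that the formula itself does not state.

```vhdl
package picture_memory_sketch is
  -- Bytes needed for one picture; pixels_per_byte is 8, 2 or 1
  -- for Black & White, 16 Colors and 64 Colors mode respectively.
  function picture_bytes (pix_total, lin_total, pixels_per_byte : positive)
    return natural;
end package picture_memory_sketch;

package body picture_memory_sketch is
  function picture_bytes (pix_total, lin_total, pixels_per_byte : positive)
    return natural is
  begin
    -- Round up so that a partly filled last byte is still counted.
    return (pix_total * lin_total + pixels_per_byte - 1) / pixels_per_byte;
  end function picture_bytes;
end package body picture_memory_sketch;
```

For a full 640x480 picture this gives picture_bytes(640, 480, 8) = 38400 bytes in Black & White mode, 153600 bytes in 16 Colors mode and 307200 bytes in 64 Colors mode.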
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
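The three addressing rules can be gathered into one function, sketched below. The package and function names are assumptions made for illustration. Each result is zero-padded to the 19-bit width of ADDR_PIXEL, and the unspecified CMOD value 11 is arbitrarily treated like 64 Colors mode.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package pixel_addr_sketch is
  -- Maps the pixel counter hvcnt to the byte address in picture memory.
  function pixel_byte_addr (hvcnt : std_logic_vector(18 downto 0);
                            cmod  : std_logic_vector(1 downto 0))
    return std_logic_vector;
end package pixel_addr_sketch;

package body pixel_addr_sketch is
  function pixel_byte_addr (hvcnt : std_logic_vector(18 downto 0);
                            cmod  : std_logic_vector(1 downto 0))
    return std_logic_vector is
  begin
    case cmod is
      when "00"   => return "000" & hvcnt(18 downto 3);  -- 8 pixels per byte
      when "01"   => return "0" & hvcnt(18 downto 1);    -- 2 pixels per byte
      when others => return hvcnt;                       -- 1 pixel per byte
    end case;
  end function pixel_byte_addr;
end package body pixel_addr_sketch;
```

Dropping the low bits of the pixel count is what makes eight (or two) consecutive pixels share one byte address.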
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
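The load-and-shift behaviour of PIXREG in the three modes can be sketched as a single clocked process. This is not the Controller's source: the entity name pixreg_sketch and its port names are assumptions, the reset is assumed to be synchronous, the RD/enex gating is left out for brevity, and the unspecified CMOD value 11 is treated like 64 Colors mode.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pixreg_sketch is
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;                     -- active high (RST)
    cmod     : in  std_logic_vector(1 downto 0);  -- color palette select
    hvcnt_lo : in  std_logic_vector(2 downto 0);  -- lowest three bits of hvcnt
    data     : in  std_logic_vector(7 downto 0);  -- byte from picture memory
    pixreg   : out std_logic_vector(7 downto 0)
  );
end entity pixreg_sketch;

architecture rtl of pixreg_sketch is
  signal reg : std_logic_vector(7 downto 0) := (others => '0');
begin
  pixreg <= reg;

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        reg <= (others => '0');
      else
        case cmod is
          when "00" =>                          -- Black & White
            if hvcnt_lo = "000" then
              reg <= data;                      -- load the next 8 pixels
            else
              reg <= '0' & reg(7 downto 1);     -- next pixel into bit 0
            end if;
          when "01" =>                          -- 16 Colors
            if hvcnt_lo(0) = '0' then
              reg <= data;                      -- load the next 2 pixels
            else
              reg <= "0000" & reg(7 downto 4);  -- second pixel into low nibble
            end if;
          when others =>                        -- 64 Colors
            reg <= data;                        -- one pixel per byte
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

In every mode the active pixel ends up in the low-order bits of the register, so the logic that follows can always read it from the same position.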
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical: both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
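Besides the CORE Generator system and direct instantiation, a synchronous memory can also be described behaviourally. The sketch below shows a single-port RAM with a clock enable, a write enable and a synchronous read, in the organization 2048 x 8 = 16K data bits. The entity name sync_ram_sketch and its generics are assumptions. Synthesis tools commonly map this coding style onto block RAM, but whether a particular tool does so should be checked in its report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sync_ram_sketch is
  generic (
    ADDR_WIDTH : positive := 11;  -- 2048 locations
    DATA_WIDTH : positive := 8    -- 2048 x 8 = 16K data bits
  );
  port (
    clk  : in  std_logic;
    en   : in  std_logic;         -- clock enable
    we   : in  std_logic;         -- write enable
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity sync_ram_sketch;

architecture rtl of sync_ram_sketch is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read: the old contents appear one clock edge later.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```

Changing the two generics gives other aspect ratios with the same number of data bits, for example ADDR_WIDTH = 14 with DATA_WIDTH = 1.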
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
It was funded by the U.S. Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines e.g. two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes an entity.
E.g.
entity Comparator is
  port (A, B : in std_logic_vector(7 downto 0);
        EQ : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: A value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
In out: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
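A common way of respecting the rule that an out port cannot be read, without falling back on buffer, is to keep the value in an internal signal and copy it to the port. The counter below is a minimal sketch of that pattern; the entity name port_mode_sketch and its ports are made up for illustration. (VHDL-2008 relaxes the rule and allows out ports to be read, but the pattern works in every language revision.)

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity port_mode_sketch is
  port (
    clk   : in  std_logic;                    -- in: read only
    rst   : in  std_logic;
    count : out std_logic_vector(3 downto 0)  -- out: assigned only
  );
end entity port_mode_sketch;

architecture rtl of port_mode_sketch is
  -- The internal copy can be both read and assigned.
  signal count_i : unsigned(3 downto 0) := (others => '0');
begin
  count <= std_logic_vector(count_i);  -- single assignment to the out port

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        count_i <= (others => '0');
      else
        count_i <= count_i + 1;        -- reads count_i, never the port
      end if;
    end if;
  end process;
end architecture rtl;
```

The sync_unit listing in section 11 uses the same idea: its counters live in c_hs_s and line_count_s and are copied to the out ports c_hs and line_count.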
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
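To try the Comparator example in a simulator, the entity and architecture can be combined with a small self-checking testbench, as sketched below. The testbench entity comparator_tb, its stimulus values and the 10 ns settling delay are assumptions added for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

library ieee;
use ieee.std_logic_1164.all;

entity comparator_tb is
end entity comparator_tb;  -- a testbench entity has no ports

architecture sim of comparator_tb is
  signal a, b : std_logic_vector(7 downto 0) := (others => '0');
  signal eq   : std_logic;
begin
  -- Direct instantiation selects the architecture explicitly.
  dut : entity work.Comparator(Comparator1)
    port map (A => a, B => b, EQ => eq);

  stimulus : process
  begin
    a <= x"3C"; b <= x"3C";
    wait for 10 ns;
    assert eq = '1' report "equal inputs should give EQ = '1'" severity error;

    b <= x"3D";
    wait for 10 ns;
    assert eq = '0' report "different inputs should give EQ = '0'" severity error;

    wait;  -- suspend forever: end of stimulus
  end process stimulus;
end architecture sim;
```

Naming the architecture in the instantiation is one way of choosing between several architectures of the same entity without writing a separate configuration declaration.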
Configurations
Link an entity declaration and an architecture body together.
The concept of default configuration is a bit messy in VHDL '87:
- the last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have e.g. a gate level architecture and a behavioral
architecture.
Are always optional.
Packages
Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries
A collection of VHDL design units (a database):
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -------
Dataflow   ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower level description
- or a behavioral description of sequential elements is needed.
The lower level register descriptions must be created or obtained.
If there are no 3rd party models for registers => you must write the behavioral
description of registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
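Since the language has no built-in register, a register is normally written as a clocked process. The following is a minimal sketch of such a behavioral description for a D flip-flop; the entity name dff_sketch and the active-high asynchronous reset are illustrative choices.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff_sketch is
  port (
    clk : in  std_logic;
    rst : in  std_logic;  -- asynchronous, active high (assumed)
    d   : in  std_logic;
    q   : out std_logic
  );
end entity dff_sketch;

architecture behavioral of dff_sketch is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= '0';            -- reset takes priority over the clock
    elsif rising_edge(clk) then
      q <= d;              -- the stored value changes only on a clock edge
    end if;
  end process;
end architecture behavioral;
```

Synthesis tools recognize this template and infer a flip-flop from it, so in practice registers are rarely instantiated from third-party models.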
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
Representation might include e.g. state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
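The two permitted forms of a process can be compared side by side. In the sketch below (entity and label names are illustrative), both processes describe the same AND gate: the first wakes up through its sensitivity list, the second through an explicit wait statement placed at the end.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity process_forms_sketch is
  port (
    a, b : in  std_logic;
    y1   : out std_logic;
    y2   : out std_logic
  );
end entity process_forms_sketch;

architecture rtl of process_forms_sketch is
begin
  -- Form 1: sensitivity list; the process runs whenever a or b changes.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list; the wait statement suspends the
  -- process until a or b changes, then the body runs again from the top.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture rtl;
```

Both processes run once at the start of simulation and then again on every change of a or b, so y1 and y2 always carry the same value.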
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;  -- 50 MHz input clock
    rstn   : in  std_logic;  -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0)
  );
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic
    );
  end component;

  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame)
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;    -- mode matches the sync_unit entity
      c_hs       : out integer     -- mode matches the sync_unit entity
    );
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic
    );
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the 50 MHz input clock by two to obtain the 25 MHz pixel clock.
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- Picture memory; rstn is active low, so check the SINIT polarity
  -- configured for the memory core.
  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame)
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks 0 to 793 within each line
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    elsif (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end process;

  -- vertical line counter: advances on each rising edge of h_en_s, 0 to 515
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      if (line_count_s = 515) then
        line_count_s <= 0;
      else
        line_count_s <= line_count_s + 1;
      end if;
    end if;
  end process;

  -- hsync pulse: low while the horizontal count is 0 to 95
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information will be sent in this period)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35) and (line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical sync pulse: low while the line count is 0 or 1
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- v_en delayed by one clock; pixelgen uses it to detect the start of a frame
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= (others => '0');
    elsif (clk = '1' and clk'event) then
      -- Restart addressing at the start of each frame (rising edge of v_en).
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: read memory and show the 1-bit pixel.
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + 1;
          else
            -- Right of the picture: green.
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: white.
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap around at the last picture memory address.
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: black.
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
121 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete system on a programmable chip This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group which combined a reconfigurable CPU architecture on a single chip
called the SB24 That work was done in 1982 Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices
which include one or more PowerPC processors embedded within the FPGAs
54
logic fabric The Atmel FPSLIC is another such device which uses an AVR
processor in combination with Atmels programmable logic architecture
An alternate approach to using hard-macro processors is to make use of soft
processor cores that are implemented within the FPGA logic (see Soft
processors below).
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates (Xilinx)
1992: 600,000 gates (Naval Surface Warfare Department)
Early 2000s: millions of gates
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of
4 to 8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived
from Rent's rule or by experiments with existing designs.
The FPGA considered here is an array-style, or island-style, FPGA. It
consists of an array of logic blocks and routing channels. Two I/O pads fit
into the height of one row or the width of one column, as shown below. All the
routing channels have the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they are managed separately from other signals.
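The logic block just described can be modelled in a few lines of Python. This is only a behavioural sketch, not vendor code: the class name LogicBlock, the 16-bit truth_table encoding, and the registered flag are assumptions introduced for illustration.

```python
class LogicBlock:
    """Behavioural model of a 4-input LUT followed by an optional flip-flop."""

    def __init__(self, truth_table, registered=False):
        # Assumed encoding: bit i of truth_table is the LUT output for
        # input pattern i, where i = d*8 + c*4 + b*2 + a.
        self.truth_table = truth_table & 0xFFFF
        self.registered = registered
        self.ff = 0  # flip-flop state, cleared at start-up

    def lut(self, a, b, c, d):
        """Combinational lookup of the four inputs."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.truth_table >> index) & 1

    def clock_edge(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.ff = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """Single block output: registered or unregistered LUT value."""
        return self.ff if self.registered else self.lut(a, b, c, d)


# Example: a 4-input AND gate, where only pattern 1111 (index 15) gives 1.
and4 = LogicBlock(0x8000)
and4_reg = LogicBlock(0x8000, registered=True)
```

In the registered variant the output changes only after clock_edge is called, which mirrors the choice between the two output paths in the block.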
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right of and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
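The planar rule can be stated as a small function. The sketch below is an illustration under assumptions: the side names and the function name planar_connections are hypothetical, and tracks are numbered from 1 to W as in the text.

```python
SIDES = ("left", "top", "right", "bottom")


def planar_connections(side, track, channel_width):
    """Wires reachable from a wire entering a planar switch box.

    A wire on a given track can connect only to the same track number
    on each of the other three sides, giving three programmable switches.
    """
    if side not in SIDES:
        raise ValueError("unknown side: %r" % (side,))
    if not 1 <= track <= channel_width:
        raise ValueError("track outside channel width")
    return [(other, track) for other in SIDES if other != side]


# A wire on track 2 entering from the left of a 4-track switch box.
reachable = planar_connections("left", 2, 4)
```

Because the track number never changes, a signal that starts on track 2 stays on track 2 along its whole route, which is why this topology is also called domain-based.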
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at
which point propagation delays can be added and the simulation run again with
these values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA has
become a good method for doing this.
Reasons why prototyping is important
1 Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is in contrast to designers relying only on
software simulations to verify that their hardware design is sound, where
limitations in simulation speed and modeling accuracy hinder development.
2 Due to time constraints, many projects cannot wait until the silicon is
back from the foundry to start on software tests. FPGA prototyping allows for
much more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear
due to today's array of operating systems, applications, and hardware, to be
caught early.
3 Prototyping also allows the developer to ensure that all IP technologies in
the system work well together, away from the simulation stage and in actual
form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
[Figure: gate-level schematic, VGA controller]
[Figure: gate-level schematic, sync unit]
[Figure: technology schematic]
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to correctly display the picture on a connected monitor.
7 VGA Controller
The VGA Controller provides a simple 8-bit interface between a host
microcontroller and any VGA-compatible monitor. This variant of the
Controller provides six modes of display, depending on the resolution chosen
(640x480 (VGA) or 800x600 (SVGA)) and the color palette (either Black &
White, 16 Colors, or 64 Colors).
7.1 VGA Controller Tasks
1) To display the frames at a minimum refresh rate of 60 Hz.
1 sec = 60 frames
1 frame = max 1/60 sec
1 pixel = (1/60 sec) / 307200 = approximately 54 ns
Allowing for the blanking intervals between lines and frames, the maximum
time for 1 pixel is taken as 40 ns.
2) To send information for each and every pixel of a frame by means of
RGB signals.
3) After one row of information has been completely displayed, the column
must be reset and the row incremented; this is done by the Hsync signal.
4) After one frame of information has been completed, a new frame has to be
started; for this, the row and column are reset, which is done by the Vsync
signal.
=> To give the information for 1 pixel we use 1 clock (for controlling all
operations).
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for 1 pixel on each
and every clock, we can attain a minimum refresh rate of 60 Hz.
Do not use a clock frequency lower than 25 MHz.
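The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the frame total of 800 x 525 clocks (visible pixels plus blanking) is the commonly quoted figure for 640x480 at 60 Hz; it is not given in this document, so treat it as an assumption.

```python
VISIBLE_PIXELS = 640 * 480        # 307200 visible pixels per frame
CLOCKS_PER_FRAME = 800 * 525      # assumed total, including blanking


def max_pixel_time_ns(pixels_per_frame, refresh_hz):
    """Longest time each pixel may take to still reach refresh_hz."""
    return 1e9 / (refresh_hz * pixels_per_frame)


def clock_mhz(period_ns):
    """Clock frequency in MHz for a given clock period in ns."""
    return 1000.0 / period_ns


def frames_per_second(clock_hz, clocks_per_frame):
    """Refresh rate when one pixel slot is clocked out per cycle."""
    return clock_hz / clocks_per_frame


visible_only_ns = max_pixel_time_ns(VISIBLE_PIXELS, 60)   # about 54 ns
pixel_clock = clock_mhz(40)                               # 25 MHz
rate_25 = frames_per_second(25_000_000, CLOCKS_PER_FRAME)
rate_15 = frames_per_second(15_000_000, CLOCKS_PER_FRAME)
```

Under the assumed frame total, a 25 MHz clock gives a refresh rate just under 60 Hz, while a 15 MHz clock falls well short of it.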
Ex:
15 MHz: time period is longer, number of frames/sec is lower (it is unable
to display 60 frames/sec)
25 MHz: effectively utilized frequency
30 MHz: time period is shorter, number of frames/sec is higher
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal synchronization signal)
VSYNC (Vertical synchronization signal)
All the above signals are used to control the image data so that it is
correctly displayed on the VGA monitor.
7.2 Timings for front porch, sync pulse, and back porch
              Horizontal sync   Vertical sync
Front porch   0.94 µs           0.35 ms
Sync pulse    3.77 µs           0.063 ms
Back porch    1.89 µs           1.02 ms
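As a rough illustration, the horizontal timings can be converted into counts of the 40 ns pixel clock. The sketch reads the horizontal values as 0.94 µs, 3.77 µs, and 1.89 µs, assumes 640 visible pixels per line, and rounds to the nearest clock; the names used are hypothetical.

```python
CLOCK_PERIOD_NS = 40  # 25 MHz pixel clock

# Horizontal timings from the table, expressed in nanoseconds.
HORIZONTAL_NS = {"front_porch": 940, "sync_pulse": 3770, "back_porch": 1890}


def to_clocks(duration_ns, period_ns=CLOCK_PERIOD_NS):
    """Round a duration to the nearest whole number of clock cycles."""
    return (duration_ns + period_ns // 2) // period_ns


horizontal_clocks = {name: to_clocks(ns) for name, ns in HORIZONTAL_NS.items()}

# Total clocks per line: visible pixels plus the three blanking phases.
clocks_per_line = 640 + sum(horizontal_clocks.values())
line_time_us = clocks_per_line * CLOCK_PERIOD_NS / 1000
```

A horizontal counter in the sync logic would compare against cumulative sums of these counts to decide when to assert and release HSYNC.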
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA pin description
Name | Type | Polarity | Bus size | Description
Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST | I | High | - | Global system reset.
VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used: 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used: 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal, en, generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows. Black & White mode: 8 1-bit pixels. 16 Colors mode: 2 4-bit pixels. 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: in 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R0, R1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G0, G1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B0, B1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the
pixel is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g., using a block of RAM within the FPGA design or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode - one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode - one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode - one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
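A worked instance of this formula, as a minimal Python sketch. The mode keys "bw", "c16", and "c64" and the function name are hypothetical labels for the three palettes; the formula is read as a division by the number of pixels per byte.

```python
# Pixels packed into each byte for the three color palettes.
PIXELS_PER_BYTE = {"bw": 8, "c16": 2, "c64": 1}


def picture_memory_bytes(pix_total, lin_total, mode):
    """Bytes of picture memory needed for one frame.

    Uses ceiling division so a partly filled last byte still counts.
    """
    pixels = pix_total * lin_total
    per_byte = PIXELS_PER_BYTE[mode]
    return -(-pixels // per_byte)


vga_bw = picture_memory_bytes(640, 480, "bw")      # 38400 bytes
vga_c16 = picture_memory_bytes(640, 480, "c16")    # 153600 bytes
svga_c64 = picture_memory_bytes(800, 600, "c64")   # 480000 bytes
```

The largest case, 800x600 in 64 Colors mode, needs 480000 bytes, which still fits within the 2^19 = 524288 locations addressable by the 19-bit ADDR_PIXEL output.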
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, which is equivalent to right
shifting the contents by three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, which is equivalent to right
shifting the contents by one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
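The three addressing rules reduce to a right shift of the pixel counter. The following Python sketch models that numerically; the mode keys and the function name addr_pixel are hypothetical, and hvcnt is treated as a 19-bit unsigned count.

```python
# Right shift applied to hvcnt in each palette mode:
# 8 pixels per byte -> drop 3 bits, 2 per byte -> drop 1 bit, 1 per byte -> none.
ADDRESS_SHIFT = {"bw": 3, "c16": 1, "c64": 0}


def addr_pixel(hvcnt, mode):
    """Byte address in picture memory holding pixel number hvcnt."""
    hvcnt &= 0x7FFFF  # keep the 19 bits hvcnt[18..0]
    return hvcnt >> ADDRESS_SHIFT[mode]


# Pixel number 23 lives in byte 2 (B&W), byte 11 (16 Colors), byte 23 (64 Colors).
examples = [addr_pixel(23, m) for m in ("bw", "c16", "c64")]
```

In Black & White mode, pixel counts 8 through 15 all map to address 1, so the same byte is held while its eight pixels are shifted out.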
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex.
This internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
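The enex condition translates directly into a small function. This Python sketch is only a behavioural model; the example values 640/800 and 480/525 for the line and frame parameters are assumptions chosen for illustration, not values taken from this document.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Line display enable: 0 during blanking, 1 while pixels are shown."""
    h_blank = (hcount > pix_total + 1) and (hcount < line_period)
    v_blank = (vcount > lin_total + 1) and (vcount < frame_period)
    return 0 if (h_blank or v_blank) else 1


def rd(hcount, vcount, **timing):
    """Memory read enable simply follows enex."""
    return enex(hcount, vcount, **timing)


# Assumed example timing: 640 visible of 800 clocks, 480 visible of 525 lines.
TIMING = dict(pix_total=640, lin_total=480, line_period=800, frame_period=525)
visible = rd(100, 100, **TIMING)        # inside the picture area
h_blanked = rd(700, 100, **TIMING)      # past the end of the line
v_blanked = rd(100, 500, **TIMING)      # below the last visible line
```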
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization
Unit, are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one
bit on each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is
0.
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge
of CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel
data does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
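The load-and-shift behaviour of PIXREG can be modelled as follows. This is a Python sketch of the behaviour described above, not the Controller's VHDL; the mode keys and the function name pixels_in_byte are hypothetical.

```python
# (bits per pixel, pixels per byte) for each palette mode.
PIXEL_FORMAT = {"bw": (1, 8), "c16": (4, 2), "c64": (6, 1)}


def pixels_in_byte(data, mode):
    """Return the pixels of one data byte in the order they are displayed."""
    width, count = PIXEL_FORMAT[mode]
    reg = data & 0xFF            # byte loaded into PIXREG
    mask = (1 << width) - 1
    pixels = []
    for _ in range(count):
        pixels.append(reg & mask)  # active pixel sits in the low-order bits
        reg >>= width              # shift right to expose the next pixel
    return pixels


bw_pixels = pixels_in_byte(0b10110001, "bw")
c16_pixels = pixels_in_byte(0xA5, "c16")
c64_pixels = pixels_in_byte(0xFF, "c64")
```

Because the active pixel is always taken from the low-order bits, the least significant pixel of each byte is displayed first, and in 64 Colors mode the two unused high bits are masked off.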
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
The monitor itself expects analog signals as inputs to its electron gun
control circuits. This is achieved by using 2-bit digital-to-analog converter
circuitry located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V
(maximum brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
combinations.
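The color count and the analog levels can be illustrated with a short Python sketch. Two assumptions are made that this document does not state: the four DAC levels are evenly spaced between 0 V and 0.7 V, and the 6-bit code is laid out as red in bits 5..4, green in bits 3..2, and blue in bits 1..0. The function names are hypothetical.

```python
def split_rgb(code):
    """Split a 6-bit RGB code into 2-bit (red, green, blue) levels.

    Assumed bit layout: red = bits 5..4, green = bits 3..2, blue = bits 1..0.
    """
    code &= 0x3F
    return (code >> 4) & 0x3, (code >> 2) & 0x3, code & 0x3


def dac_millivolts(level, full_scale_mv=700):
    """Analog output of an ideal 2-bit DAC with evenly spaced levels."""
    if not 0 <= level <= 3:
        raise ValueError("level must be a 2-bit value")
    return round(full_scale_mv * level / 3)


levels_mv = [dac_millivolts(level) for level in range(4)]
distinct_colors = len({split_rgb(code) for code in range(64)})
```

Four levels on each of three channels give 4 x 4 x 4 = 64 distinct colors, matching the 64 Colors palette.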
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and
shift registers, each supporting various data widths and depths. This
application note describes the features and capabilities of block SelectRAM
and illustrates how to specify the various options using the Xilinx CORE
Generator™ system or via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits.
Physically, the block RAM memory has two completely independent access ports,
labeled Port A and Port B. The structure is fully symmetrical, and both ports
are interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
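To relate these figures to the picture memory of the VGA Controller, the following Python sketch estimates how many block RAMs a frame buffer would occupy. It counts only the 16K data bits per block, takes the block totals from Table 2, and ignores any cascading overhead; the function names are hypothetical.

```python
DATA_BITS_PER_BLOCK = 16 * 1024  # 16K data bits, parity bits not counted

# Total RAM blocks per device, as listed in Table 2.
TOTAL_BLOCKS = {
    "XC3S100E": 4,
    "XC3S250E": 12,
    "XC3S500E": 20,
    "XC3S1200E": 28,
    "XC3S1600E": 36,
}


def blocks_needed(bits):
    """Number of block RAMs needed to hold the given number of data bits."""
    return -(-bits // DATA_BITS_PER_BLOCK)  # ceiling division


def smallest_device(bits):
    """Smallest listed device with enough block RAM, or None if none fits."""
    need = blocks_needed(bits)
    for name, blocks in sorted(TOTAL_BLOCKS.items(), key=lambda kv: kv[1]):
        if blocks >= need:
            return name
    return None


bw_frame_bits = 640 * 480 * 1    # Black & White: 1 bit per pixel
c16_frame_bits = 640 * 480 * 4   # 16 Colors: 4 bits per pixel
```

On this estimate a 640x480 Black & White frame needs 19 blocks, while a 16 Colors frame at the same resolution needs 75 blocks and would have to live in external memory.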
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.)
was presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming nonstandard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they
relate to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes an entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction, and a type.
An entity may have NO port declaration.
Port directions In A value of a port can be read inside the component but cannot be assigned
Multiple reads of port are allowed
Out Assignments can be made to a port but data from a port cannot be read
Multiple assignments are allowed
In out Bi-directional assignments can be made and data can be read Multiple
assignments are allowed
Buffer An out port with read capability May have at most one assignment
(are not recommended)
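A common way to avoid buffer ports is to keep the value in an internal signal and copy it to an out port. The sketch below assumes a hypothetical entity named toggle; it is not taken from the controller code.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical toggle flip-flop, for illustration only.
entity toggle is
  port (
    clk : in  std_logic;
    q   : out std_logic
  );
end toggle;

architecture rtl of toggle is
  -- Internal copy of the output; unlike the out port q it may be read
  -- (reading an out port is not allowed before VHDL-2008).
  signal q_int : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q_int <= not q_int;
    end if;
  end process;

  q <= q_int;  -- drive the port from the internal signal
end rtl;
```

The controller listings in Section 11 use the same pattern with signals such as h_en_s and c_hs_s.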
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
- Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture
Configurations are always optional
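As a sketch of the idea, the Comparator entity can be given a second architecture, and a configuration declaration then picks one of them. The names Comparator2 and Comparator_cfg are hypothetical and introduced only for this example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- Dataflow-style architecture (as in the example above).
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Hypothetical second, behavioral-style architecture of the same entity.
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end Comparator2;

-- The configuration selects Comparator2 for the entity Comparator.
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end Comparator_cfg;
```

Simulating Comparator_cfg instead of the bare entity switches the architecture without touching either architecture body.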
Packages
Packages contain information common to many design units
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units
Consists of a declaration part and an optional body part
Package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
Package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
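A minimal sketch of a package with a declaration part and a body is shown below. The package name vga_pkg, the constant, the subtype and the parity function are all hypothetical; the controller in this report does not use a package.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Package declaration: what other design units can see.
package vga_pkg is
  constant ADDR_WIDTH : natural := 18;
  subtype addr_t is std_logic_vector(ADDR_WIDTH - 1 downto 0);
  function parity (v : std_logic_vector) return std_logic;
end vga_pkg;

-- Package body: needed here only because the function needs a body.
package body vga_pkg is
  function parity (v : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    -- XOR all bits of the vector together.
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end parity;
end vga_pkg;
```

A design unit compiled into the same library would make these names visible with the clause use work.vga_pkg.all;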
Libraries
Collection of VHDL design units (database)
1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the HW.
It is possible to describe a circuit in a number of ways, listed here from
the lowest to the highest level of abstraction:
- Structural
- Dataflow
- Behavioral
Structural VHDL description
Circuit is described in terms of its components.
The description can range from low level (e.g. transistor-level description)
to high level (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language
- Either a lower level description
- or a behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained.
If there are no 3rd party models for registers => you must write the behavioral
description of registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
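One possible behavioral description of a register is sketched below. It uses a clocked process rather than a subprogram, and the entity name dreg, its generic and its port names are assumptions made for this example.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical D register with an active-low asynchronous reset.
entity dreg is
  generic (WIDTH : positive := 8);
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;
    d    : in  std_logic_vector(WIDTH - 1 downto 0);
    q    : out std_logic_vector(WIDTH - 1 downto 0)
  );
end dreg;

architecture behv of dreg is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');   -- reset clears the register
    elsif rising_edge(clk) then
      q <= d;                 -- capture the input on each rising edge
    end if;
  end process;
end behv;
```

Synthesis tools generally recognize this process template and map it to flip-flops.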
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time.
Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a Unix process
Process keyword in VHDL
The process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
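The two forms of the general format can be compared side by side. The sketch below assumes a hypothetical entity and2 with two outputs, one driven by each process; both processes describe the same AND gate.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity, for illustration only.
entity and2 is
  port (
    a, b   : in  std_logic;
    y1, y2 : out std_logic
  );
end and2;

architecture behv of and2 is
begin
  -- Form 1: sensitivity list, no wait statement.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list, explicit wait statement at the end.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;  -- suspend until a or b changes
  end process with_wait;
end behv;
```

Both processes run once at the start of simulation and are then woken up by an event on a or b.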
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- STD_LOGIC_ARITH and STD_LOGIC_UNSIGNED are not used together with
-- numeric_std here, because they declare conflicting signed/unsigned types.
-- numeric_std is only needed for the commented-out to_unsigned conversion.
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0)
  );
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic
    );
  end component;

  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;    -- mode matches the sync_unit entity
      c_hs       : out integer
    );
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic
    );
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide clk50m by two to generate the pixel clock clk.
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- Note: sinit is driven directly by the active-low rstn; check the
  -- polarity expected by the memory core.
  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s       : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks within one line (0 to 793)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vertical line counter: advances on each rising edge of h_en_s (0 to 515)
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- horizontal enable (row information is sent in this period)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- vertical enable (data is valid in this region)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vertical sync (front porch, back porch)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- v_en delayed by one clock, used to detect the start of the visible frame
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- rising edge of v_en: restart the read address at the top of the frame
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- picture area: read the next pixel from memory
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- right of the picture area: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- below the picture area: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- wrap the address at the end of the picture memory
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- blanking: black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See Soft
processors below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has bolstered development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the state where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
Fig 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
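The logic block described above can be modelled in a few lines of VHDL. This is only a behavioral sketch under stated assumptions: the entity name logic_block, the generics INIT and USE_REG, and the port names are hypothetical, and a real FPGA holds the truth table and the output selection in configuration memory rather than in generics.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

-- Hypothetical model of a 4-input LUT followed by an optional flip-flop.
entity logic_block is
  generic (
    -- 16-entry truth table; x"8000" sets only entry 15, i.e. a 4-input AND.
    INIT    : std_logic_vector(15 downto 0) := x"8000";
    -- true selects the registered output, false the unregistered one.
    USE_REG : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal reg_out : std_logic := '0';
begin
  -- The LUT is a 16-to-1 lookup: the four inputs form the table index.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- The flip-flop registers the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      reg_out <= lut_out;
    end if;
  end process;

  -- Single output: either the registered or the unregistered LUT output.
  o <= reg_out when USE_REG else lut_out;
end rtl;
```

Changing INIT changes the implemented logic function without changing the structure, which is what makes the block programmable.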
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license) and
other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
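To make the first simulation stage concrete, here is a minimal sketch of an RTL test bench for the Comparator example from Section 10. The test bench name Comparator_tb, the stimulus values and the 10 ns settling time are assumptions chosen for illustration; the Comparator is repeated so that the example is self-contained.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Design under test (the Comparator example from Section 10).
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

library IEEE;
use IEEE.std_logic_1164.all;

-- A test bench has no ports: it generates stimuli and checks results.
entity Comparator_tb is
end Comparator_tb;

architecture sim of Comparator_tb is
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  dut : entity work.Comparator(Comparator1)
    port map (A => A, B => B, EQ => EQ);

  stimulus : process
  begin
    A <= x"3C"; B <= x"3C";
    wait for 10 ns;
    assert EQ = '1' report "equal inputs should give EQ = '1'" severity error;

    A <= x"3C"; B <= x"3D";
    wait for 10 ns;
    assert EQ = '0' report "different inputs should give EQ = '0'" severity error;

    wait;  -- suspend forever: end of stimulus
  end process stimulus;
end sim;
```

The same test bench can in principle be reused for the gate-level and back-annotated simulations, since it only talks to the ports of the design.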
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream. Prototyping SoC and ASIC
designs on an FPGA has become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This helps expose many unforeseen software bugs that appear due
to today's array of operating systems, applications and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of providing demo platforms for SoC clients,
bringing in interest early. This speeds up the overall development cycle and
allows for more enhancement or improvement of the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller"
3. Hinner, "VGA Interface and video signal documents"
6 SIGNALS IN VGA CONTROLLER
Pixel
=> One clock cycle is used to give the information for one pixel (the clock
controls all operations).
1 clock = 1 pixel
1 clock cycle period = 40 ns
Clock frequency = 1 / time period = 1 / 40 ns = 25 MHz
So if we use a 25 MHz clock and display the information for one pixel on each
and every clock, we can attain a refresh rate of at least 60 Hz.
Do not use a clock frequency lower than 25 MHz.
Ex:
Clock frequency     15 MHz         25 MHz                            30 MHz
Time period         More           Effectively utilized frequency    Less
No. of frames/sec   Less           -                                 More
                    (decreases)                                      (increases)
(At 15 MHz it is unable to display 60 frames/sec)
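As a rough check of these figures, the refresh rate follows from the pixel clock and the total number of clocks per frame. The totals used here are an assumption taken from the counters in the Section 11 listing (c_hs_s counts 0 to 793, line_count_s counts 0 to 515); nominal monitor timings may differ slightly.

```latex
\begin{align*}
f_{\text{clk}} &= \frac{1}{T_{\text{clk}}} = \frac{1}{40\,\text{ns}} = 25\,\text{MHz} \\
N_{\text{frame}} &= 794 \times 516 = 409\,704 \ \text{clocks per frame} \\
f_{\text{refresh}} &= \frac{f_{\text{clk}}}{N_{\text{frame}}}
  = \frac{25 \times 10^{6}}{409\,704} \approx 61.0\,\text{Hz} \\
f_{\text{refresh}}(15\,\text{MHz}) &= \frac{15 \times 10^{6}}{409\,704} \approx 36.6\,\text{Hz}
\end{align*}
```

With these counter limits a 25 MHz clock gives roughly 61 frames per second, while a 15 MHz clock falls well short of 60.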
Signals used for image display:
R (Red)
G (Green)
B (Blue)
HSYNC (Horizontal Synchronization signal)
VSYNC (Vertical Synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse and back porch
                 Horizontal sync    Vertical sync
Front porch      0.94 µs            0.35 ms
Sync pulse       3.77 µs            0.063 ms
Back porch       1.89 µs            1.02 ms
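These durations can be converted into pixel-clock counts by dividing by the clock period. The sketch below assumes the 40 ns period of a 25 MHz pixel clock; the symbols are introduced only for this calculation.

```latex
\begin{align*}
N_{\text{front}} &= \frac{0.94\,\mu\text{s}}{40\,\text{ns}} = 23.5 \ \text{clocks} \\
N_{\text{sync}}  &= \frac{3.77\,\mu\text{s}}{40\,\text{ns}} = 94.25 \ \text{clocks} \\
N_{\text{back}}  &= \frac{1.89\,\mu\text{s}}{40\,\text{ns}} = 47.25 \ \text{clocks}
\end{align*}
```

Since a counter can only hold whole clocks, an implementation has to round these values. The synchronization block listing, for example, holds hsync low while c_hs_s is between 0 and 95, which is 96 clocks or 3.84 µs.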
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
Figure: VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA Pin description
Name | Type | Polarity | Bus size | Description

Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows:
    640x480: CLK = 25 MHz gives Refresh = 60 Hz; CLK = 30 MHz gives Refresh = 72 Hz
    800x600: CLK = 40 MHz gives Refresh = 60 Hz; CLK = 50 MHz gives Refresh = 72 Hz
RST | I | High | - | Global system reset.

VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used: 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used: 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.

Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal en, generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows: Black & White mode: 8 1-bit pixels; 16 Colors mode: 2 4-bit pixels; 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.

VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1, R0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1, G0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1, B0 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode – one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode – one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode – one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
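As a worked example of the formula above, the following sketch evaluates the picture memory size for each color palette. The mode names and the function name are hypothetical labels chosen for illustration; the pixels-per-byte figures are the ones given in the text.

```python
# Pixels packed into one byte for each color palette (from the text).
PIXELS_PER_BYTE = {"bw": 8, "16_colors": 2, "64_colors": 1}


def picture_memory_bytes(pix_total, lin_total, mode):
    """Bytes needed for pix_total x lin_total pixels in the given mode."""
    pixels = pix_total * lin_total
    per_byte = PIXELS_PER_BYTE[mode]
    # Ceiling division, so a partially filled last byte is still counted.
    return -(-pixels // per_byte)


if __name__ == "__main__":
    for mode in PIXELS_PER_BYTE:
        print(mode, picture_memory_bytes(640, 480, mode), "bytes")
```

For a full 640x480 picture this gives 38400 bytes in Black & White mode, 153600 bytes in 16 Colors mode and 307200 bytes in 64 Colors mode.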
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit – hvcnt – which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address – the byte of data containing the next 8 pixels
– is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address – the byte of data containing the next 2 pixels
– is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address – the byte of data containing the next pixel – is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
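The three addressing rules amount to dropping the low-order bits of hvcnt that select a pixel within a byte. A minimal behavioral sketch, assuming hvcnt is a 19-bit pixel counter (bits 18..0) and using hypothetical mode names:

```python
# Number of low-order hvcnt bits that index a pixel inside one byte.
SHIFT_BY_MODE = {"bw": 3, "16_colors": 1, "64_colors": 0}

HVCNT_MASK = (1 << 19) - 1  # assumption: hvcnt is 19 bits wide (18..0)


def addr_pixel(hvcnt, mode):
    """Byte address in picture memory for pixel counter value hvcnt."""
    return (hvcnt & HVCNT_MASK) >> SHIFT_BY_MODE[mode]


if __name__ == "__main__":
    for count in (0, 7, 8, 9):
        print(count, {m: addr_pixel(count, m) for m in SHIFT_BY_MODE})
```

In Black & White mode, for example, pixel counts 0 to 7 all map to byte address 0 and count 8 maps to address 1.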
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
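The enex condition can be restated as a small function for checking individual counter values. This is a behavioral sketch of the rule exactly as written above; the Line Period and Frame Period values used in the example call (800 and 525) are assumptions and do not come from this report.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Return 1 while pixel data may be read, 0 during blanking."""
    h_blank = (pix_total + 1) < hcount < line_period
    v_blank = (lin_total + 1) < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1


if __name__ == "__main__":
    # Assumed totals for illustration: 800 clocks per line, 525 lines.
    args = dict(pix_total=640, lin_total=480,
                line_period=800, frame_period=525)
    print(enex(100, 100, **args))  # inside the visible area
    print(enex(700, 100, **args))  # horizontal blanking
    print(enex(100, 500, **args))  # vertical blanking
```

Since RD follows this signal, memory reads stop whenever either counter is inside its blanking range.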
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen – Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal – received from the Synchronization Unit –
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal – received from the Synchronization Unit – is a
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
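Because the active pixel always sits in the low-order bits and the register shifts right, the pixels of a byte are displayed least significant first. The sketch below lists the pixels of one byte in display order for each mode; it models only the shift behavior described above, and the mode names are hypothetical.

```python
def pixels_from_byte(data, mode):
    """Pixel values of one PIXREG byte, in the order they are displayed."""
    data &= 0xFF
    if mode == "bw":
        # Eight 1-bit pixels: shift right by one bit per clock.
        return [(data >> i) & 0x1 for i in range(8)]
    if mode == "16_colors":
        # Two 4-bit pixels: low nibble first, then shift right by four.
        return [data & 0xF, (data >> 4) & 0xF]
    if mode == "64_colors":
        # One 6-bit pixel; bits 7 and 6 are not used.
        return [data & 0x3F]
    raise ValueError("unknown color mode: " + mode)


if __name__ == "__main__":
    print(pixels_from_byte(0b10110001, "bw"))
    print(pixels_from_byte(0xA5, "16_colors"))
    print(pixels_from_byte(0xFF, "64_colors"))
```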
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
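The count of 64 colors follows from three channels with four levels each. The sketch below makes this concrete under two assumptions that the text does not state: the four DAC levels are evenly spaced between 0 V and 0.7 V, and the six RGBREG bits are ordered R1 R0 G1 G0 B1 B0. Both function names are hypothetical.

```python
V_MAX = 0.7  # maximum analog level in volts (from the text)


def dac_level_volts(code):
    """Analog level for a 2-bit code, assuming evenly spaced levels."""
    if not 0 <= code <= 3:
        raise ValueError("2-bit code expected")
    return V_MAX * code / 3


def split_rgb(rgb6):
    """Split a 6-bit code into (R, G, B), assuming order R1 R0 G1 G0 B1 B0."""
    return (rgb6 >> 4) & 0x3, (rgb6 >> 2) & 0x3, rgb6 & 0x3


if __name__ == "__main__":
    for code in range(4):
        print(code, round(dac_level_volts(code), 3), "V")
    print(len({split_rgb(v) for v in range(64)}), "distinct colors")
```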
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73728            72
XC3S250E    2             6                       12                 221184           216
XC3S500E    2             10                      20                 368640           360
XC3S1200E   2             14                      28                 516096           504
XC3S1600E   2             18                      36                 663552           648
Each block RAM contains 18432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
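To relate these figures to a frame buffer, the following sketch estimates how many block RAMs a picture needs and which devices from the table above have enough of them. It counts only the 16K data bits of each block, ignores any block RAM needed by the rest of a design, and uses hypothetical function names.

```python
DATA_BITS_PER_BLOCK = 16 * 1024  # data bits per block RAM, parity excluded

# Total block RAMs per device, from the Spartan-3E table above.
TOTAL_BLOCKS = {
    "XC3S100E": 4,
    "XC3S250E": 12,
    "XC3S500E": 20,
    "XC3S1200E": 28,
    "XC3S1600E": 36,
}


def blocks_needed(width, height, bits_per_pixel):
    """Block RAMs required to hold one picture (ceiling division)."""
    total_bits = width * height * bits_per_pixel
    return -(-total_bits // DATA_BITS_PER_BLOCK)


def devices_that_fit(width, height, bits_per_pixel):
    """Devices whose block RAM count covers the picture."""
    need = blocks_needed(width, height, bits_per_pixel)
    return [name for name, blocks in TOTAL_BLOCKS.items() if blocks >= need]


if __name__ == "__main__":
    print(blocks_needed(640, 480, 1), devices_that_fit(640, 480, 1))
    print(blocks_needed(640, 480, 6), devices_that_fit(640, 480, 6))
```

A 1-bit 640x480 picture needs 19 blocks, so it fits only in the XC3S500E and larger parts, while a 6-bit picture of the same size exceeds the block RAM of every device in the table.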
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes an entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: A value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
– Behavior
– Structure
– Dataflow
Architectures can describe a design on many levels:
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
– The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
Packages
Packages contain information common to many design units.
1 Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2 Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
A package body consists of:
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
Libraries
A collection of VHDL design units (database):
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -------
Dataflow   ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
The circuit is described in terms of its components.
This ranges from a low-level description (e.g. a transistor-level description)
to a high-level description (e.g. a block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
– either a lower-level description
– or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral
description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;  -- only needed by the commented-out conversion below
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;
    rstn   : in  std_logic;  -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0)
  );
end vga_controller;

architecture Behavioral of vga_controller is

  -- Picture memory (generated core), one bit per pixel
  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic
    );
  END component;

  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;    -- mode matches the sync_unit entity
      c_hs       : out integer
    );
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic
    );
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the input clock by two to obtain the pixel clock
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- Note: rstn is active low; check that the sinit polarity configured for
  -- the memory core matches before relying on this connection.
  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s       : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks 0..793 within one line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    elsif (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end process;

  -- vsync counter
  -- vertical line counter: advances on each rising edge of h_en_s, 0..515
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;  -- last assignment wins: wrap to line 0
      end if;
    end if;
  end process;

  -- hsync pulse: low for the first 96 clocks of each line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information will be sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse: low for the first two lines of each frame
  -- (vertical sync, front porch, back porch)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- v_en delayed by one clock, used to detect the start of the active region
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
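The counter limits in the listing above determine the timing that the monitor sees. The sketch below derives the main figures from those constants; it assumes the pixel clock is 25 MHz (that is, that clk50m is a 50 MHz input divided by two in the top-level module), and the Python names are hypothetical.

```python
CLK_HZ = 25_000_000          # assumed pixel clock: 50 MHz input divided by two

H_TOTAL = 793 + 1            # c_hs_s counts 0..793
V_TOTAL = 515 + 1            # line_count_s counts 0..515
HSYNC_CLOCKS = 95 + 1        # hsync is low while c_hs_s <= 95
H_EN_CLOCKS = 778 - 143 + 1  # h_en_s is high for c_hs_s in 143..778
VSYNC_LINES = 2              # vsync is low while line_count_s < 2
V_EN_LINES = 515 - 35 + 1    # v_en_s is high for line_count_s in 35..515


def clocks_to_us(clocks):
    """Duration of a number of pixel clocks in microseconds."""
    return clocks * 1e6 / CLK_HZ


def line_rate_khz():
    return CLK_HZ / H_TOTAL / 1e3


def frame_rate_hz():
    return CLK_HZ / (H_TOTAL * V_TOTAL)


if __name__ == "__main__":
    print(f"line period  {clocks_to_us(H_TOTAL):.2f} us")
    print(f"hsync pulse  {clocks_to_us(HSYNC_CLOCKS):.2f} us")
    print(f"line rate    {line_rate_khz():.2f} kHz")
    print(f"frame rate   {frame_rate_hz():.2f} Hz")
```

Under that clock assumption the listing produces a line period of about 31.76 µs, a 3.84 µs horizontal sync pulse and a frame rate of about 61 Hz.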
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= (others => '0');
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart at the first pixel of the picture
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Picture area: read memory and show the 1-bit pixel on all guns
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + 1;
          else
            -- Right of the picture: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap after the last picture address (this assignment wins)
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
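The wrap constant in the pixel generator indicates how large the stored picture is. The sketch below decodes it and compares it with the read window implied by the comparisons in the listing. The window bounds are an interpretation rather than something the report states: the first read position of 144 assumes that the registered h_en lags the c_hs = 143 comparison by one clock.

```python
WRAP_ADDRESS = int("111011111111111111", 2)  # constant compared with addr_s

# Assumed read window, derived from the comparisons in the listing.
FIRST_READ_C_HS = 144   # assumption: h_en lags c_hs = 143 by one clock
LAST_READ_C_HS = 655    # reads occur while c_hs <= 655
FIRST_LINE = 35         # v_en is high from line 35
LAST_LINE = 514         # reads occur while line_count <= 514

locations = WRAP_ADDRESS + 1
pixels_per_line = LAST_READ_C_HS - FIRST_READ_C_HS + 1
lines = LAST_LINE - FIRST_LINE + 1
blocks_of_16k = locations // (16 * 1024)

if __name__ == "__main__":
    print("memory locations:", locations)
    print("read window:", pixels_per_line, "x", lines)
    print("16K x 1 blocks:", blocks_of_16k)
```

On this reading the memory holds 245760 one-bit pixels, which is consistent with a 512 x 480 picture occupying fifteen 16K-bit block RAMs.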
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic (see Soft
processors below).
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4 to 8 hours
is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
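As a brief illustration of the kind of estimate involved, Rent's rule in its usual textbook form relates the number of external terminals of a region of logic to the number of blocks it contains. The symbols T, t, g and p are the conventional ones and are not taken from this report.

```latex
T = t \cdot g^{p}, \qquad 0 < p \le 1
```

Here T is the number of terminals of the region, g the number of logic blocks in it, t the average number of terminals per block and p the Rent exponent. A larger p means more routing demand for the same amount of logic, which is why a crossbar needs more tracks than a systolic array of equal gate count.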
The FPGA considered here is an array, or island-style, FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
are managed separately from other signals.
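The logic block described above can be modelled behaviourally in a few lines of VHDL. This is only a sketch for illustration: the entity name logic_block, the generics LUT_INIT and REGISTERED, and the port names are all hypothetical, and a real device configures these choices through its bitstream rather than through generics.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural model of a classic logic block: one 4-input LUT feeding
-- one flip-flop, with a single output that is either registered or not.
entity logic_block is
  generic (
    -- 16-entry truth table; x"8000" sets only entry 15, i.e. a 4-input AND
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- true selects the flip-flop output, false the raw LUT output
    REGISTERED : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The four inputs select one of the 16 configuration bits
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- Flip-flop capturing the LUT output on each rising clock edge
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single block output: registered or unregistered
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the implemented function without changing the structure, which is the essence of a lookup-table-based block.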
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right of and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher-level
functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to the actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free and released under the GPL, BSD or a similar license) and
other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
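The first of these stages, RTL simulation with a test bench, can be sketched as follows. The example is a minimal, self-checking test bench around a divide-by-two clock divider; the names clk_div2 and clk_div2_tb, the 20 ns stimulus clock and the check times are illustrative assumptions and are not taken from the project's own test benches.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Design under test: toggles its output on every rising input edge
entity clk_div2 is
  port (
    clk_in  : in  std_logic;
    rstn    : in  std_logic;   -- active-low asynchronous reset
    clk_out : out std_logic
  );
end entity clk_div2;

architecture rtl of clk_div2 is
  signal q : std_logic := '0';
begin
  process (clk_in, rstn)
  begin
    if rstn = '0' then
      q <= '0';
    elsif rising_edge(clk_in) then
      q <= not q;
    end if;
  end process;
  clk_out <= q;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench: no ports, generates stimulus and checks the response
entity clk_div2_tb is
end entity clk_div2_tb;

architecture sim of clk_div2_tb is
  signal clk_in  : std_logic := '0';
  signal rstn    : std_logic := '0';
  signal clk_out : std_logic;
  signal done    : boolean := false;
begin
  dut : entity work.clk_div2
    port map (clk_in => clk_in, rstn => rstn, clk_out => clk_out);

  -- 20 ns period clock, stopped once the stimulus process has finished
  clock : process
  begin
    while not done loop
      clk_in <= '0'; wait for 10 ns;
      clk_in <= '1'; wait for 10 ns;
    end loop;
    wait;
  end process clock;

  stimulus : process
  begin
    wait for 25 ns;                      -- hold reset over the first edge
    assert clk_out = '0' report "output not low in reset" severity error;
    rstn <= '1';
    wait until rising_edge(clk_in);
    wait for 1 ns;
    assert clk_out = '1' report "no toggle on first edge" severity error;
    wait until rising_edge(clk_in);
    wait for 1 ns;
    assert clk_out = '0' report "no toggle on second edge" severity error;
    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

The same test bench can in principle be rerun against the post-synthesis and post-layout netlists, which is how the later simulation stages described above reuse the RTL checks.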
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification together with early software and
firmware co-design has become mainstream. Prototyping SoC and ASIC
designs on an FPGA has become a good method for doing this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This helps expose many unforeseen software bugs that appear due
to today's array of operating systems, applications and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC clients,
bringing in interest early. This speeds up the overall development cycle and
allows for more enhancement or improvement of the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
Fig: VGA CONTROLLER (gate level)
Fig: SYNC UNIT (gate level)
Fig: TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the VGA Controller takes a processor-generated picture
(pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
6 SIGNALS IN VGA CONTROLLER
G (Green)
B (Blue)
HSYNC (Horizontal synchronization signal)
VSYNC (Vertical synchronization signal)
All the above signals are used to control the image data so that it is correctly
displayed on the VGA display monitor.
7.2 Timings for front porch, sync pulse and back porch
             | Horizontal sync | Vertical sync
Front porch  | 0.94 µs         | 0.35 ms
Sync pulse   | 3.77 µs         | 0.063 ms
Back porch   | 1.89 µs         | 1.02 ms
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA pin description
Name | Type | Polarity | Bus size | Description
Global Control Signals
CLK | I | Rising | - | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows: 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST | I | High | - | Global system reset.
VGA Input Settings Signals
RESOLUTION | I | High/Low | - | This input selects the screen resolution to be used: 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | - | 2 | This input selects the color palette to be used: 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | - | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | - | 10 | This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD | O | High | - | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal en generated by the Synchronization Unit of the Controller.
DATA | I | - | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette as follows: Black & White mode: 8 1-bit pixels; 16 Colors mode: 2 4-bit pixels; 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: in 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | - | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive – once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC | O | Falling | - | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | - | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1 | O | High/Low | - | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
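For readers who prefer to see the interface as code, the pin description can be transcribed into a VHDL entity declaration along the following lines. This is a sketch only: the entity name vga_pins_sketch is hypothetical, and grouping each color's two outputs into a 2-bit vector is an assumption made here for brevity, since the table lists the color pins individually.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Port list transcribed from the pin description table; widths follow
-- the "Bus size" column. Not the vendor's source code.
entity vga_pins_sketch is
  port (
    -- Global control signals
    CLK        : in  std_logic;
    RST        : in  std_logic;                      -- active high
    -- VGA input settings
    RESOLUTION : in  std_logic;                      -- '1' = 640x480, '0' = 800x600
    CMOD       : in  std_logic_vector(1 downto 0);   -- color palette select
    DISPSIZE_H : in  std_logic_vector(9 downto 0);   -- viewable pixels per line
    DISPSIZE_V : in  std_logic_vector(9 downto 0);   -- viewable lines per frame
    -- Data memory control
    RD         : out std_logic;                      -- read enable, active high
    DATA       : in  std_logic_vector(7 downto 0);   -- one byte of pixel data
    ADDR_PIXEL : out std_logic_vector(18 downto 0);  -- picture memory address
    -- VGA monitor control
    HSYNC      : out std_logic;
    VSYNC      : out std_logic;
    R, G, B    : out std_logic_vector(1 downto 0)    -- assumed grouping of the color bits
  );
end entity vga_pins_sketch;
```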
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
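As a rough cross-check of the 640x480 entry in the pin description, the refresh rate follows from dividing the clock frequency by the number of clock periods in one complete frame, blanking included. The totals of 800 clock periods per line and 525 lines per frame are the commonly quoted values for standard 640x480 timing; they are an assumption here, since this report does not list them.

```latex
f_{\text{refresh}} = \frac{f_{\text{CLK}}}{N_H \, N_V}
                   = \frac{25 \times 10^{6}\ \text{Hz}}{800 \times 525}
                   \approx 59.5\ \text{Hz}
```

This agrees with the nominal 60 Hz quoted for a 25 MHz clock.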
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode – one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode – one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode – one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
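As a worked example of this formula, take a full 640x480 picture, so that PixTotal = 640 and LinTotal = 480. The choice of a full-screen picture is an illustrative assumption; smaller DISPSIZE_H and DISPSIZE_V values scale the results down accordingly.

```latex
\begin{aligned}
\text{Black \& White (8 pixels/byte):} \quad & \frac{640 \times 480}{8} = 38\,400 \text{ bytes} \\
\text{16 Colors (2 pixels/byte):}      \quad & \frac{640 \times 480}{2} = 153\,600 \text{ bytes} \\
\text{64 Colors (1 pixel/byte):}       \quad & \frac{640 \times 480}{1} = 307\,200 \text{ bytes}
\end{aligned}
```

Even the largest of these fits within the address space of the 19-bit ADDR_PIXEL output, since 2^19 = 524,288.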
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit – hvcnt – which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address – the byte of data containing the next 8 pixels
– is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address – the byte of data containing the next 2 pixels
– is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = '0' & hvcnt[18..1]
64 Colors mode
The picture memory address – the byte of data containing the next pixel – is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
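The three addressing rules amount to a right shift of hvcnt selected by CMOD, which might be written in VHDL as below. This is a sketch under stated assumptions: the entity name pixel_addr is hypothetical, and the Black & White case is padded with three zeros rather than the two shown in the text, because a 19-bit output needs three leading zeros in front of the 16 bits hvcnt[18..3].

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pixel_addr is
  port (
    cmod       : in  std_logic_vector(1 downto 0);   -- 00 = B&W, 01 = 16 Colors, 10 = 64 Colors
    hvcnt      : in  std_logic_vector(18 downto 0);  -- pixel counter from the sync unit
    addr_pixel : out std_logic_vector(18 downto 0)   -- byte address in picture memory
  );
end entity pixel_addr;

architecture rtl of pixel_addr is
begin
  with cmod select
    addr_pixel <= "000" & hvcnt(18 downto 3) when "00",    -- 8 pixels per byte: shift right by 3
                  "0"   & hvcnt(18 downto 1) when "01",    -- 2 pixels per byte: shift right by 1
                  hvcnt                      when others;  -- 1 pixel per byte: no shift
end architecture rtl;
```

The unused CMOD code 11 is treated like 64 Colors mode here purely so that every select value is covered.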
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal+1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal+1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
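The enex condition translates almost directly into a concurrent VHDL assignment. In this sketch the entity name line_enable, the use of natural-typed ports and the default generics of 800 and 525 (typical 640x480 line and frame periods) are illustrative assumptions, not values given in this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity line_enable is
  generic (
    LINE_PERIOD  : natural := 800;  -- assumed clocks per line, including blanking
    FRAME_PERIOD : natural := 525   -- assumed lines per frame, including blanking
  );
  port (
    hcount    : in  natural;        -- current horizontal count
    vcount    : in  natural;        -- current vertical count
    pix_total : in  natural;        -- viewable pixels per line (PixTotal)
    lin_total : in  natural;        -- viewable lines per frame (LinTotal)
    enex      : out std_logic;      -- line display enable
    rd        : out std_logic       -- memory read enable, follows enex
  );
end entity line_enable;

architecture rtl of line_enable is
  signal enex_i : std_logic;
begin
  -- Low while either counter is in its blanking interval, High otherwise
  enex_i <= '0' when (hcount > pix_total + 1 and hcount < LINE_PERIOD)
                  or (vcount > lin_total + 1 and vcount < FRAME_PERIOD)
            else '1';

  enex <= enex_i;
  rd   <= enex_i;  -- RD = 0 whenever enex is Low
end architecture rtl;
```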
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen – Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal – received from the Synchronization Unit –
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal – received from the Synchronization Unit – is
"0".
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
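The load and shift behaviour of PIXREG in the three modes could be sketched in VHDL roughly as follows. The entity name pixreg_sketch and its port names are hypothetical, gating each load with rd = '1' is an assumption based on the statement that data can only be read while RD is High, and the unused CMOD code 11 is treated like 64 Colors mode for completeness.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pixreg_sketch is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;                     -- active-high reset
    rd     : in  std_logic;                     -- memory read enable
    cmod   : in  std_logic_vector(1 downto 0);  -- color palette select
    hvcnt  : in  std_logic_vector(2 downto 0);  -- lowest three bits of hvcnt
    data   : in  std_logic_vector(7 downto 0);  -- byte read from picture memory
    pixreg : out std_logic_vector(7 downto 0)   -- active pixel sits in the low bits
  );
end entity pixreg_sketch;

architecture rtl of pixreg_sketch is
  signal reg : std_logic_vector(7 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      reg <= (others => '0');
    elsif rising_edge(clk) then
      case cmod is
        when "00" =>                            -- Black & White: 8 pixels per byte
          if hvcnt = "000" then
            if rd = '1' then reg <= data; end if;
          else
            reg <= '0' & reg(7 downto 1);       -- next pixel into bit 0
          end if;
        when "01" =>                            -- 16 Colors: 2 pixels per byte
          if hvcnt(0) = '0' then
            if rd = '1' then reg <= data; end if;
          else
            reg <= "0000" & reg(7 downto 4);    -- second pixel into the low nibble
          end if;
        when others =>                          -- 64 Colors: new byte every clock
          if rd = '1' then reg <= data; end if;
      end case;
    end if;
  end process;

  pixreg <= reg;
end architecture rtl;
```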
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
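The count of 64 follows directly from three independent channels with four levels each. The individual voltage levels shown below additionally assume that the DAC steps are evenly spaced between 0 V and 0.7 V, which the text does not state.

```latex
N_{\text{colors}} = 4^{3} = 64,
\qquad
V_k = \frac{k}{3} \times 0.7\ \text{V}, \quad k = 0, 1, 2, 3
\;\Rightarrow\; V_k \approx 0,\ 0.23,\ 0.47,\ 0.70\ \text{V}
```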
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This section
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2: Block RAM Available in Spartan-3E Devices
Device | RAM Columns | RAM Blocks per Column | Total RAM Blocks | Total RAM Bits | Total RAM Kbits
XC3S100E | 1 | 4 | 4 | 73,728 | 72
XC3S250E | 2 | 6 | 12 | 221,184 | 216
XC3S500E | 2 | 10 | 20 | 368,640 | 360
XC3S1200E | 2 | 14 | 28 | 516,096 | 504
XC3S1600E | 2 | 18 | 36 | 663,552 | 648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical: both ports are
interchangeable and both support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM can simulate
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Multiple block RAMs can be cascaded
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
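Besides CORE Generator and direct instantiation, a synchronous memory can also be described behaviourally and left to the synthesis tool to map. The sketch below describes a single-port 2048 x 8 memory, whose 16,384 data bits match the data capacity of one block. The entity name sp_ram_2kx8 and its port names are hypothetical, and whether a given tool actually places this description in block RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sp_ram_2kx8 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                      -- clock enable
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(10 downto 0);  -- 2048 locations
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity sp_ram_2kx8;

architecture rtl of sp_ram_2kx8 is
  type ram_t is array (0 to 2047) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  -- Both write and read are synchronous and need clock edge plus enable
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Read returns the value stored before this clock edge
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```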
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987 standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines e.g. two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
Entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: A value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
– Behavior – Structure – Dataflow
Architectures can describe a design on many levels
– Gate level – RTL (Register Transfer Level) – Behavioral level
Configuration declaration links architecture to entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and architecture body together.
The concept of default configuration is a bit messy in VHDL '87
– The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have e.g. a gate-level architecture and a behavioral
architecture.
Are always optional.
Packages
Packages contain information common to many design units.
1. Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2. Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
Consists of a declaration part and an optional body part.
Package declaration can contain
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
Package body consists of
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
Libraries
Collection of VHDL design units (database)
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here in order
of increasing level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
Circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language
– either a lower-level description
– or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no 3rd-party models for registers => you must write the behavioral
description of registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
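Because the language has no built-in register, one is normally described behaviourally with a clocked process. The following sketch shows an 8-bit register with an asynchronous reset; the entity name reg8, the width and the active-high reset are illustrative choices rather than anything prescribed by this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg8 is
  port (
    clk : in  std_logic;
    rst : in  std_logic;                     -- asynchronous, active high
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg8;

architecture behavioral of reg8 is
begin
  -- The register is implied by assigning q only on a rising clock edge
  process (clk, rst)
  begin
    if rst = '1' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture behavioral;
```

A dataflow description can then instantiate such registers and describe only the combinational logic between them.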
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time.
Representation might include e.g. state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the efficiency
of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The process is the basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
A process is quite similar to a Unix process.
The process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals that wake the process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
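The two permitted forms of a process can be compared side by side. The sketch below assumes a hypothetical entity process_forms; both processes compute the same AND function and wake up on the same signals.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names are illustrative only.
entity process_forms is
  port (
    a, b   : in  std_logic;
    y_list : out std_logic;
    y_wait : out std_logic
  );
end entity process_forms;

architecture sim of process_forms is
begin
  -- Form 1: sensitivity list. The process wakes up on any event on a or b.
  with_list : process (a, b)
  begin
    y_list <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list. The wait statement at the end suspends
  -- the process until a or b changes.
  with_wait : process
  begin
    y_wait <= a and b;
    wait on a, b;
  end process with_wait;
end architecture sim;
```

A process with a sensitivity list behaves as if it had a single wait-on statement as its last statement, which is why the two forms above are equivalent.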
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
  clk50m : in std_logic;
  rstn   : in std_logic;
  hsync  : out std_logic;
  vsync  : out std_logic;
  red    : out std_logic_vector(0 downto 0);
  green  : out std_logic_vector(0 downto 0);
  blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
component memory
  port (
    addr  : IN std_logic_VECTOR(17 downto 0);
    clk   : IN std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN std_logic;
    sinit : IN std_logic);
END component;
component sync_unit port(
  clk        : in std_logic;
  rstn       : in std_logic;
  hsync      : out std_logic;  -- horizontal sync pulse (row)
  vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;    -- mode matches the sync_unit entity
  c_hs       : out integer);   -- mode matches the sync_unit entity
end component;
component pixelgen port(
  clk        : in std_logic;
  rstn       : in std_logic;
  hsync      : in std_logic;
  vsync      : in std_logic;
  h_en       : in std_logic;
  v_en       : in std_logic;
  v_en_reg   : in std_logic;
  data_in    : in std_logic_vector(0 downto 0);
  line_count : in integer;
  c_hs       : in integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr, 18));
-- Divide the clk50m input by two to obtain the internal clock clk.
process(clk50m, rstn)
begin
  if (rstn = '0') then
    clk <= '0';
  elsif (clk50m = '1' and clk50m'event) then
    clk <= not clk;
  end if;
end process;
hsync <= hsync_s;
vsync <= vsync_s;
sync : sync_unit port map(
  clk        => clk,
  rstn       => rstn,
  hsync      => hsync_s,
  vsync      => vsync_s,
  v_en_reg   => v_en_reg_s,
  line_count => line_count_s,
  h_en       => h_en_s,
  v_en       => v_en_s,
  c_hs       => c_hs_s
);
pixel : pixelgen port map(
  clk        => clk,
  rstn       => rstn,
  hsync      => hsync_s,
  vsync      => vsync_s,
  h_en       => h_en_s,
  v_en       => v_en_s,
  v_en_reg   => v_en_reg_s,
  data_in    => data_s,
  line_count => line_count_s,
  c_hs       => c_hs_s,
  red        => red,
  green      => green,
  blue       => blue,
  addr       => addr_s,
  rd         => rd_s
);
mem12 : memory port map(
  addr  => addr_s,
  clk   => clk,
  dout  => data_s,
  en    => rd_s,
  sinit => rstn
);
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
  clk        : in std_logic;
  rstn       : in std_logic;
  hsync      : out std_logic;  -- horizontal sync pulse (row)
  vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;
  c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin
c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;
-- hsync counter: counts clock cycles within one line (0 to 793)
process(clk, rstn)
begin
  if (rstn = '0') then
    c_hs_s <= 0;
  else
    if (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end if;
end process;
-- vsync counter
-- vertical line counter: advances on each rising edge of h_en_s (0 to 515)
process(h_en_s, rstn)
begin
  if (rstn = '0') then
    line_count_s <= 0;
  elsif (h_en_s = '1' and h_en_s'event) then
    line_count_s <= line_count_s + 1;
    if (line_count_s = 515) then
      line_count_s <= 0;
    end if;
  end if;
end process;
-- hsync pulse: active low while the horizontal counter is 0 to 95
process(clk, rstn)
begin
  if (rstn = '0') then
    hsync <= '1';
  elsif (clk = '1' and clk'event) then
    if (c_hs_s <= 95) then
      hsync <= '0';
    else
      hsync <= '1';
    end if;
  end if;
end process;
-- Horizontal Enable (row information will be sent in this period)
process(clk, rstn)
begin
  if (rstn = '0') then
    h_en_s <= '0';
  elsif (clk = '1' and clk'event) then
    if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
      h_en_s <= '1';
    else
      h_en_s <= '0';
    end if;
  end if;
end process;
-- Vertical Enable (data is valid in this region)
process(clk, rstn)
begin
  if (rstn = '0') then
    v_en_s <= '0';
  elsif (clk = '1' and clk'event) then
    if ((line_count_s >= 35 and line_count_s <= 515)) then
      v_en_s <= '1';
    else
      v_en_s <= '0';
    end if;
  end if;
end process;
-- Vertical sync pulse (front porch, back porch): active low for lines 0 and 1
process(clk, rstn)
begin
  if (rstn = '0') then
    vsync <= '1';
  elsif (clk = '1' and clk'event) then
    if (line_count_s < 2) then
      vsync <= '0';
    else
      vsync <= '1';
    end if;
  end if;
end process;
-- Registered copy of v_en_s, delayed by one clock cycle
process(clk)
begin
  if (clk = '1' and clk'event) then
    v_en_reg <= v_en_s;
  end if;
end process;
end arch_sync_unit;
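The counter limits in the synchronization block imply a particular line and frame rate. As a rough check, assuming the clk50m input really is 50 MHz (suggested by its name but not stated in the code), the divided clock is 25 MHz, the horizontal counter runs over 794 clock cycles (0 to 793) and the line counter over 516 lines (0 to 515):

```latex
\[
f_{\text{line}} = \frac{25\ \text{MHz}}{794} \approx 31.49\ \text{kHz},
\qquad
f_{\text{frame}} = \frac{f_{\text{line}}}{516}
               = \frac{25\ \text{MHz}}{794 \times 516} \approx 61.0\ \text{Hz}
\]
\[
t_{\text{hsync}} = \frac{96}{25\ \text{MHz}} = 3.84\ \mu\text{s}
\quad \text{(hsync is low for counts 0 to 95)}
\]
```

Under that assumption the design refreshes at roughly 61 Hz, close to the nominal 60 Hz VGA rate.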
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
  clk        : in std_logic;
  rstn       : in std_logic;
  hsync      : in std_logic;
  vsync      : in std_logic;
  h_en       : in std_logic;
  v_en       : in std_logic;
  v_en_reg   : in std_logic;
  data_in    : in std_logic_vector(0 downto 0);
  line_count : in integer;
  c_hs       : in integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;
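architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;
process(clk, rstn)
begin
  if (rstn = '0') then
    rd <= '0';
    red <= "0";
    green <= "0";
    blue <= "0";
    addr_s <= x"0000" & "00";
  elsif (clk = '1' and clk'event) then
    -- Rising edge of v_en: restart the memory address at the top of a frame.
    if ((v_en = '1') and (v_en_reg = '0')) then
      addr_s <= (others => '0');
    end if;
    if (v_en = '1' and h_en = '1') then
      if (line_count <= 514) then
        if (c_hs <= 655) then
          -- Inside the picture area: show the memory bit on all three colors.
          rd <= '1';
          red <= data_in;
          green <= data_in;
          blue <= data_in;
          addr_s <= addr_s + "000000000000000001";
        else
          -- Right of the picture area: green.
          rd <= '0';
          red <= "0";
          green <= "1";
          blue <= "0";
        end if;
      else
        -- Below the picture area: white.
        rd <= '0';
        red <= "1";
        green <= "1";
        blue <= "1";
      end if;
      -- Wrap the address after the last pixel of the stored picture.
      if (addr_s = "111011111111111111") then
        addr_s <= (others => '0');
      end if;
    else
      -- Outside the enabled region: black.
      red <= "0";
      green <= "0";
      blue <= "0";
    end if;
  end if;
end process;
end behv;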
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM)
and programmable logic devices (PLDs). PROMs and PLDs both had the option
of being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000 gates, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacture their final version so it can no longer be modified after the design
has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
– IC costs are rising aggressively
– ASIC complexity has increased development time and costs
– R&D resources and headcount are decreasing
– Revenue losses for slow time-to-market are increasing
– Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the state where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
Fig. 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
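The classic logic block described above can be modelled in a few lines of VHDL. This is a simplified behavioral sketch, not a vendor primitive; the entity name logic_block_sketch and the generics LUT_INIT and REGISTERED are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of one logic block: a 4-input LUT plus a flip-flop.
entity logic_block_sketch is
  generic (
    -- Truth table of the LUT; bit n is the output for input value n.
    -- x"8000" sets only bit 15, i.e. a 4-input AND.
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) LUT output.
    REGISTERED : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block_sketch;

architecture rtl of logic_block_sketch is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The four inputs index into the 16-entry truth table.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- Flip-flop capturing the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single output: either the registered or the unregistered LUT value.
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the structure, which is what configuring a real LUT amounts to.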
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free and released under the GPL, BSD or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
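A test bench for the first (RTL) simulation stage might look like the sketch below. To keep it self-contained, it does not instantiate the full controller; it checks only a divide-by-two clock process with the same structure as the one in the VGA controller code. The entity name clk_div_tb, the done flag and the 50 MHz stimulus frequency are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking test bench; a test bench has no ports.
entity clk_div_tb is
end entity clk_div_tb;

architecture sim of clk_div_tb is
  signal clk50m : std_logic := '0';
  signal rstn   : std_logic := '0';
  signal clk    : std_logic;
  signal done   : boolean := false;
begin
  -- Stimulus clock with a 20 ns period (50 MHz, assumed), stopped when done.
  clock_gen : process
  begin
    while not done loop
      clk50m <= '0';
      wait for 10 ns;
      clk50m <= '1';
      wait for 10 ns;
    end loop;
    wait;
  end process clock_gen;

  -- Logic under test: divide-by-two with active-low asynchronous reset.
  divider : process (clk50m, rstn)
  begin
    if rstn = '0' then
      clk <= '0';
    elsif rising_edge(clk50m) then
      clk <= not clk;
    end if;
  end process divider;

  -- Release reset, then measure one period of the divided clock.
  check : process
    variable t_first : time;
  begin
    wait for 25 ns;
    rstn <= '1';
    wait until rising_edge(clk);
    t_first := now;
    wait until rising_edge(clk);
    assert now - t_first = 40 ns
      report "divided clock period is not 40 ns" severity error;
    report "clk_div_tb finished" severity note;
    done <= true;
    wait;
  end process check;
end architecture sim;
```

The same pattern (clock generator, reset sequence, assertions on observed behavior) can be reused at the gate-level and post-layout stages by swapping the logic under test for the corresponding netlist.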
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and
firmware co-design has become mainstream. Prototyping SoC and ASIC
designs on an FPGA has become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows for much more
time for software development and testing at the software-hardware
integration stage. This helps expose many unforeseen software bugs that appear
due to today's array of operating systems, applications and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig. VGA CONTROLLER
GATE LEVEL
Fig. SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
8 FUNCTIONAL DESCRIPTION
8.1 Symbol
VGA Controller Symbol
Pin description
The pin description is shown in the table below.
Table: VGA Pin description
Name | Type | Polarity | Bus size | Description
Global Control Signals
CLK | I | Rising | | Global system clock. This clock determines the maximum rate at which pixels can be sent to the monitor. The frequency of the clock signal determines the refresh rate as follows. 640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz. 800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz.
RST | I | High | | Global system reset.
VGA Input Settings Signals
RESOLUTION | I | High/Low | | This input selects the screen resolution to be used. 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD | I | | 2 | This input selects the color palette to be used. 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H | I | | 10 | This input determines the number of viewable pixels to be displayed in each line of a frame and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V | I | | 10 | This input determines the number of lines to be displayed in a frame and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD | O | High | | This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal en generated by the Synchronization Unit of the Controller.
DATA | I | | 8 | Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette, as follows. Black & White mode: 8 1-bit pixels. 16 Colors mode: 2 4-bit pixels. 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL | O | | 19 | Specifies the address of the next pixel in picture memory. Addresses are consecutive – once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC | O | Falling | | Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC | O | Falling | | Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1 | O | High/Low | | Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1 | O | High/Low | | Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1 | O | High/Low | | Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode – one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode – one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode – one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
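The memory formula can be captured as a small VHDL function. This is a sketch only; the package name picture_memory and the function picture_bytes are hypothetical, and rounding up to a whole byte is an added assumption for picture sizes that do not divide evenly.

```vhdl
-- Hypothetical helper package; not part of the Controller itself.
package picture_memory is
  -- Bytes needed to store a picture of pix_total x lin_total pixels
  -- when each byte holds pixels_per_byte pixels (8, 2 or 1).
  function picture_bytes (pix_total, lin_total, pixels_per_byte : positive)
    return natural;
end package picture_memory;

package body picture_memory is
  function picture_bytes (pix_total, lin_total, pixels_per_byte : positive)
    return natural is
  begin
    -- Integer division, rounded up so a partly filled last byte is counted.
    return (pix_total * lin_total + pixels_per_byte - 1) / pixels_per_byte;
  end function picture_bytes;
end package body picture_memory;
```

For a full 640x480 picture, picture_bytes(640, 480, 8) gives 38400 bytes in Black & White mode, picture_bytes(640, 480, 2) gives 153600 bytes in 16 Colors mode, and picture_bytes(640, 480, 1) gives 307200 bytes in 64 Colors mode.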
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit – hvcnt – which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
92 Black amp White mode
The picture memory address ndash the byte of data containing the next 8 pixels
ndash is determined by using bits 183 of hvcnt and right shifting the contents by
three
ADDR_PIXEL = 00 amp hvcnt[183]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
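The three addressing rules amount to dropping the low bits of hvcnt that select a pixel inside the byte. A minimal Python sketch, assuming hvcnt is a 19-bit pixel counter (the function name and mode keys are hypothetical):

```python
# Number of low hvcnt bits that select a pixel within one byte.
SHIFT = {"bw": 3, "16": 1, "64": 0}


def pixel_address(hvcnt, mode):
    """Byte address of the pixel that hvcnt currently points at."""
    hvcnt &= 0x7FFFF  # keep bits 18..0 only
    return hvcnt >> SHIFT[mode]


# Pixel number 23 lives in byte 2 (B&W), byte 11 (16 Colors), byte 23 (64 Colors).
print(pixel_address(23, "bw"), pixel_address(23, "16"), pixel_address(23, "64"))
```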
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal+1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal+1)) and (VCOUNT < Frame Period)) then
enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
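The enex condition can be written as a small Python function. This is only a sketch of the rule as stated above: the argument names are hypothetical, and the 640 x 480 values with an 800-clock line and 525-line frame used in the example call are assumed for illustration.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Return 1 while memory reads are allowed, 0 during blanking."""
    h_blank = (hcount > pix_total + 1) and (hcount < line_period)
    v_blank = (vcount > lin_total + 1) and (vcount < frame_period)
    return 0 if (h_blank or v_blank) else 1


# Illustrative timing values (assumed, not taken from the controller).
TIMING = dict(pix_total=640, lin_total=480, line_period=800, frame_period=525)

print(enex(100, 100, **TIMING))  # inside the visible area
print(enex(700, 100, **TIMING))  # horizontal blanking
print(enex(100, 500, **TIMING))  # vertical blanking
```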
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
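The load-and-shift behaviour of PIXREG determines the order in which the pixels of one byte reach the screen. The Python sketch below models only that ordering (lowest bits first, then right shifts); the function name and mode keys are hypothetical.

```python
def unpack_pixels(byte, mode):
    """Pixels of one memory byte in the order PIXREG presents them."""
    byte &= 0xFF
    if mode == "bw":
        # Active pixel is bit 0; shift right by one per clock.
        return [(byte >> i) & 0x1 for i in range(8)]
    if mode == "16":
        # Active pixel is the low nibble; one shift right by four.
        return [byte & 0xF, (byte >> 4) & 0xF]
    if mode == "64":
        # One 6-bit pixel per byte; DATA[7..6] are not used.
        return [byte & 0x3F]
    raise ValueError("unknown mode: %r" % (mode,))


print(unpack_pixels(0xB2, "bw"))
print(unpack_pixels(0xB2, "16"))
print(unpack_pixels(0xB2, "64"))
```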
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 9.1.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
combinations (4 x 4 x 4).
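The figures above can be checked with a few lines of Python. The sketch assumes the four levels are evenly spaced between 0 V and 0.7 V, which the text does not state; the actual spacing depends on the resistor network on the board.

```python
from itertools import product

V_MAX = 0.7  # volts, maximum brightness


def dac_level(code):
    """Analog level for a 2-bit code, assuming evenly spaced steps."""
    if not 0 <= code <= 3:
        raise ValueError("code must be a 2-bit value")
    return V_MAX * code / 3


levels = [round(dac_level(c), 3) for c in range(4)]
colors = list(product(range(4), repeat=3))  # every (R, G, B) code triple

print(levels)       # four levels from 0 V up to 0.7 V
print(len(colors))  # number of distinct colors
```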
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large, on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
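The totals in the Spartan-3E table follow directly from the column and block counts, and the same arithmetic gives a rough estimate of how many blocks a 1-bit-deep frame buffer needs. The Python sketch below assumes a 16K x 1 organization per block (data bits only, no parity) and uses the 512 x 480 picture size implied by the VGA controller code in this report; the helper names are hypothetical.

```python
BITS_PER_BLOCK = 18432       # 16K data bits + 2K parity bits
DATA_BITS_PER_BLOCK = 16384  # usable in a 16K x 1 organization (assumed)

# device: (RAM columns, RAM blocks per column)
DEVICES = {
    "XC3S100E": (1, 4),
    "XC3S250E": (2, 6),
    "XC3S500E": (2, 10),
    "XC3S1200E": (2, 14),
    "XC3S1600E": (2, 18),
}


def total_blocks(device):
    columns, per_column = DEVICES[device]
    return columns * per_column


def total_bits(device):
    return total_blocks(device) * BITS_PER_BLOCK


def blocks_for_1bit_memory(depth):
    """Blocks needed for a depth x 1 memory, rounding up."""
    return -(-depth // DATA_BITS_PER_BLOCK)


for name in DEVICES:
    print(name, total_blocks(name), total_bits(name), total_bits(name) // 1024)

print(blocks_for_1bit_memory(512 * 480))  # 1-bit frame buffer, 512 x 480
```

Under these assumptions a 512 x 480 monochrome picture needs 15 blocks, so it fits in an XC3S500E (20 blocks) but not in an XC3S250E (12 blocks).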
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the U.S. Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
Standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own, non-standard types => VHDL was becoming nonstandard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (define pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: A value of a port can be read inside the component, but cannot be assigned.
Multiple reads of port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe design using
– Behavior
– Structure
– Dataflow
Architectures can describe design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Links entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
– Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have, e.g., a gate level architecture and a behavioral
architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2. Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units
Consists of declaration part and optional body part
Package declaration can contain
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
Package body consists of
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
Libraries
Collection of VHDL design units (database)
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the HW.
It is possible to describe a circuit in a number of ways, listed here in order of
increasing level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high level
description (e.g. block diagram)
For large circuits, a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in
the system
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower level description
– or a behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained
If there are no 3rd party models for registers => you must write the behavioral
description of registers
The behavioral description can be provided in the form of
subprograms (functions or procedures)
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns)
If no actual delay is used, order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications
The actual timing results depend on implementation technology and efficiency
of synthesis tool
There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL
VHDL description can always be broken up into interconnected processes
Quite similar to a Unix process
Process keyword in VHDL
Process statement is a concurrent statement
Statements inside process statements are sequential statements
Process must contain either a sensitivity list or wait statement(s), but NOT both
Sensitivity list or wait statement(s) contain the signals which wake the process up
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- numeric_std is only needed by the commented-out to_unsigned conversion
-- below; together with STD_LOGIC_ARITH it makes "unsigned" ambiguous
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is
  port(
    clk50m : in  std_logic;   -- board clock
    rstn   : in  std_logic;   -- asynchronous reset, active low
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
  -- picture memory (block RAM core)
  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;
  component sync_unit
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;    -- out, matching the sync_unit entity
      c_hs       : out integer);
  end component;
  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;
  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);
begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));
  -- divide the board clock by two to obtain the pixel clock
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;
  hsync <= hsync_s;
  vsync <= vsync_s;
  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );
  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn   -- rstn is active low; check the polarity the core expects
  );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;
  -- hsync counter: counts pixel clocks 0..793 within one line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    elsif (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end process;
  -- vertical line counter: counts lines 0..515, one step per rising edge of h_en_s
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;
  -- hsync pulse: low for the first 96 clocks of each line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;
  -- Horizontal Enable (row information will be sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;
  -- Vertical Enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35) and (line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;
  -- Vertical Sync, Front Porch, Back Porch: vsync is low for lines 0 and 1
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;
  -- v_en delayed by one clock, so pixelgen can detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;
end arch_sync_unit;
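The counter limits in the synchronization block fix the line and frame rates that the monitor sees. The Python sketch below works them out, assuming the board clock is 50 MHz (suggested by the port name clk50m, but not stated in the listing) and that the top-level design halves it to produce clk.

```python
CLK_IN_HZ = 50_000_000        # assumed from the port name clk50m
PIXEL_CLK_HZ = CLK_IN_HZ / 2  # clk toggles on every clk50m rising edge

CLOCKS_PER_LINE = 794         # c_hs counts 0..793
LINES_PER_FRAME = 516         # line_count counts 0..515
HSYNC_CLOCKS = 96             # hsync is low while c_hs <= 95

line_time_us = CLOCKS_PER_LINE / PIXEL_CLK_HZ * 1e6
line_rate_khz = PIXEL_CLK_HZ / CLOCKS_PER_LINE / 1e3
frame_rate_hz = PIXEL_CLK_HZ / (CLOCKS_PER_LINE * LINES_PER_FRAME)
hsync_width_us = HSYNC_CLOCKS / PIXEL_CLK_HZ * 1e6

print("line time   %.2f us" % line_time_us)
print("line rate   %.2f kHz" % line_rate_khz)
print("frame rate  %.2f Hz" % frame_rate_hz)
print("hsync width %.2f us" % hsync_width_us)
```

Under these assumptions the line rate is about 31.5 kHz and the frame rate about 61 Hz, close to the nominal 640 x 480 VGA timing.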
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;
architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;
  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- rising edge of v_en: start of a new frame, restart the read address
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- inside the picture: show the memory bit on all three guns
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- right of the picture: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- below the picture: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- wrap after the last picture address (245759)
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- blanking: black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;
end behv;
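The constants in the pixel generator imply a picture of 512 x 480 one-bit pixels. The Python check below is one reading of the listing, not a statement from the report: because h_en is registered, the pixel generator appears to see it High for c_hs values 144 to 779, so the c_hs <= 655 test leaves 512 columns, and lines 35 to 514 give 480 rows.

```python
LAST_ADDR = int("111011111111111111", 2)  # wrap-around address in pixelgen

PIXELS_PER_LINE = 655 - 144 + 1  # c_hs values for which memory is read
VISIBLE_LINES = 514 - 35 + 1     # line_count values for which memory is read

print(LAST_ADDR)                        # last picture address
print(PIXELS_PER_LINE, VISIBLE_LINES)   # picture width and height
print(PIXELS_PER_LINE * VISIBLE_LINES)  # number of 1-bit memory locations
```

The product matches the wrap-around address plus one, which supports this reading of the picture size.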
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor, implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the field
to fix bugs, and lower non-recurring engineering costs. Vendors can also take a
middle road by developing their hardware on ordinary FPGAs, but
manufacture their final version so it can no longer be modified after the design
has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration that lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the state where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attack of cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a 4-8 hour wait
is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
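A minimal behavioural model of such a logic block, written in Python, may help make the idea concrete. It is a sketch under simple assumptions: the 4-input LUT is a 16-bit truth table indexed by the inputs, and a single flag selects the registered or the unregistered output. The class and method names are hypothetical.

```python
class LogicBlock:
    """4-input LUT followed by an optional flip-flop."""

    def __init__(self, truth_table, registered=False):
        self.truth_table = truth_table & 0xFFFF  # one bit per input pattern
        self.registered = registered
        self.ff = 0  # flip-flop state

    def lut(self, a, b, c, d):
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.truth_table >> index) & 1

    def clock(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.ff = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        return self.ff if self.registered else self.lut(a, b, c, d)


and4 = LogicBlock(0x8000)  # only input pattern 1111 gives 1
xor4 = LogicBlock(0x6996)  # 1 when an odd number of inputs is 1
print(and4.output(1, 1, 1, 1), xor4.output(1, 0, 0, 0))
```

Changing the 16-bit constant is all it takes to turn the same block into a different gate, which is the sense in which the logic is "programmable".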
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
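The planar (domain-based) rule is simple enough to state as code. In this Python sketch the side names and the function are illustrative assumptions; it lists the three wires that a wire entering the switch box can be switched onto.

```python
SIDES = ("north", "east", "south", "west")


def planar_connections(side, track, channel_width):
    """Wires reachable from (side, track) in a planar switch box."""
    if side not in SIDES:
        raise ValueError("unknown side: %r" % (side,))
    if not 0 <= track < channel_width:
        raise ValueError("track outside the channel")
    # Same track number on each of the three other sides.
    return [(other, track) for other in SIDES if other != side]


print(planar_connections("west", 2, channel_width=4))
```

Because the track number never changes, a signal that starts on track 2 stays on track 2 through every switch box, which keeps the switch box small at the cost of some routing flexibility.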
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is a method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA
has become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. The alternative is for designers to rely only on software
simulations to verify that their hardware design is sound, where simulation speed
and modeling accuracy limitations hinder development.
2. Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications, and hardware, to be found.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
Fig: Gate level, VGA controller
Fig: Gate level, sync unit
Fig: Technology schematic
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, when implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
6 SIGNALS IN VGA CONTROLLER
Clock frequency and refresh rate for each resolution
640x480: CLK = 25 MHz, Refresh = 60 Hz; CLK = 30 MHz, Refresh = 72 Hz
800x600: CLK = 40 MHz, Refresh = 60 Hz; CLK = 50 MHz, Refresh = 72 Hz
Each signal below is listed as: Name, Type (I = input, O = output), Polarity or Bus size, Description.
RST, I, High: Global system reset.
VGA Input Settings Signals
RESOLUTION, I, High/Low: This input selects the screen resolution to be used. 1 = 640x480 (VGA), 0 = 800x600 (SVGA).
CMOD, I, 2: This input selects the color palette to be used. 00 = Black & White, 01 = 16 Colors, 10 = 64 Colors.
DISPSIZE_H, I, 10: This input determines the number of viewable pixels to be displayed in each line of a frame, and is therefore used to control the horizontal extents of the visible display area.
DISPSIZE_V, I, 10: This input determines the number of lines to be displayed in a frame, and is therefore used to control the vertical extents of the visible display area.
Data Memory Control Signals
RD, O, High: This is the enable signal when data is required to be read from the memory space. This signal is controlled by, and follows, the internal line enable signal, en, generated by the Synchronization Unit of the Controller.
DATA, I, 8: Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette, as follows. Black & White mode: 8 1-bit pixels. 16 Colors mode: 2 4-bit pixels. 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL, O, 19: Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC, O, Falling: Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC, O, Falling: Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R0, R1, O, High/Low: Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G0, G1, O, High/Low: Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B0, B1, O, High/Low: Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
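The relation between the clock frequency and the refresh rate can be illustrated with a short Python calculation. This is a sketch under stated assumptions: the totals of 800 pixel clocks per line and 525 lines per frame are the figures commonly quoted for 640x480 at 60 Hz and are not taken from this report, and `refresh_rate_hz` is a hypothetical helper name.

```python
def refresh_rate_hz(pixel_clock_hz, clocks_per_line, lines_per_frame):
    """Frames per second = pixel clock / (clocks per line * lines per frame)."""
    return pixel_clock_hz / (clocks_per_line * lines_per_frame)


# Assumed totals (visible area plus blanking) for 640x480: 800 x 525.
rate = refresh_rate_hz(25_000_000, clocks_per_line=800, lines_per_frame=525)
print(f"{rate:.2f} Hz")  # close to the nominal 60 Hz
```

With a 25 MHz clock the result is slightly below 60 Hz, which is why the refresh rate is normally quoted as a nominal value.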
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
- in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
- in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
- in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
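The three packing schemes can be illustrated with a small Python sketch. The function `unpack_pixels` and the mode keys are hypothetical names. The sketch assumes the least significant pixel is displayed first, which is consistent with the right-shifting of the Pixel register described in this section.

```python
# Bits per pixel and pixels per byte for each color palette (from the list above).
BITS_PER_PIXEL = {"bw": 1, "16": 4, "64": 6}
PIXELS_PER_BYTE = {"bw": 8, "16": 2, "64": 1}


def unpack_pixels(data_byte, mode):
    """Split one byte of picture memory into pixel values, lowest bits first."""
    bits = BITS_PER_PIXEL[mode]
    mask = (1 << bits) - 1
    value = data_byte & 0xFF
    pixels = []
    for _ in range(PIXELS_PER_BYTE[mode]):
        pixels.append(value & mask)  # the active pixel sits in the low bits
        value >>= bits               # shift the next pixel into position
    return pixels


print(unpack_pixels(0b10000001, "bw"))  # eight 1-bit pixels
print(unpack_pixels(0xA5, "16"))        # two 4-bit pixels
print(unpack_pixels(0xFF, "64"))        # one 6-bit pixel, bits 7 and 6 ignored
```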
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / (number of pixels per byte)
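As a worked example of this formula, the following Python sketch computes the picture memory for a full 640x480 frame in each color mode. The helper name `picture_memory_bytes` is hypothetical, and rounding up to a whole byte is an assumption for sizes that do not divide evenly.

```python
def picture_memory_bytes(pix_total, lin_total, pixels_per_byte):
    """Bytes needed for a frame of pix_total x lin_total viewable pixels."""
    total_pixels = pix_total * lin_total
    # Ceiling division: a partially filled last byte still occupies a byte.
    return -(-total_pixels // pixels_per_byte)


for mode, per_byte in (("Black & White", 8), ("16 Colors", 2), ("64 Colors", 1)):
    print(mode, picture_memory_bytes(640, 480, per_byte), "bytes")
```

For 640x480 this gives 38400, 153600, and 307200 bytes respectively.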
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, that is, right shifting the contents by
three:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, that is, right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
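The three addressing rules amount to a right shift of the pixel counter by the base-2 logarithm of the number of pixels per byte. A minimal Python sketch, assuming hvcnt is a 19-bit pixel count (the function name `addr_pixel` and the mode keys are hypothetical):

```python
# Right-shift amounts per color mode: 8, 2, and 1 pixels per byte.
SHIFT = {"bw": 3, "16": 1, "64": 0}


def addr_pixel(hvcnt, mode):
    """Byte address in picture memory for pixel number hvcnt (19 bits)."""
    return (hvcnt & 0x7FFFF) >> SHIFT[mode]


# Pixel number 17 lives in byte 2, byte 8, or byte 17, depending on the mode.
for mode in ("bw", "16", "64"):
    print(mode, addr_pixel(17, mode))
```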
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
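The enex condition can be written out as a small Python function to make the two blanking windows explicit. This is a sketch only: the function signature is hypothetical, and the example values (640 and 480 viewable, 800 and 525 total) are assumptions chosen for illustration.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Return 0 while either counter is inside its blanking window, else 1."""
    h_blank = (pix_total + 1) < hcount < line_period
    v_blank = (lin_total + 1) < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1


# Assumed example values: 640x480 viewable, 800 clocks per line, 525 lines.
timing = dict(pix_total=640, lin_total=480, line_period=800, frame_period=525)
print(enex(100, 100, **timing))  # inside the visible area
print(enex(700, 100, **timing))  # horizontal blanking
print(enex(100, 500, **timing))  # vertical blanking
```

Since RD follows enex, memory reads stop as soon as either counter enters its blanking window.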
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
"0".
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
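The count of 64 colors follows directly from four levels on each of three channels. The Python sketch below also lists the four voltages, under the assumption that the DAC levels are evenly spaced between 0 V and 0.7 V; the actual spacing depends on the resistor network on the board, which is not described here. The name `dac_level_volts` is hypothetical.

```python
def dac_level_volts(code, full_scale=0.7):
    """Analog level for a 2-bit code, assuming evenly spaced steps."""
    if not 0 <= code <= 3:
        raise ValueError("code must be a 2-bit value (0..3)")
    return full_scale * code / 3


levels = [round(dac_level_volts(code), 3) for code in range(4)]
colors = len(levels) ** 3  # one of four levels on each of R, G, and B
print(levels, colors)
```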
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73728            72
XC3S250E    2             6                       12                 221184           216
XC3S500E    2             10                      20                 368640           360
XC3S1200E   2             14                      28                 516096           504
XC3S1600E   2             18                      36                 663552           648
Each block RAM contains 18432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM can emulate
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Multiple block RAMs can be
cascaded to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
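The totals in the block RAM table follow from the 18432 bits per block. The Python sketch below recomputes them and, as an illustration, estimates how many blocks a one-bit-per-pixel 640x480 frame would occupy if only the 16K data bits of each block are used. The helper names are hypothetical, and the frame size is an example rather than a statement about this design.

```python
BITS_PER_BLOCK = 18432            # total bits in one block RAM
DATA_BITS_PER_BLOCK = 16 * 1024   # bits available for data storage

# device: (RAM columns, RAM blocks per column), taken from the table
DEVICES = {
    "XC3S100E": (1, 4),
    "XC3S250E": (2, 6),
    "XC3S500E": (2, 10),
    "XC3S1200E": (2, 14),
    "XC3S1600E": (2, 18),
}


def total_ram_bits(device):
    columns, blocks_per_column = DEVICES[device]
    return columns * blocks_per_column * BITS_PER_BLOCK


def blocks_needed(data_bits):
    """Block RAMs needed to hold data_bits, using data bits only."""
    return -(-data_bits // DATA_BITS_PER_BLOCK)  # ceiling division


for name in DEVICES:
    bits = total_ram_bits(name)
    print(name, bits, "bits =", bits // 1024, "Kbits")
print("640x480 at 1 bit per pixel:", blocks_needed(640 * 480), "blocks")
```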
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High Speed
Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming nonstandard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.
library IEEE;
use IEEE.std_logic_1164.all;
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction, and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Are always optional.
Packages
Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries
Collection of VHDL design units (database):
1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here in order of
increasing level of abstraction:
- Structural
- Dataflow
- Behavioral
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. a transistor-level description) to a high-level
description (e.g. a block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral
description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams, and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
-- Top level: connects the sync unit, the pixel generator and the picture memory.
entity vga_controller is port(
  clk50m : in  std_logic;
  rstn   : in  std_logic;   -- active-low reset
  hsync  : out std_logic;
  vsync  : out std_logic;
  red    : out std_logic_vector(0 downto 0);
  green  : out std_logic_vector(0 downto 0);
  blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
component memory
  port (
    addr  : IN  std_logic_VECTOR(17 downto 0);
    clk   : IN  std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN  std_logic;
    sinit : IN  std_logic);
END component;
component sync_unit port(
  clk        : in    std_logic;
  rstn       : in    std_logic;
  hsync      : out   std_logic;  -- horizontal sync pulse (row)
  vsync      : out   std_logic;  -- vertical sync pulse (frame) 111110100000000000
  v_en_reg   : out   std_logic;
  h_en       : out   std_logic;
  v_en       : out   std_logic;
  line_count : inout integer;
  c_hs       : inout integer);
end component;
component pixelgen port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr, 18));
-- Divide the clk50m input by two to generate the pixel clock clk.
process(clk50m, rstn)
begin
  if (rstn = '0') then
    clk <= '0';
  elsif (clk50m = '1' and clk50m'event) then
    clk <= not clk;
  end if;
end process;
hsync <= hsync_s;
vsync <= vsync_s;
-- Synchronization unit: generates the sync pulses, enables and counters.
sync : sync_unit port map(
  clk        => clk,
  rstn       => rstn,
  hsync      => hsync_s,
  vsync      => vsync_s,
  v_en_reg   => v_en_reg_s,
  line_count => line_count_s,
  h_en       => h_en_s,
  v_en       => v_en_s,
  c_hs       => c_hs_s
);
-- Pixel generator: produces the memory address and the RGB outputs.
pixel : pixelgen port map(
  clk        => clk,
  rstn       => rstn,
  hsync      => hsync_s,
  vsync      => vsync_s,
  h_en       => h_en_s,
  v_en       => v_en_s,
  v_en_reg   => v_en_reg_s,
  data_in    => data_s,
  line_count => line_count_s,
  c_hs       => c_hs_s,
  red        => red,
  green      => green,
  blue       => blue,
  addr       => addr_s,
  rd         => rd_s
);
-- Picture memory: one bit per pixel, read when rd_s is high.
mem12 : memory port map(
  addr  => addr_s,
  clk   => clk,
  dout  => data_s,
  en    => rd_s,
  sinit => rstn
);
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;  -- horizontal sync pulse (row)
  vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;
  c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin
c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;
-- hsync counter: counts pixel clocks 0 to 793 within each line
process(clk, rstn)
begin
  if (rstn = '0') then
    c_hs_s <= 0;
  else
    if (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end if;
end process;
-- vsync counter
-- vertical line counter: advances on each rising edge of h_en_s, wraps after 515
process(h_en_s, rstn)
begin
  if (rstn = '0') then
    line_count_s <= 0;
  elsif (h_en_s = '1' and h_en_s'event) then
    line_count_s <= line_count_s + 1;
    if (line_count_s = 515) then
      line_count_s <= 0;
    end if;
  end if;
end process;
-- hsync pulse: low while the horizontal counter is 0 to 95
process(clk, rstn)
begin
  if (rstn = '0') then
    hsync <= '1';
  elsif (clk = '1' and clk'event) then
    if (c_hs_s <= 95) then
      hsync <= '0';
    else
      hsync <= '1';
    end if;
  end if;
end process;
-- Horizontal enable (row information will be sent in this period)
process(clk, rstn)
begin
  if (rstn = '0') then
    h_en_s <= '0';
  elsif (clk = '1' and clk'event) then
    if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
      h_en_s <= '1';
    else
      h_en_s <= '0';
    end if;
  end if;
end process;
-- Vertical enable (data is valid in this region)
process(clk, rstn)
begin
  if (rstn = '0') then
    v_en_s <= '0';
  elsif (clk = '1' and clk'event) then
    if ((line_count_s >= 35 and line_count_s <= 515)) then
      v_en_s <= '1';
    else
      v_en_s <= '0';
    end if;
  end if;
end process;
-- Vertical sync, front porch, back porch: vsync is low for lines 0 and 1
process(clk, rstn)
begin
  if (rstn = '0') then
    vsync <= '1';
  elsif (clk = '1' and clk'event) then
    if (line_count_s < 2) then
      vsync <= '0';
    else
      vsync <= '1';
    end if;
  end if;
end process;
-- Registered copy of the vertical enable, delayed by one clock
process(clk)
begin
  if (clk = '1' and clk'event) then
    v_en_reg <= v_en_s;
  end if;
end process;
end arch_sync_unit;
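The timing implied by the constants in the listing above can be checked with a short Python calculation. This is a sketch under stated assumptions: the input clock is taken to be 50 MHz (suggested by the port name clk50m), divided by two in the top level, and the variable names below are hypothetical. The one-clock delay of the registered outputs shifts the pulses but not their widths.

```python
# Constants read from the sync_unit listing; clock frequency is an assumption.
CLK_HZ = 50_000_000 / 2          # clk50m divided by two
H_COUNT_MAX = 793                # c_hs_s wraps after this value
H_SYNC_LAST = 95                 # hsync is low while c_hs_s <= 95
H_EN_FIRST, H_EN_LAST = 143, 778 # h_en_s is high in this range
LINE_COUNT_MAX = 515             # line_count_s wraps after this value

clocks_per_line = H_COUNT_MAX + 1
lines_per_frame = LINE_COUNT_MAX + 1
hsync_width_us = (H_SYNC_LAST + 1) / CLK_HZ * 1e6
line_period_us = clocks_per_line / CLK_HZ * 1e6
h_enable_clocks = H_EN_LAST - H_EN_FIRST + 1
frame_rate_hz = CLK_HZ / (clocks_per_line * lines_per_frame)

print(f"clocks per line : {clocks_per_line}")
print(f"hsync width     : {hsync_width_us:.2f} us")
print(f"line period     : {line_period_us:.2f} us")
print(f"h_en width      : {h_enable_clocks} clocks")
print(f"frame rate      : {frame_rate_hz:.2f} Hz")
```

Under these assumptions the frame rate comes out at roughly 61 Hz, close to the nominal 60 Hz expected for 640x480.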
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;
process(clk, rstn)
begin
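  if (rstn = '0') then
    rd <= '0';
    red <= "0";
    green <= "0";
    blue <= "0";
    addr_s <= x"0000" & "00";
  elsif (clk = '1' and clk'event) then
    -- Restart the address at the rising edge of the vertical enable.
    if ((v_en = '1') and (v_en_reg = '0')) then
      addr_s <= (others => '0');
    end if;
    if (v_en = '1' and h_en = '1') then
      if (line_count <= 514) then
        if (c_hs <= 655) then
          -- Picture area: read memory and drive all three colors from the pixel bit.
          rd <= '1';
          red <= data_in;
          green <= data_in;
          blue <= data_in;
          addr_s <= addr_s + "000000000000000001";
        else
          -- Right of the picture area: green.
          rd <= '0';
          red <= "0";
          green <= "1";
          blue <= "0";
        end if;
      else
        -- Below the picture area: white.
        rd <= '0';
        red <= "1";
        green <= "1";
        blue <= "1";
      end if;
      -- Wrap the address after the last pixel of the stored picture.
      if (addr_s = "111011111111111111") then
        addr_s <= (others => '0');
      end if;
    else
      -- Blanking: black.
      red <= "0";
      green <= "0";
      blue <= "0";
    end if;
  end if;
end process;
end behv;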
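The wrap-around constant in the pixel generator can be decoded with a few lines of Python. The reading of the result as a 512 x 480 picture at one bit per pixel is an inference from the arithmetic, not something stated in the listing; it is at least consistent with memory being read only while c_hs is at most 655.

```python
WRAP_CONSTANT = "111011111111111111"  # 18-bit value compared against addr_s

last_address = int(WRAP_CONSTANT, 2)
frame_size = last_address + 1         # addresses 0 .. last_address

print(last_address)                   # highest address that is read
print(frame_size)                     # number of one-bit pixels stored
print(frame_size == 512 * 480)        # matches a 512 x 480 picture
print(frame_size <= 2 ** 18)          # fits in the 18-bit address bus
```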
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field-programmable);
however, programmable logic was hard-wired between logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987           9,000 gates (Xilinx)
1992           600,000 gates (Naval Surface Warfare Department)
Early 2000s    Millions
Market size
1985           First commercial FPGA technology invented by Xilinx
1987           $14 million
~1993          >$385 million
2005           $1.9 billion
2010 (est.)    $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
• IC costs are rising aggressively
• ASIC complexity has bolstered development time and costs
• R&D resources and headcount are decreasing
• Revenue losses for slow time-to-market are increasing
• Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4 to
8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
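The logic block described above can be mimicked in software. The following Python sketch is only an illustration: the class name LogicBlock, its methods, and the choice of input ordering for the LUT index are assumptions made for the example, not a description of any vendor's cell. It models a 4-input LUT as a 16-entry truth table, a flip-flop that captures the LUT output on a clock edge, and a configuration bit that selects the registered or unregistered output.

```python
class LogicBlock:
    """4-input LUT followed by a D flip-flop, with a selectable output."""

    def __init__(self, truth_table, registered):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 configuration bits")
        self.truth_table = list(truth_table)
        self.registered = registered  # True: output comes from the flip-flop
        self.ff = 0                   # flip-flop state, cleared at start

    def lut(self, a, b, c, d):
        # The four inputs form the address into the truth table.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.truth_table[index]

    def clock(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.ff = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        return self.ff if self.registered else self.lut(a, b, c, d)


# Configure the LUT as a 4-input AND: only index 15 (all inputs 1) yields 1.
and4 = LogicBlock([0] * 15 + [1], registered=False)
print(and4.output(1, 1, 1, 1), and4.output(1, 0, 1, 1))  # 1 0
```

Changing the 16 truth-table bits changes the function without changing the structure, which is the sense in which the block is "programmable".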
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
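The planar (domain-based) rule described above is simple enough to state as code. The Python sketch below is illustrative only: the side names and the function switch_targets are hypothetical, and tracks are numbered from 1 to match the text. For a wire entering the switch box on one side, it lists the three wires that the programmable switches can connect it to.

```python
SIDES = ("left", "right", "top", "bottom")


def switch_targets(side, track, channel_width):
    """Wires reachable from (side, track) in a planar switch box."""
    if side not in SIDES:
        raise ValueError("unknown side: " + side)
    if not 1 <= track <= channel_width:
        raise ValueError("track must be in 1..channel_width")
    # Planar topology: the track number never changes inside the box,
    # so the wire can only reach the same track on the other three sides.
    return [(other, track) for other in SIDES if other != side]


print(switch_targets("left", 2, channel_width=4))
# [('right', 2), ('top', 2), ('bottom', 2)]
```

One consequence of this rule is that a signal stays in the track it first enters for the rest of its route, so the choice of track at the source pin matters.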
Modern FPGA families expand upon the above capabilities to include higher-
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
• SRAM: based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
• Antifuse: one-time programmable. CMOS.
• EPROM: Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
• EEPROM: Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
• Flash: flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
• Fuse: one-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM-based, 1.5 GHz fabric speed), and QuickLogic (handheld-
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping an SoC or ASIC design on an FPGA
for hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on
an FPGA has become a good method of doing this.
Reasons why prototyping is important
1 Running an SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear
due to today's array of operating systems, applications, and hardware, to be
found early.
3 Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig VGA CONTROLLER
GATE LEVEL
Fig SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
• Motion tracking systems
• Image processing systems
• Display systems
14.1 CONCLUSION
Hence, with the VGA Controller implemented on an FPGA, a processor-
generated (pixelated) picture is taken from memory space, and the Controller
provides digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
the internal line enable signal en generated by the Synchronization Unit of the Controller
DATA  I  8  Data input from picture memory space. Data is stored in memory in bytes, the content of which depends on the chosen color palette, as follows: Black & White mode: 8 1-bit pixels; 16 Colors mode: 2 4-bit pixels; 64 Colors mode: 1 6-bit pixel (DATA[5..0]). Note: In 64 Colors mode, bits 7 and 6 of each byte are not used.
ADDR_PIXEL  O  19  Specifies the address of the next pixel in picture memory. Addresses are consecutive: once the end of the current line has been reached, the next address is that of the pixel at the start of the next line down in the frame.
VGA Monitor Control Signals
HSYNC  O  Falling  Horizontal synchronization signal. This signal is used to control the horizontal deflection circuit in the VGA monitor, so that the start and end of a line of pixels is correctly displayed across the visible display area of the screen. The horizontal size of the display area is controlled by the DISPSIZE_H input to the Controller.
VSYNC  O  Falling  Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1  O  High Low  Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1  O  High Low  Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1  O  High Low  Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
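To make the dependence on the clock frequency concrete, the short Python sketch below computes the line rate and frame rate from a pixel clock and the line and frame periods. The numbers are not taken from this report: they are the widely used industry timing for 640x480 at 60 Hz (25.175 MHz pixel clock, 800 pixel clocks per line, 525 lines per frame), and the sketch assumes one pixel per CLK cycle. The function name refresh_rate_hz is hypothetical.

```python
def refresh_rate_hz(pixel_clock_hz, line_period, frame_period):
    """Frame rate for a given pixel clock.

    line_period  -- pixel clocks per line, including blanking and sync
    frame_period -- lines per frame, including blanking and sync
    """
    line_rate_hz = pixel_clock_hz / line_period
    return line_rate_hz / frame_period


# Assumed example values: standard 640x480 @ 60 Hz timing.
line_rate = 25_175_000 / 800
print(round(line_rate))                                  # 31469
print(round(refresh_rate_hz(25_175_000, 800, 525), 2))   # 59.94
```

With a different CLK frequency or different DISPSIZE_H and DISPSIZE_V settings, the same two divisions give the resulting line and frame rates.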
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
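As a worked example of the memory formula, the following Python sketch evaluates it for a 640x480 picture in each of the three color modes. The mode keys ("bw", "16", "64") and the function name picture_memory_bytes are hypothetical labels introduced for this illustration; the pixels-per-byte figures are the ones given in the text.

```python
# Pixels stored in one byte of picture memory, per color mode.
PIXELS_PER_BYTE = {"bw": 8, "16": 2, "64": 1}


def picture_memory_bytes(pix_total, lin_total, mode):
    """Bytes needed for pix_total x lin_total viewable pixels."""
    pixels = pix_total * lin_total
    per_byte = PIXELS_PER_BYTE[mode]
    return -(-pixels // per_byte)  # ceiling division, in case of a remainder


for mode in ("bw", "16", "64"):
    print(mode, picture_memory_bytes(640, 480, mode))
# bw 38400
# 16 153600
# 64 307200
```

The eightfold difference between Black & White and 64 Colors is what decides whether a picture fits in on-chip RAM or needs external SRAM.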
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
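The three addressing rules amount to a right shift of the pixel counter by the base-2 logarithm of the number of pixels per byte. The Python sketch below expresses this numerically. It is an illustration only: the function name addr_pixel and the mode keys are hypothetical, and hvcnt is assumed to be a 19-bit value (bits 18 down to 0).

```python
HVCNT_BITS = 19  # hvcnt is assumed to be bits 18 down to 0


def addr_pixel(hvcnt, mode):
    """Byte address in picture memory for pixel counter value hvcnt."""
    hvcnt &= (1 << HVCNT_BITS) - 1
    # 8 pixels per byte -> shift 3, 2 pixels per byte -> shift 1,
    # 1 pixel per byte -> no shift.
    shift = {"bw": 3, "16": 1, "64": 0}[mode]
    return hvcnt >> shift


print(addr_pixel(23, "bw"), addr_pixel(23, "16"), addr_pixel(23, "64"))
# 2 11 23
```

For pixel number 23, for instance, the byte holding it is byte 2 in Black & White mode (pixels 16 to 23), byte 11 in 16 Colors mode, and byte 23 in 64 Colors mode.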
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex.
This internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal+1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal+1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
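The enex condition can be checked with a direct transcription into Python. The sketch below is illustrative: the function name and argument names are hypothetical, and the example values (640 viewable pixels, 480 viewable lines, a line period of 800 and a frame period of 525) are assumed for the demonstration rather than taken from this report.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """External display enable: 0 while outside the viewable area."""
    h_blank = (pix_total + 1) < hcount < line_period
    v_blank = (lin_total + 1) < vcount < frame_period
    return 0 if (h_blank or v_blank) else 1


# Assumed example geometry: 640x480 viewable, 800 x 525 total.
GEOMETRY = dict(pix_total=640, lin_total=480, line_period=800, frame_period=525)

print(enex(100, 100, **GEOMETRY))  # 1: inside the picture, RD may be active
print(enex(700, 100, **GEOMETRY))  # 0: horizontal blanking, RD = 0
print(enex(100, 500, **GEOMETRY))  # 0: vertical blanking, RD = 0
```

Note that both comparisons are strict, so the counts PixTotal+1 and LinTotal+1 themselves still leave enex High.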
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit
on each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is
"0".
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge
of CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
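The load-or-shift behavior of PIXREG in the three modes can be sketched as a small Python model. This is an illustration under simplifying assumptions, not the Controller's implementation: the function names, the mode keys, and the idea of passing the memory byte in as an argument on every clock are all introduced for the example, and reset and the RD gating are omitted.

```python
def step_pixreg(pixreg, data, hvcnt, mode):
    """Value of PIXREG after one rising edge of CLK."""
    if mode == "bw":   # load when hvcnt[2..0] = "000", else shift by 1 bit
        return data if (hvcnt & 0b111) == 0 else pixreg >> 1
    if mode == "16":   # load when hvcnt[0] = "0", else shift by 4 bits
        return data if (hvcnt & 0b1) == 0 else pixreg >> 4
    if mode == "64":   # load on every clock edge
        return data
    raise ValueError("unknown mode: " + mode)


def active_pixel(pixreg, mode):
    """The active pixel sits in the low-order bits of the register."""
    mask = {"bw": 0x01, "16": 0x0F, "64": 0x3F}[mode]
    return pixreg & mask


# Black & White: one byte yields eight pixels, least significant bit first.
reg, pixels = 0, []
for hvcnt in range(8):
    reg = step_pixreg(reg, 0b10110010, hvcnt, "bw")
    pixels.append(active_pixel(reg, "bw"))
print(pixels)  # [0, 1, 0, 0, 1, 1, 0, 1]
```

In 16 Colors mode the same loop over two clock edges would output the low nibble first and then the high nibble, and in 64 Colors mode the mask simply discards bits 7 and 6 of each byte.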
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V
(maximum brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
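The arithmetic behind the four levels and the 64 colors can be sketched in a few lines of Python. The text gives only the end points (0 V and 0.7 V); the evenly spaced intermediate levels below are an assumption about an ideal 2-bit DAC, and the name dac_level is hypothetical.

```python
FULL_SCALE_V = 0.7  # maximum brightness, from the text


def dac_level(code):
    """Analog level for a 2-bit code, assuming evenly spaced steps."""
    if not 0 <= code <= 3:
        raise ValueError("a 2-bit code must be in 0..3")
    return FULL_SCALE_V * code / 3


levels = [round(dac_level(code), 3) for code in range(4)]
print(levels)            # [0.0, 0.233, 0.467, 0.7]
print(len(levels) ** 3)  # 64 combinations of red, green and blue levels
```

Three independent channels with four levels each give 4 x 4 x 4 = 64 colors, which matches the 6-bit RGB code held in RGBREG.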
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application
note describes the features and capabilities of block SelectRAM and
illustrates how to specify the various options using the Xilinx CORE
Generator™ system or via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM      RAM Blocks   Total RAM  Total RAM  Total RAM
            Columns  per Column   Blocks     Bits       Kbits
XC3S100E    1        4            4          73,728     72
XC3S250E    2        6            12         221,184    216
XC3S500E    2        10           20         368,640    360
XC3S1200E   2        14           28         516,096    504
XC3S1600E   2        18           36         663,552    648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits.
Physically, the block RAM memory has two completely independent access
ports, labeled Port A and Port B. The structure is fully symmetrical, and both
ports are interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
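The totals in Table 2 follow directly from the 18,432 bits per block. The Python sketch below recomputes them from the column counts; the dictionary layout and variable names are introduced for this illustration only.

```python
BITS_PER_BLOCK = 18432  # 16K data bits + 2K parity bits

# device: (RAM columns, RAM blocks per column), as listed in Table 2
DEVICES = {
    "XC3S100E": (1, 4),
    "XC3S250E": (2, 6),
    "XC3S500E": (2, 10),
    "XC3S1200E": (2, 14),
    "XC3S1600E": (2, 18),
}

totals = {}
for name, (columns, per_column) in DEVICES.items():
    blocks = columns * per_column
    bits = blocks * BITS_PER_BLOCK
    totals[name] = (blocks, bits, bits // 1024)
    print(name, blocks, bits, bits // 1024)
# XC3S100E 4 73728 72
# XC3S250E 12 221184 216
# XC3S500E 20 368640 360
# XC3S1200E 28 516096 504
# XC3S1600E 36 663552 648
```

The same arithmetic shows that 16K data bits plus 2K parity bits account for the full block: 16,384 + 2,048 = 18,432.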
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
• Interoperability
• Technology independence
• Design reuse
• Several levels of abstraction
• Readability
• Standard language
• Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
• Design specification language
• Design entry language
• Design simulation language
• Design documentation language
• An alternative to schematics
10.1 Brief History
• VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers
• Funded by the US Department of Defense
• The first publicly available version was released in 1985
• In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.)
was presented with a proposal to standardize VHDL
• In 1987, standardization => IEEE 1076-1987
• An improved version of the language was released in 1994 => IEEE
standard 1076-1993
10.2 Related Standards
• IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance
• Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard
• The IEEE 1164 standard was developed by the IEEE
• IEEE 1164 contains definitions for a nine-valued data type, std_logic
• IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they
relate to actual hardware
• Defines, e.g., two numeric types: signed and unsigned
VHDL Environment
Design Units
• Segments of VHDL code that can be compiled separately and stored in a
library
Entities
• A black box with an interface definition
• Defines the inputs/outputs of a component (defines pins)
• A way to represent modularity in VHDL
• Similar to a symbol in a schematic
• The entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
• Provide channels of communication between the component and its
environment
• Each port must have a name, a direction, and a type
• An entity may have NO port declaration
Port directions In A value of a port can be read inside the component but cannot be assigned
Multiple reads of port are allowed
Out Assignments can be made to a port but data from a port cannot be read
Multiple assignments are allowed
In out Bi-directional assignments can be made and data can be read Multiple
assignments are allowed
Buffer An out port with read capability May have at most one assignment
(are not recommended)
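The port modes above can be illustrated with a small sketch. The entity name bidir_reg and all port names are hypothetical, not part of the project code. Because an out port cannot be read back under the rules described here (VHDL-87/93; VHDL-2008 relaxes this), the sketch keeps an internal copy of the output value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-bit register with a bidirectional data pin.
entity bidir_reg is
  port (
    clk  : in    std_logic;   -- in: may only be read
    oe   : in    std_logic;   -- in: output enable for the data pin
    q    : out   std_logic;   -- out: may only be assigned
    data : inout std_logic);  -- inout: may be read and assigned
end bidir_reg;

architecture rtl of bidir_reg is
  -- Internal copy of the stored bit, because q itself cannot be read.
  signal q_int : std_logic := '0';
begin
  q    <= q_int;
  -- Drive the shared pin only when enabled, otherwise release it.
  data <= q_int when oe = '1' else 'Z';

  process (clk)
  begin
    if rising_edge(clk) then
      if oe = '0' then
        q_int <= data;        -- capture the externally driven value
      end if;
    end if;
  end process;
end rtl;
```

Using an internal signal plus an out port is the usual way to avoid the buffer mode that the notes advise against.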
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
  architecture Comparator1 of Comparator is
  begin
    EQ <= '1' when (A = B) else '0';
  end Comparator1;
Configurations
Link an entity declaration and an architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- the last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
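As a minimal sketch of how a configuration selects one of several architectures, the Comparator example from above is repeated here with a second, process-based architecture. The names Comparator2 and Comparator_cfg are assumptions made for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- Dataflow-style architecture (as in the example above).
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Hypothetical second architecture describing the same behavior
-- with a process.
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end Comparator2;

-- Configuration declaration: explicitly binds Comparator2 to the
-- entity instead of relying on the default (last analyzed) binding.
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end Comparator_cfg;
```

Simulating Comparator_cfg rather than the bare entity makes the choice of architecture explicit and independent of analysis order.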
Packages
Packages contain information common to many design units.
1 Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2 Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
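The split between package declaration and package body can be sketched as follows. The package name demo_pkg and everything declared in it are hypothetical; the sketch shows a constant, a subtype, a deferred constant and a function whose body lives in the package body.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package demo_pkg is
  constant DATA_WIDTH : natural := 8;
  subtype byte_t is std_logic_vector(DATA_WIDTH - 1 downto 0);

  -- Deferred constant: declared here, given its value in the body.
  constant RESET_VALUE : byte_t;

  -- Function declaration only; the implementation is in the body.
  function parity (v : std_logic_vector) return std_logic;
end demo_pkg;

package body demo_pkg is
  constant RESET_VALUE : byte_t := (others => '0');

  -- XOR of all bits: '1' when the number of '1' bits is odd.
  function parity (v : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end parity;
end demo_pkg;
```

A design unit would make these names visible with "use work.demo_pkg.all;", assuming the package was analyzed into the work library.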
Libraries
A collection of VHDL design units (a database):
1 Packages
  - package declaration
  - package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here from
lower to higher level of abstraction:
- Structural
- Dataflow
- Behavioral
Structural VHDL description
The circuit is described in terms of its components.
This ranges from a low-level description (e.g. a transistor-level
description) to a high-level description (e.g. a block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by the registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the
behavioral description of the registers yourself.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
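Since the language has no built-in register, a behavioral register description has to be written by hand. The following is a minimal sketch, assuming an active-low asynchronous reset named rstn as used elsewhere in this report; the entity name dff_reg and the generic WIDTH are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff_reg is
  generic (WIDTH : positive := 8);
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;  -- active-low asynchronous reset
    d    : in  std_logic_vector(WIDTH - 1 downto 0);
    q    : out std_logic_vector(WIDTH - 1 downto 0));
end dff_reg;

architecture behv of dff_reg is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');      -- reset clears the register
    elsif rising_edge(clk) then
      q <= d;                    -- store the input on each rising edge
    end if;
  end process;
end behv;
```

Synthesis tools typically recognize this process template and infer flip-flops from it.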
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
  process [(sensitivity_list)]
    process_declarative_part
  begin
    process_statements
    [wait_statement]
  end process;
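The two ways of waking a process up can be sketched side by side. The entity and_gate_demo and its port names are hypothetical; both processes describe the same AND gate, one with a sensitivity list and one with an explicit wait statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_gate_demo is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end and_gate_demo;

architecture behv of and_gate_demo is
begin
  -- Variant 1: the sensitivity list wakes the process on a or b.
  p_sens : process (a, b)
  begin
    y1 <= a and b;
  end process p_sens;

  -- Variant 2: no sensitivity list; the wait statement at the end
  -- suspends the process until a or b changes.
  p_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process p_wait;
end behv;
```

A process with a sensitivity list behaves like the same process with a "wait on" statement for those signals as its last statement, which is why the two forms cannot be combined.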
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic);
  END component;

  component sync_unit
    port(
      clk        : in    std_logic;
      rstn       : in    std_logic;
      hsync      : out   std_logic;  -- horizontal sync pulse (row)
      vsync      : out   std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out   std_logic;
      h_en       : out   std_logic;
      v_en       : out   std_logic;
      line_count : inout integer;
      c_hs       : inout integer);
  end component;

  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the 50 MHz board clock by two to obtain the pixel clock.
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
    --end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vsync counter
  -- vertical line counter
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
    --end if;
  end process;

  -- hsync pulse
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
    --end if;
  end process;

  -- Horizontal Enable (row information will be sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;

  -- vsync pulse
  -- Vertical Enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;

  -- Vertical Sync, Front Porch, Back Porch
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
    --end if;
  end process;

  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart the read address at frame start.
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
    -- end if;
    --end if;
  end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts
itself to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has increased development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them) but also far more
complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attack of cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software, and by the
extremely long turn-around times of current design tools, where a 4-8 hour
wait is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
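The logic block just described can be modeled behaviorally as a sketch. This is not any vendor's primitive: the entity name logic_block and the generics LUT_INIT and REGISTERED are assumptions made for illustration. The 16-bit LUT_INIT generic plays the role of the configuration bits, and REGISTERED selects which of the two outputs is used.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    -- Truth table of the 4-input LUT; bit n is the output for input
    -- value n. x"8000" sets only bit 15, i.e. a 4-input AND.
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) output.
    REGISTERED : boolean := false);
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic);
end logic_block;

architecture behv of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic;
begin
  -- The LUT is a 16-entry memory addressed by the four inputs.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- The flip-flop samples the LUT output on every rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single output: registered or combinational, chosen at configuration.
  o <= ff_out when REGISTERED else lut_out;
end behv;
```

Changing LUT_INIT changes the logic function without changing the structure, which is the essence of how a LUT-based block is "programmed".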
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
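As a sketch of the first simulation stage, a small self-checking test bench for the sync_unit entity listed earlier in this report could look as follows. It assumes sync_unit has been analyzed into the work library; the test bench name sync_unit_tb, the 40 ns clock period (25 MHz) and the signal done are assumptions for illustration. The check relies on the horizontal counter in that listing wrapping at 793, so consecutive falling edges of hsync should be 794 clock periods apart.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_unit_tb is
end sync_unit_tb;

architecture sim of sync_unit_tb is
  constant CLK_PERIOD : time := 40 ns;   -- assumed 25 MHz pixel clock
  signal clk   : std_logic := '0';
  signal rstn  : std_logic := '0';
  signal hsync, vsync, v_en_reg, h_en, v_en : std_logic;
  signal line_count, c_hs : integer;
  signal done  : boolean := false;       -- stops the clock when set
begin
  -- Clock generator; ends when the check has finished.
  clk_gen : process
  begin
    while not done loop
      clk <= '0'; wait for CLK_PERIOD / 2;
      clk <= '1'; wait for CLK_PERIOD / 2;
    end loop;
    wait;
  end process;

  -- Release the active-low reset after three clock periods.
  rstn <= '1' after 3 * CLK_PERIOD;

  dut : entity work.sync_unit
    port map (clk => clk, rstn => rstn, hsync => hsync, vsync => vsync,
              v_en_reg => v_en_reg, h_en => h_en, v_en => v_en,
              line_count => line_count, c_hs => c_hs);

  -- Measure the time between two falling edges of hsync.
  check : process
    variable t_fall : time;
  begin
    wait until falling_edge(hsync);
    t_fall := now;
    wait until falling_edge(hsync);
    assert now - t_fall = 794 * CLK_PERIOD
      report "unexpected hsync period" severity error;
    report "hsync period check finished";
    done <= true;
    wait;
  end process;
end sim;
```

The same test bench can in principle be rerun against the post-synthesis and post-layout netlists, which is how the later simulation stages described above reuse the RTL stimulus.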
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and
firmware co-design has become mainstream. Prototyping SoC and ASIC
designs on an FPGA has become a good method to do this.
Reasons why prototyping is important
1 Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This helps expose many unforeseen software bugs that appear
due to today's array of operating systems, applications, and hardware.
3 Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig. VGA CONTROLLER
GATE LEVEL
Fig. SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
picture (pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1 http://en.wikipedia.org/wiki/Video_Graphics_Array
2 Eduardo Sanchez, "A VGA display controller"
3 Hinner, "VGA Interface and video signal documents"
6 SIGNALS IN VGA CONTROLLER
VSYNC  O  Falling  Vertical synchronization signal. This signal is used to control the vertical deflection circuit in the VGA monitor, so that the start and end of a frame (of lines) is correctly displayed between the top and bottom edges of the visible display area of the screen. The vertical size of the display area is controlled by the DISPSIZE_V input to the Controller.
R1  O  High Low  Provides the 2-bit digital signal for the intensity of red used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
G1  O  High Low  Provides the 2-bit digital signal for the intensity of green used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
B1  O  High Low  Provides the 2-bit digital signal for the intensity of blue used in composing a pixel's displayed color. These two signals are inputs to a simple 2-bit DAC (external to the Controller) that provides the analog signal required by the VGA monitor.
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
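For orientation, the commonly published timing for 640x480 at 60 Hz with a 25.175 MHz pixel clock can be collected in a small package. These are standard industry figures, not values read from Figure 3, and the package name vga_timing_pkg is hypothetical. The sync_unit listing in this report uses slightly different counts (794 clocks per line and 516 lines per frame).

```vhdl
-- Hypothetical package holding the widely used 640x480 @ 60 Hz timing.
package vga_timing_pkg is
  -- Horizontal timing, in pixel clocks.
  constant H_VISIBLE : natural := 640;
  constant H_FRONT   : natural := 16;   -- front porch
  constant H_SYNC    : natural := 96;   -- sync pulse width
  constant H_BACK    : natural := 48;   -- back porch
  constant H_TOTAL   : natural := H_VISIBLE + H_FRONT + H_SYNC + H_BACK;  -- 800

  -- Vertical timing, in lines.
  constant V_VISIBLE : natural := 480;
  constant V_FRONT   : natural := 10;   -- front porch
  constant V_SYNC    : natural := 2;    -- sync pulse width
  constant V_BACK    : natural := 33;   -- back porch
  constant V_TOTAL   : natural := V_VISIBLE + V_FRONT + V_SYNC + V_BACK;  -- 525
end vga_timing_pkg;
```

With these totals, a 25.175 MHz clock gives a line rate of about 31.47 kHz (25.175 MHz / 800) and a frame rate of about 59.94 Hz (line rate / 525).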
21
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode, one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode, one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode, one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
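The formula above can be evaluated at elaboration time with a few constants.
The following is only a sketch: the package name picture_mem_pkg, the constant
names, and the 640x480 picture size are assumptions chosen for illustration,
not part of the Controller.

```vhdl
-- Illustrative package (hypothetical names): bytes of picture memory
-- needed for an assumed 640x480 picture in each color mode.
package picture_mem_pkg is
  constant PIX_TOTAL : positive := 640;  -- viewable pixels per line (assumed)
  constant LIN_TOTAL : positive := 480;  -- viewable lines per frame (assumed)

  -- (PixTotal x LinTotal) / number of pixels per byte
  constant BYTES_BW  : natural := (PIX_TOTAL * LIN_TOTAL) / 8;  -- 38400
  constant BYTES_16  : natural := (PIX_TOTAL * LIN_TOTAL) / 2;  -- 153600
  constant BYTES_64  : natural := (PIX_TOTAL * LIN_TOTAL) / 1;  -- 307200
end package picture_mem_pkg;
```

Under these assumptions, Black & White mode needs one eighth of the memory of
64 Colors mode for the same picture size.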
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, i.e. right shifting the contents
by three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, i.e. right shifting the contents
by one:
ADDR_PIXEL = '0' & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
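The three addressing rules can be sketched as a single selected assignment.
This is an illustration under stated assumptions, not the Controller's actual
source: the entity name addr_pixel_gen and the 2-bit CMOD encoding are
hypothetical, and the result is zero-padded to a fixed 19-bit width so that
all three modes share one output port.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity addr_pixel_gen is
  port (
    -- Assumed encoding: "00" = Black & White, "01" = 16 Colors,
    -- anything else = 64 Colors.
    cmod       : in  std_logic_vector(1 downto 0);
    hvcnt      : in  std_logic_vector(18 downto 0);
    addr_pixel : out std_logic_vector(18 downto 0)
  );
end entity addr_pixel_gen;

architecture dataflow of addr_pixel_gen is
begin
  with cmod select addr_pixel <=
    "000" & hvcnt(18 downto 3) when "00",    -- 8 pixels per byte: shift by 3
    "0"   & hvcnt(18 downto 1) when "01",    -- 2 pixels per byte: shift by 1
    hvcnt                      when others;  -- 1 pixel per byte: no shift
end architecture dataflow;
```

Dropping the low bits of hvcnt is what makes several consecutive pixel counts
map to the same byte address.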
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex.
This internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal+1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal+1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
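The enex condition translates directly into a concurrent conditional
assignment. The sketch below is an assumption-laden illustration: the entity
name, the generic defaults (640x480 viewable area, 800-count line period,
525-count frame period), and the use of natural-typed counters are all
hypothetical, and RD is simply tied to enex to mirror the statement that RD
is 0 whenever enex is Low.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity enex_gen is
  generic (
    PIX_TOTAL    : natural := 640;  -- viewable pixels per line (assumed)
    LIN_TOTAL    : natural := 480;  -- viewable lines per frame (assumed)
    LINE_PERIOD  : natural := 800;  -- counts per line (assumed)
    FRAME_PERIOD : natural := 525   -- counts per frame (assumed)
  );
  port (
    hcount : in  natural;
    vcount : in  natural;
    enex   : out std_logic;
    rd     : out std_logic
  );
end entity enex_gen;

architecture dataflow of enex_gen is
  signal enex_i : std_logic;
begin
  -- Low outside the viewable area, in either direction.
  enex_i <= '0' when ((hcount > PIX_TOTAL + 1) and (hcount < LINE_PERIOD))
                  or ((vcount > LIN_TOTAL + 1) and (vcount < FRAME_PERIOD))
            else '1';

  enex <= enex_i;
  rd   <= enex_i;  -- simplification: read enable follows enex
end architecture dataflow;
```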
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal (received from the Synchronization
Unit) are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit
on each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal (received from the Synchronization Unit) is
'0'.
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge
of CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
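The load-and-shift behavior for Black & White mode can be sketched as
follows. This is a minimal illustration, not the Controller's source: the
entity name pixreg_bw and its port names are hypothetical, and an
asynchronous, active-high reset is assumed because the text only says that
RST clears the register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pixreg_bw is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;                      -- assumed active high
    rd    : in  std_logic;                      -- memory read enable
    hvcnt : in  std_logic_vector(18 downto 0);
    data  : in  std_logic_vector(7 downto 0);   -- byte read from picture RAM
    pixel : out std_logic                       -- currently active pixel
  );
end entity pixreg_bw;

architecture rtl of pixreg_bw is
  signal pixreg : std_logic_vector(7 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      pixreg <= (others => '0');
    elsif rising_edge(clk) then
      if rd = '1' and hvcnt(2 downto 0) = "000" then
        pixreg <= data;                      -- load the next 8 pixels
      else
        pixreg <= '0' & pixreg(7 downto 1);  -- shift right by one bit
      end if;
    end if;
  end process;

  pixel <= pixreg(0);  -- active pixel is the lowest bit
end architecture rtl;
```

A 16 Colors variant would, under the same assumptions, load when hvcnt(0) is
'0', shift with "0000" & pixreg(7 downto 4), and take pixreg(3 downto 0) as
the active pixel.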
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
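For the 64 Colors case, where the pixel value is used directly, the register
and its blanking rule can be sketched as below. The entity name rgbreg_64,
the port names, and the asynchronous active-high reset are assumptions made
for this illustration; the palette mapping for the other two modes is not
shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity rgbreg_64 is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;                     -- assumed active high
    en    : in  std_logic;                     -- line display enable
    pixel : in  std_logic_vector(5 downto 0);  -- 6-bit pixel from PIXREG
    rgb   : out std_logic_vector(5 downto 0)   -- 2 bits each of R, G and B
  );
end entity rgbreg_64;

architecture rtl of rgbreg_64 is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      rgb <= "000000";    -- black after reset
    elsif rising_edge(clk) then
      if en = '0' then
        rgb <= "000000";  -- blanking: force black
      else
        rgb <= pixel;     -- 64 Colors mode: pixel value used directly
      end if;
    end if;
  end process;
end architecture rtl;
```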
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the NanoBoard itself, as shown in Figure 9.1.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
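As a worked example of these numbers, and assuming (this is not stated in the
text) that the four DAC levels are equally spaced between 0 V and 0.7 V, the
output voltage for a 2-bit code k and the resulting color count are:

```latex
V_k = \frac{k}{3} \times 0.7\,\mathrm{V}, \qquad k \in \{0, 1, 2, 3\}
\;\Rightarrow\; V_k \approx 0\,\mathrm{V},\ 0.233\,\mathrm{V},\ 0.467\,\mathrm{V},\ 0.7\,\mathrm{V}

N_{\text{colors}} = 4 \times 4 \times 4 = 4^{3} = 2^{6} = 64
```

The color count does not depend on the spacing assumption: three channels
with four levels each always give 64 combinations.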
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application
note describes the features and capabilities of block SelectRAM and
illustrates how to specify the various options using the Xilinx CORE
Generator system or via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM      RAM Blocks   Total RAM  Total RAM  Total RAM
            Columns  per Column   Blocks     Bits       Kbits
XC3S100E    1        4            4          73,728     72
XC3S250E    2        6            12         221,184    216
XC3S500E    2        10           20         368,640    360
XC3S1200E   2        14           28         516,096    504
XC3S1600E   2        18           36         663,552    648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits.
Physically, the block RAM memory has two completely independent access ports,
labeled Port A and Port B. The structure is fully symmetrical: both ports are
interchangeable, and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
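Besides the CORE Generator, a synchronous memory can also be described
behaviorally and left to the synthesis tool to map onto block RAM. The
following is a generic single-port sketch, not code from this project: the
entity name picture_ram, its ports, and the 16K x 1 default organization
(matching the 16K data bits of one block) are assumptions, and whether a
given tool actually infers block RAM from it depends on the tool and its
settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity picture_ram is
  generic (
    ADDR_WIDTH : positive := 14  -- 2**14 = 16K one-bit words (assumed)
  );
  port (
    clk  : in  std_logic;
    en   : in  std_logic;  -- clock enable
    we   : in  std_logic;  -- write enable
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(0 downto 0);
    dout : out std_logic_vector(0 downto 0)
  );
end entity picture_ram;

architecture rtl of picture_ram is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(0 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read: needs a clock edge and the clock enable.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```

The read is registered, which matches the statement above that read
operations require a clock edge and clock enable.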
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
It was funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987 it was standardized as IEEE 1076-1987.
An improved version of the language was released in 1994 as IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types, so VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE. IEEE 1164 contains
definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
Entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports: Provide channels of communication between the component and its
environment
Each port must have a name, direction and a type
An entity may have NO port declaration
Port directions
In: A value of a port can be read inside the component, but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and architecture body together
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have, e.g., a gate level architecture and a behavioral
architecture
Are always optional
Packages
Packages contain information common to many design units
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units
A package consists of a declaration part and an optional body part
Package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
Package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
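The split between package declaration and package body can be sketched with
a small example. Everything in it is hypothetical and chosen only for
illustration: the package name vga_types_pkg, the subtype rgb_t, the constant
RGB_BLACK and the function blank are not part of the project code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: what other design units can see.
package vga_types_pkg is
  subtype rgb_t is std_logic_vector(5 downto 0);  -- subtype declaration
  constant RGB_BLACK : rgb_t := "000000";         -- constant declaration
  -- Function declaration only; the body lives in the package body.
  function blank(rgb : rgb_t; en : std_logic) return rgb_t;
end package vga_types_pkg;

-- Package body: needed here because the function must be implemented.
package body vga_types_pkg is
  function blank(rgb : rgb_t; en : std_logic) return rgb_t is
  begin
    if en = '1' then
      return rgb;        -- display enabled: pass the color through
    else
      return RGB_BLACK;  -- blanking: force black
    end if;
  end function blank;
end package body vga_types_pkg;
```

A design unit would make these names visible with
"use work.vga_types_pkg.all;", assuming the package was compiled into the
work library.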
Libraries
Collection of VHDL design units (database)
1. Packages
   package declaration
   package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the HW.
It is possible to describe a circuit in a number of ways, listed here from
lower to higher level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a
high-level description (e.g. block diagram)
For large circuits, a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in
the system
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower level description
- or a behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained
If there are no 3rd party models for registers, you must write the behavioral
description of registers
The behavioral description can be provided in the form of
subprograms (functions or procedures)
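Since the language has no built-in register, a behavioral description of one
has to be written at some point. A common way to do this is a clocked
process, sketched below; the entity name dreg, the WIDTH generic and the
asynchronous active-high reset are assumptions for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dreg is
  generic (
    WIDTH : positive := 8  -- register width (assumed default)
  );
  port (
    clk : in  std_logic;
    rst : in  std_logic;   -- assumed asynchronous, active high
    d   : in  std_logic_vector(WIDTH - 1 downto 0);
    q   : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity dreg;

architecture behavioral of dreg is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      q <= d;  -- value is captured only on the rising clock edge
    end if;
  end process;
end architecture behavioral;
```

A dataflow description can then instantiate this register and describe the
combinational logic between registers with concurrent assignments.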
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns)
If no actual delay is used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool
There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a Unix process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but
NOT both
The sensitivity list or wait statement(s) contain the signals which wake the
process up
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
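The two ways of waking a process can be compared side by side. The sketch
below uses a hypothetical entity, and_gate_demo, with two outputs that are
driven by equivalent processes: one with a sensitivity list and one with a
wait statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_gate_demo is
  port (
    a, b   : in  std_logic;
    y1, y2 : out std_logic
  );
end entity and_gate_demo;

architecture behavioral of and_gate_demo is
begin
  -- Form 1: sensitivity list, no wait statement.
  p_list : process (a, b)
  begin
    y1 <= a and b;
  end process p_list;

  -- Form 2: no sensitivity list, explicit wait at the end.
  p_wait : process
  begin
    y2 <= a and b;
    wait on a, b;  -- suspend until a or b changes
  end process p_wait;
end architecture behavioral;
```

Both processes run once at the start of simulation and then again whenever a
or b changes, so y1 and y2 always carry the same value.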
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0)
  );
end vga_controller;

architecture Behavioral of vga_controller is

  -- Picture memory (1 bit per pixel, 18-bit address).
  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic
    );
  end component;

  -- Port modes match the sync_unit entity (line_count and c_hs are outputs).
  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer
    );
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic
    );
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the input clock by two to obtain the pixel clock.
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks 0..793 within a line
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vsync counter (vertical line counter): advances on each rising
  -- edge of h_en_s and wraps after line 515
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information will be sent in this period)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse (vertical sync, front porch, back porch)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of v_en, used to detect the start of a frame
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
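The counter limits in the synchronization block (793 for c_hs_s, 515 for
line_count_s, 95 for the hsync comparison) fix the line and frame timing once
the pixel clock is known. As a worked example, assuming clk50m is a 50 MHz
input so that the divided clock clk runs at 25 MHz (the frequency is an
assumption; only the divide-by-two is in the code):

```latex
T_{clk} = \frac{1}{25\,\mathrm{MHz}} = 40\,\mathrm{ns}

T_{line} = 794 \times 40\,\mathrm{ns} = 31.76\,\mu\mathrm{s}
\quad\Rightarrow\quad f_{line} \approx 31.49\,\mathrm{kHz}

T_{hsync} = 96 \times 40\,\mathrm{ns} = 3.84\,\mu\mathrm{s}

T_{frame} = 516 \times 31.76\,\mu\mathrm{s} \approx 16.39\,\mathrm{ms}
\quad\Rightarrow\quad f_{frame} \approx 61.0\,\mathrm{Hz}
```

Here 794 and 516 are the numbers of counter states (0 to 793 and 0 to 515),
and 96 is the number of clocks for which c_hs_s is at most 95.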
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart the picture address at frame start
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: read memory and show the 1-bit pixel
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the picture: fill with green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: fill with white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address at the end of the picture memory
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: output black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic (see "Soft
processors" below).
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure, consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them), but also far more
complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the state
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attack of cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software, and by the
extremely long turn-around times of current design tools, where a 4-8 hour
wait is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule, or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop as shown below In recent years manufacturers have started moving
to 6-input LUTs in their high performance parts claiming increased
performance
Typical logic block
59
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
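The logic block described above can be sketched in VHDL as follows. This is
only an illustrative model, not vendor code: the entity name logic_block, the
generics LUT_INIT and REGISTERED, and the port names are all hypothetical.
The LUT is modelled as a 16-entry truth table indexed by the four inputs, and
the single output is taken either from the flip-flop or directly from the LUT.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of one logic block: a 4-input LUT feeding a flip-flop.
entity logic_block is
  generic (
    -- Truth table of the LUT (assumed default: 4-input AND, only entry 15 is '1').
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- Chooses which LUT output drives the single block output.
    REGISTERED : boolean := false
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The four inputs form an index into the 16-entry truth table.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- The flip-flop captures the LUT output on each rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Only one output: registered or unregistered, fixed at configuration time.
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the structure,
which is the property that makes a LUT-based block programmable.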
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher-
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA
for hardware verification and early software development.
Verifying hardware designs together with early software and firmware
co-design has become mainstream practice. Prototyping SoC and ASIC
designs on an FPGA has become a good method to do this.
Reasons why prototyping is important
1 Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This exposes many unforeseen software bugs that appear due
to today's array of operating systems, applications and hardware.
3 Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
Fig: Gate-level view of the VGA controller
Fig: Gate-level view of the sync unit
Fig: Technology schematic
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, with the VGA Controller implemented on an FPGA, a processor-generated
(pixelated) picture is taken from memory space and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
REFERENCES
1 http://en.wikipedia.org/wiki/Video_Graphics_Array
2 Eduardo Sanchez, "A VGA display controller"
3 Hinner, "VGA Interface and video signal documents"
9 VGA SIGNAL TIMING
Figure 3 summarizes the signal timing involved in sending a line of
pixels and a frame of lines. The actual time values differ according to the
resolution selected (640x480 or 800x600), the processor-defined values for
DISPSIZE_H and DISPSIZE_V, and the frequency of the external clock signal
(CLK).
9.1 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode - one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode - one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode - one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
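As a worked check of the formula above, the following sketch computes the
picture size in bytes for each color mode. The package name vga_mem_pkg, the
function picture_bytes and the constants are hypothetical, and the 640 x 480
viewable area is only an assumed example (the real values come from
DISPSIZE_H and DISPSIZE_V). Rounding up to a whole byte is also an assumption
added here; the formula in the text does not say how a remainder is handled.

```vhdl
package vga_mem_pkg is
  -- Pixels packed into one byte for each color mode (taken from the text).
  constant PIX_PER_BYTE_BW  : positive := 8;
  constant PIX_PER_BYTE_16C : positive := 2;
  constant PIX_PER_BYTE_64C : positive := 1;

  function picture_bytes (pix_total, lin_total, pix_per_byte : positive)
    return natural;
end package vga_mem_pkg;

package body vga_mem_pkg is
  function picture_bytes (pix_total, lin_total, pix_per_byte : positive)
    return natural is
  begin
    -- Assumption: round up so a partially filled last byte is still counted.
    return (pix_total * lin_total + pix_per_byte - 1) / pix_per_byte;
  end function picture_bytes;
end package body vga_mem_pkg;

use work.vga_mem_pkg.all;

-- Self-checking simulation entity: the assertions run once at time zero.
entity picture_bytes_check is
end entity picture_bytes_check;

architecture sim of picture_bytes_check is
begin
  assert picture_bytes(640, 480, PIX_PER_BYTE_BW) = 38400
    report "Black & White size mismatch" severity failure;
  assert picture_bytes(640, 480, PIX_PER_BYTE_16C) = 153600
    report "16 Colors size mismatch" severity failure;
  assert picture_bytes(640, 480, PIX_PER_BYTE_64C) = 307200
    report "64 Colors size mismatch" severity failure;
end architecture sim;
```

Under these assumptions a 640 x 480 picture needs 38,400 bytes in Black &
White mode, 153,600 bytes in 16 Colors mode and 307,200 bytes in 64 Colors
mode.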
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit - hvcnt - which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address - the byte of data containing the next 8 pixels
- is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address - the byte of data containing the next 2 pixels
- is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address - the byte of data containing the next pixel - is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
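The three addressing rules above amount to a right shift of hvcnt by three,
one or zero bits. A minimal sketch is given below. The entity name pixel_addr
and the 2-bit encoding of cmod are hypothetical (the text does not give the
CMOD encoding), and the address is assumed to be 19 bits wide, matching
hvcnt[18..0].

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pixel_addr is
  port (
    -- Assumed encoding: "00" = Black & White, "01" = 16 Colors, others = 64 Colors.
    cmod       : in  std_logic_vector(1 downto 0);
    hvcnt      : in  unsigned(18 downto 0);
    addr_pixel : out unsigned(18 downto 0)
  );
end entity pixel_addr;

architecture rtl of pixel_addr is
begin
  -- shift_right on unsigned fills the vacated upper bits with zeros,
  -- which corresponds to the zero-padding written in the text.
  with cmod select
    addr_pixel <= shift_right(hvcnt, 3) when "00",    -- 8 pixels per byte
                  shift_right(hvcnt, 1) when "01",    -- 2 pixels per byte
                  hvcnt                 when others;  -- 1 pixel per byte
end architecture rtl;
```

For example, with hvcnt = 17 the byte address is 2 in Black & White mode,
8 in 16 Colors mode and 17 in 64 Colors mode.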
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal+1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal+1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
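The enex condition above translates almost directly into a concurrent VHDL
assignment. In the sketch below the entity name read_enable and the port
names are hypothetical, and the generic defaults (640, 480, 800, 525) are
only assumed example values for a 640x480 display; the real limits depend on
DISPSIZE_H, DISPSIZE_V and the selected resolution.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity read_enable is
  generic (
    PIX_TOTAL    : natural := 640;  -- assumed viewable pixels per line
    LIN_TOTAL    : natural := 480;  -- assumed viewable lines per frame
    LINE_PERIOD  : natural := 800;  -- assumed total clocks per line
    FRAME_PERIOD : natural := 525   -- assumed total lines per frame
  );
  port (
    hcount : in  natural;
    vcount : in  natural;
    enex   : out std_logic;
    rd     : out std_logic
  );
end entity read_enable;

architecture rtl of read_enable is
  signal enex_i : std_logic;
begin
  -- Low while the counters are outside the viewable area, High otherwise.
  enex_i <= '0' when (hcount > PIX_TOTAL + 1 and hcount < LINE_PERIOD)
                  or (vcount > LIN_TOTAL + 1 and vcount < FRAME_PERIOD)
            else '1';

  enex <= enex_i;
  -- Memory reads are allowed only while enex is High.
  rd   <= enex_i;
end architecture rtl;
```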
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen - Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal - received from the Synchronization Unit -
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal - received from the Synchronization Unit - is a
"0".
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
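The load and shift behaviour of PIXREG in the three modes can be sketched as
one clocked process. This is an illustration under stated assumptions, not
the core's actual source: the entity name pixreg_unit, the cmod encoding, the
active-high asynchronous reset and the port names are all hypothetical, and
the gating by RD is left out to keep the example short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pixreg_unit is
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;                     -- assumed active high
    -- Assumed encoding: "00" = Black & White, "01" = 16 Colors, others = 64 Colors.
    cmod      : in  std_logic_vector(1 downto 0);
    hvcnt_low : in  std_logic_vector(2 downto 0);  -- lowest three bits of hvcnt
    data      : in  std_logic_vector(7 downto 0);  -- byte read from picture memory
    pixreg    : out std_logic_vector(7 downto 0)
  );
end entity pixreg_unit;

architecture rtl of pixreg_unit is
  signal r : std_logic_vector(7 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      r <= (others => '0');
    elsif rising_edge(clk) then
      case cmod is
        when "00" =>
          -- Black & White: load every 8th count, otherwise shift right by one
          -- so the next 1-bit pixel lands in bit 0.
          if hvcnt_low = "000" then
            r <= data;
          else
            r <= '0' & r(7 downto 1);
          end if;
        when "01" =>
          -- 16 Colors: load on even counts, otherwise shift right by four
          -- so the second 4-bit pixel lands in the low nibble.
          if hvcnt_low(0) = '0' then
            r <= data;
          else
            r <= "0000" & r(7 downto 4);
          end if;
        when others =>
          -- 64 Colors: a new byte is loaded on every rising clock edge.
          r <= data;
      end case;
    end if;
  end process;

  pixreg <= r;
end architecture rtl;
```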
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices

Device      RAM       RAM Blocks   Total RAM   Total RAM   Total RAM
            Columns   per Column   Blocks      Bits        Kbits
XC3S100E    1         4            4           73,728      72
XC3S250E    2         6            12          221,184     216
XC3S500E    2         10           20          368,640     360
XC3S1200E   2         14           28          516,096     504
XC3S1600E   2         18           36          663,552     648

Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
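A single-port use of one block RAM port, as described above, can be sketched
behaviourally as follows. The entity name bram_2kx9 and its port names are
hypothetical. The 2048 x 9 organisation is chosen only because 2048 x 9 =
18,432 bits (8 data bits plus 1 parity or extra bit per word); whether a
synthesis tool actually maps this description onto a block RAM depends on
the tool and its settings, so treat that as an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity bram_2kx9 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                      -- clock enable
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(10 downto 0);  -- 2048 words
    din  : in  std_logic_vector(8 downto 0);
    dout : out std_logic_vector(8 downto 0)
  );
end entity bram_2kx9;

architecture rtl of bram_2kx9 is
  type ram_t is array (0 to 2047) of std_logic_vector(8 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  -- Both writes and reads need a clock edge and an active clock enable.
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Read-first behaviour: on a write, dout shows the old contents.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```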
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V) HDL
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers
Funded by the US Department of Defense
The first publicly available version was released in 1985
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL
In 1987 standardization => IEEE 1076-1987
An improved version of the language was released in 1994 => IEEE
standard 1076-1993
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard
The IEEE 1164 standard was developed by the IEEE
IEEE 1164 contains definitions for a nine-valued data type, std_logic
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware
Defines e.g. two numeric types: signed and unsigned
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (define pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
Entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: A value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
- Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have e.g. a gate-level architecture and a behavioral
architecture
Are always optional
Packages
Packages contain information common to many design units
1 Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2 Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units
Consists of a declaration part and an optional body part
Package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
Package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries
Collection of VHDL design units (database)
1 Packages
- package declaration
- package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in a UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ
primarily in how closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural-------
Dataflow ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram)
For large circuits, a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in
the system
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
- Either a lower-level description
- or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no 3rd-party models for registers => you must write the behavioral
description of the registers
The behavioral description can be provided in the form of
subprograms (functions or procedures)
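Because VHDL has no built-in register, a dataflow design typically relies on
a small behavioral description such as the one sketched below. The entity
name reg8, the 8-bit width and the active-high asynchronous reset are
assumptions made for this illustration. It is written as a clocked process,
which is the most common form, rather than as a subprogram.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit register with asynchronous, active-high reset.
entity reg8 is
  port (
    clk : in  std_logic;
    rst : in  std_logic;
    d   : in  std_logic_vector(7 downto 0);
    q   : out std_logic_vector(7 downto 0)
  );
end entity reg8;

architecture behav of reg8 is
begin
  process (clk, rst)
  begin
    if rst = '1' then
      q <= (others => '0');   -- reset dominates the clock
    elsif rising_edge(clk) then
      q <= d;                 -- capture the input on the rising edge
    end if;
  end process;
end architecture behav;
```

A dataflow architecture can then instantiate such registers and describe the
combinational logic between them with concurrent signal assignments.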
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time
Representation might include e.g. state diagrams, timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns)
If no actual delay is used, the order of sequential operations is defined
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool
There are a few tools for behavioral synthesis
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a Unix process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT both
The sensitivity list or wait statement(s) contain the signals which wake the process up
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
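The two ways of waking a process can be compared in a short sketch. The
entity name process_styles and its ports are hypothetical. Both processes
describe the same AND gate: the first uses a sensitivity list, the second an
explicit wait statement at the end of the process body, which is the
equivalent form.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity process_styles is
  port (
    a, b   : in  std_logic;
    y_list : out std_logic;
    y_wait : out std_logic
  );
end entity process_styles;

architecture demo of process_styles is
begin
  -- Style 1: sensitivity list; the process runs whenever a or b changes.
  with_list : process (a, b)
  begin
    y_list <= a and b;
  end process with_list;

  -- Style 2: no sensitivity list; the process suspends at the wait
  -- statement until a or b changes, then loops back to the top.
  with_wait : process
  begin
    y_wait <= a and b;
    wait on a, b;
  end process with_wait;
end architecture demo;
```

Combining a sensitivity list with a wait statement in the same process is
rejected by the compiler, which is the rule stated above.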
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- Top level: divides the 50 MHz board clock by two and connects the
-- sync unit, the pixel generator and the picture memory.
entity vga_controller is
  port (
    clk50m : in  std_logic;
    rstn   : in  std_logic;                     -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0)
  );
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic
    );
  end component;

  -- line_count and c_hs are declared "out" here to match the sync_unit entity.
  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;   -- horizontal sync pulse (row)
      vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer
    );
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic
    );
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the 50 MHz input clock by two to obtain the pixel clock.
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- NOTE: rstn is active low; this connection assumes the generated memory
  -- core was configured so that sinit is also active low.
  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;   -- horizontal sync pulse (row)
    vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s       : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- Horizontal counter: counts pixel clocks 0..793 within one line.
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    elsif (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end process;

  -- Vertical line counter: advances on each rising edge of h_en_s
  -- and wraps back to 0 after line 515.
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- Horizontal sync pulse: low for counts 0..95.
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period).
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region).
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if (line_count_s >= 35 and line_count_s <= 515) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical sync pulse: low for the first two lines of the frame.
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of v_en, used to detect its rising edge.
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart the picture address at the top of the frame.
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: read the next 1-bit pixel and drive all three colors.
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the picture: green fill.
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: white fill.
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address at the end of the stored picture.
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: output black.
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field-programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
121 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be reprogrammed at run time, and this
is leading to the idea of reconfigurable computing or reconfigurable
systems: CPUs that reconfigure themselves to suit the task at hand. The
Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable
soft processor implemented on FPGAs. However, it does not support dynamic
reconfiguration at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
122 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
123 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are managed separately.
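The logic block described above can be sketched in VHDL as follows. This is only an illustrative model, not a vendor primitive: the entity name logic_block and the generics LUT_INIT and USE_FF are hypothetical, and the default truth table is simply chosen to implement a 4-input AND.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative model of a classic logic block: 4-input LUT plus flip-flop.
-- All names (logic_block, LUT_INIT, USE_FF) are hypothetical.
entity logic_block is
  generic (
    LUT_INIT : std_logic_vector(15 downto 0) := x"8000";  -- truth table; x"8000" = 4-input AND
    USE_FF   : boolean := true                            -- true selects the registered output
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    o   : out std_logic                      -- the single block output
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The LUT is a 16-entry table indexed by the four inputs.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- The flip-flop captures the LUT output on each rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output.
  o <= ff_out when USE_FF else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without touching the structure, which is the essence of a LUT-based block.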
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right of and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
124 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
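As a minimal sketch of the first simulation step, the following self-checking test bench exercises a small design at RTL level. The device under test (eq8) and the test bench (eq8_tb) are hypothetical examples written for this illustration; they are not part of the VGA Controller code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical device under test: an 8-bit equality comparator.
entity eq8 is
  port (a, b : in  std_logic_vector(7 downto 0);
        eq   : out std_logic);
end entity eq8;

architecture rtl of eq8 is
begin
  eq <= '1' when a = b else '0';
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench: it has no ports, drives the DUT inputs, and checks the output.
entity eq8_tb is
end entity eq8_tb;

architecture sim of eq8_tb is
  signal a, b : std_logic_vector(7 downto 0) := (others => '0');
  signal eq   : std_logic;
begin
  dut : entity work.eq8 port map (a => a, b => b, eq => eq);

  stimulus : process
  begin
    a <= x"3C"; b <= x"3C";
    wait for 10 ns;
    assert eq = '1' report "equal inputs should give eq = '1'" severity error;

    a <= x"3C"; b <= x"3D";
    wait for 10 ns;
    assert eq = '0' report "different inputs should give eq = '0'" severity error;

    report "test bench finished" severity note;
    wait;  -- suspend forever so the simulation can end
  end process stimulus;
end architecture sim;
```

The same test bench can, in principle, be reused against the gate level and back-annotated netlists, provided the port names are preserved by the tools.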
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
125 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on
an FPGA has become a good method for doing this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound.
Simulation speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear
due to today's array of operating systems, applications, and hardware, to be
caught early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
[Figure: Gate level, VGA controller]
[Figure: Gate level, sync unit]
[Figure: Technology schematic]
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
141 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
picture (pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to correctly display the picture on a connected monitor.
91 VGA PIXEL UNIT
The Pixel Unit provides access to the pixelated image stored in external
picture memory, reading in data a byte at a time and formatting each pixel to
be displayed. For each pixel, the 6-bit RGB value required for the control of
the monitor's three primary color electron guns is generated, so that the pixel
is displayed on the screen with the correct color.
Accessing the stored image
The image to be displayed on the monitor screen is written by the host
microcontroller into external memory space (RAM). This memory space can
be located anywhere (e.g. using a block of RAM within the FPGA design, or
using the SRAM on the NanoBoard itself).
Pixel data is stored in the memory space in bytes. The number of pixels in a
byte depends on which color palette is being used for the image (selected by
the CMOD input):
• in Black & White mode - one byte of data in RAM contains 8 1-bit pixels
• in 16 Colors mode - one byte of data in RAM contains 2 4-bit pixels
• in 64 Colors mode - one byte of data in RAM contains 1 6-bit pixel
(DATA[7..6] are not used)
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V), and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
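As a worked illustration of this formula, the simulation-only sketch below computes the picture size in bytes for the three color modes. The 640 x 480 resolution is an assumed example value, and the names picture_memory_demo and picture_bytes are hypothetical; the function rounds up to whole bytes, which is an added convenience rather than something stated above.

```vhdl
-- Simulation-only sketch; all names are hypothetical.
entity picture_memory_demo is
end entity picture_memory_demo;

architecture sim of picture_memory_demo is
  -- Bytes needed to store one picture, rounded up to whole bytes.
  function picture_bytes (pix_total, lin_total, pixels_per_byte : positive)
    return natural is
  begin
    return (pix_total * lin_total + pixels_per_byte - 1) / pixels_per_byte;
  end function picture_bytes;
begin
  process
  begin
    -- 640 x 480 viewable pixels is an assumed example resolution.
    assert picture_bytes(640, 480, 8) = 38400  report "Black & White size" severity error;
    assert picture_bytes(640, 480, 2) = 153600 report "16 Colors size"     severity error;
    assert picture_bytes(640, 480, 1) = 307200 report "64 Colors size"     severity error;
    wait;  -- nothing more to do
  end process;
end architecture sim;
```

For this assumed resolution, a Black & White picture therefore needs 38,400 bytes, while a 64 Colors picture needs 307,200 bytes.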
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
92 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt, i.e. right shifting the contents
by three:
ADDR_PIXEL = "000" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt, i.e. right shifting the contents
by one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
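The three addressing rules can be sketched as a single selected signal assignment. The entity name pixel_addr and the 2-bit encoding of cmod are assumptions made for this illustration, since the encoding of the CMOD input is not given here. The Black & White case pads with three zero bits so that the address has the same 19-bit width as hvcnt, which is also an assumption about the intended width.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical address generator for the Pixel Unit.
entity pixel_addr is
  port (
    cmod       : in  std_logic_vector(1 downto 0);   -- palette select (encoding assumed)
    hvcnt      : in  std_logic_vector(18 downto 0);  -- pixel counter from the Synchronization Unit
    addr_pixel : out std_logic_vector(18 downto 0)   -- byte address in picture memory
  );
end entity pixel_addr;

architecture rtl of pixel_addr is
begin
  with cmod select
    addr_pixel <= "000" & hvcnt(18 downto 3) when "00",    -- Black & White: 8 pixels per byte
                  "0"   & hvcnt(18 downto 1) when "01",    -- 16 Colors: 2 pixels per byte
                  hvcnt                      when others;  -- 64 Colors: 1 pixel per byte
end architecture rtl;
```

Dropping the low bits of hvcnt is what makes consecutive pixels in the same byte share one memory address.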
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex.
This internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
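A minimal VHDL sketch of this condition is given below. The entity name enex_gen, the generic names, and their default values (640 x 480 timing figures) are assumptions for illustration only, and driving RD directly from enex is a simplification of whatever other conditions the real controller may apply.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical generator for the enex and RD signals.
entity enex_gen is
  generic (
    PIX_TOTAL    : natural := 640;  -- viewable pixels per line (assumed value)
    LINE_PERIOD  : natural := 800;  -- pixel clocks per line (assumed value)
    LIN_TOTAL    : natural := 480;  -- viewable lines per frame (assumed value)
    FRAME_PERIOD : natural := 525   -- lines per frame (assumed value)
  );
  port (
    hcount : in  natural range 0 to 1023;
    vcount : in  natural range 0 to 1023;
    enex   : out std_logic;
    rd     : out std_logic
  );
end entity enex_gen;

architecture rtl of enex_gen is
  signal enex_i : std_logic;
begin
  -- enex is low outside the viewable part of the line or frame.
  enex_i <= '0' when ((hcount > PIX_TOTAL + 1) and (hcount < LINE_PERIOD)) or
                     ((vcount > LIN_TOTAL + 1) and (vcount < FRAME_PERIOD))
            else '1';

  enex <= enex_i;
  rd   <= enex_i;  -- read access is disabled while enex is low (simplified)
end architecture rtl;
```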
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors, or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit
on each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge
of CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
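The load and shift behavior of PIXREG in the three modes can be sketched as one clocked process. This is an illustrative model only: the entity name pixreg_unit, the cmod encoding, the active-high asynchronous reset, and the gating of every load by rd are assumptions, not details confirmed by the description above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model of the Pixel register (PIXREG).
entity pixreg_unit is
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;                      -- active-high reset (assumed polarity)
    rd     : in  std_logic;                      -- memory read enable
    cmod   : in  std_logic_vector(1 downto 0);   -- "00" B&W, "01" 16 Colors, others 64 Colors (assumed)
    hvcnt  : in  std_logic_vector(18 downto 0);
    data   : in  std_logic_vector(7 downto 0);   -- byte read from picture memory
    pixreg : out std_logic_vector(7 downto 0)
  );
end entity pixreg_unit;

architecture rtl of pixreg_unit is
  signal reg : std_logic_vector(7 downto 0) := (others => '0');
begin
  process (clk, rst)
  begin
    if rst = '1' then
      reg <= (others => '0');
    elsif rising_edge(clk) then
      case cmod is
        when "00" =>                             -- Black & White: 8 pixels per byte
          if hvcnt(2 downto 0) = "000" then
            if rd = '1' then reg <= data; end if;
          else
            reg <= '0' & reg(7 downto 1);        -- next pixel into bit 0
          end if;
        when "01" =>                             -- 16 Colors: 2 pixels per byte
          if hvcnt(0) = '0' then
            if rd = '1' then reg <= data; end if;
          else
            reg <= "0000" & reg(7 downto 4);     -- second pixel into the low nibble
          end if;
        when others =>                           -- 64 Colors: new byte every clock
          if rd = '1' then reg <= data; end if;
      end case;
    end if;
  end process;

  pixreg <= reg;
end architecture rtl;
```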
93 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green, and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
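The reset and blanking behavior of RGBREG can be sketched as follows for the 64 Colors case, where the pixel value is used directly. The palette mapping for the other two modes is deliberately left out. The entity name rgbreg_unit, the active-high asynchronous reset, and the order of the R, G, and B bit pairs within the register are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model of the RGB register (RGBREG), 64 Colors mode only.
entity rgbreg_unit is
  port (
    clk     : in  std_logic;
    rst     : in  std_logic;                     -- active-high reset (assumed polarity)
    en      : in  std_logic;                     -- line display enable
    pixel   : in  std_logic_vector(5 downto 0);  -- active 6-bit pixel value
    r, g, b : out std_logic_vector(1 downto 0)   -- 2-bit outputs towards the DACs
  );
end entity rgbreg_unit;

architecture rtl of rgbreg_unit is
  signal rgbreg : std_logic_vector(5 downto 0) := (others => '0');
begin
  process (clk, rst)
  begin
    if rst = '1' then
      rgbreg <= (others => '0');       -- 000000 represents black
    elsif rising_edge(clk) then
      if en = '0' then
        rgbreg <= (others => '0');     -- blanking: load the code for black
      else
        rgbreg <= pixel;               -- 64 Colors: pixel value used directly
      end if;
    end if;
  end process;

  -- The position of each color within the register is assumed.
  r <= rgbreg(5 downto 4);
  g <= rgbreg(3 downto 2);
  b <= rgbreg(1 downto 0);
end architecture rtl;
```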
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G, and B values (outputs R0, R1, G0, G1, B0,
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the NanoBoard itself, as shown in Figure 4.
Figure 91 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V
(maximum brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application
note describes the features and capabilities of block SelectRAM and
illustrates how to specify the various options using the Xilinx CORE Generator
system or via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM      RAM Blocks   Total RAM   Total RAM   Total RAM
            Columns  per Column   Blocks      Bits        Kbits
XC3S100E    1        4            4           73,728      72
XC3S250E    2        6            12          221,184     216
XC3S500E    2        10           20          368,640     360
XC3S1200E   2        14           28          516,096     504
XC3S1600E   2        18           36          663,552     648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits.
Physically, the block RAM memory has two completely independent access ports,
labeled Port A and Port B. The structure is fully symmetrical, and both ports
are interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
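As an alternative to CORE Generator or primitive instantiation, a synchronous single-port memory can also be described behaviorally. The sketch below is a generic 2048 x 8 (16K data bits) RAM with clock enable and write enable; synthesis tools typically map such a description onto a block RAM, but that mapping depends on the tool and is not guaranteed by the code itself. The entity name sp_ram_2kx8 is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port synchronous RAM: 2048 x 8 bits = 16K data bits.
entity sp_ram_2kx8 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                      -- clock enable
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(10 downto 0);
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity sp_ram_2kx8;

architecture rtl of sp_ram_2kx8 is
  type ram_t is array (0 to 2047) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read; on a write, the previously stored value is returned.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```

Both the read and the write require a clock edge with the enable asserted, matching the synchronous port behavior described above.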
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
101 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, the language was standardized as IEEE 1076-1987.
An improved version of the language was released in 1994 as IEEE
standard 1076-1993.
102 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types, so VHDL was becoming nonstandard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition
Defines the inputs/outputs of a component (defines pins)
A way to represent modularity in VHDL
Similar to a symbol in a schematic
The entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction, and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component, but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87
- The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate level architecture and a behavioral
architecture.
Are always optional.
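A minimal configuration declaration might look like the sketch below. It repeats the Comparator entity so that the example is self-contained; the second architecture (Comparator2) and the configuration name (Comparator_cfg) are hypothetical and were added only to give the configuration something to choose between.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Dataflow architecture.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- Behavioral alternative (hypothetical second architecture).
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end architecture Comparator2;

-- The configuration declaration selects which architecture is bound to the entity.
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end configuration Comparator_cfg;
```

Switching the simulation to the other architecture only requires changing the name in the for clause and re-analyzing the configuration.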
Packages
Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
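The split between package declaration and package body can be illustrated with a small example. The package name vga_types_pkg and everything declared in it (rgb_t, RGB_BLACK, RGB_WHITE, blank) are hypothetical names chosen for this sketch; they do not appear in the VGA Controller code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the publicly visible part.
package vga_types_pkg is
  -- subtype declaration
  subtype rgb_t is std_logic_vector(5 downto 0);
  -- constant declaration
  constant RGB_BLACK : rgb_t := "000000";
  -- deferred constant: the value is given in the package body
  constant RGB_WHITE : rgb_t;
  -- function declaration
  function blank (en : std_logic; rgb : rgb_t) return rgb_t;
end package vga_types_pkg;

-- Package body: deferred constant values and subprogram bodies.
package body vga_types_pkg is
  constant RGB_WHITE : rgb_t := "111111";

  -- Returns the color unchanged when en is high, black otherwise.
  function blank (en : std_logic; rgb : rgb_t) return rgb_t is
  begin
    if en = '1' then
      return rgb;
    else
      return RGB_BLACK;
    end if;
  end function blank;
end package body vga_types_pkg;
```

A design unit would make these names visible with a clause such as "use work.vga_types_pkg.all;", assuming the package is compiled into the work library.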
Libraries
Collection of VHDL design units (database)
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the HW.
It is possible to describe a circuit in a number of ways:
Structural  -------
Dataflow    ------- Higher level of abstraction
Behavioral  -------
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high
level description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the behavioral
description of the registers yourself.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
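The following sketch shows one way the two parts can look together: the combinational logic is a concurrent (dataflow) assignment, and the register is described behaviorally in a clocked process because VHDL has no built-in register. The entity name mux_reg and its ports are assumptions made for this example only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux_reg is
    port (
        clk  : in  std_logic;
        rstn : in  std_logic;                      -- active-low reset
        sel  : in  std_logic;
        a, b : in  std_logic_vector(7 downto 0);
        q    : out std_logic_vector(7 downto 0)
    );
end entity mux_reg;

architecture dataflow of mux_reg is
    signal d : std_logic_vector(7 downto 0);
begin
    -- Combinational logic, written as a concurrent (dataflow) assignment.
    d <= a when sel = '1' else (a xor b);

    -- Behavioral description of the register that stores the result.
    reg : process (clk, rstn)
    begin
        if rstn = '0' then
            q <= (others => '0');
        elsif rising_edge(clk) then
            q <= d;
        end if;
    end process reg;
end architecture dataflow;
```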
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
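A small simulation-only sketch of the explicit delay mentioned above. The entity delay_demo and its signals are hypothetical; the "after" clause is honoured by a simulator but, as noted, ignored by synthesis tools.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_demo is
end entity delay_demo;

architecture behavioral of delay_demo is
    signal a, b : std_logic := '0';
begin
    -- Explicit delay: a follows b 10 ns later (simulation only).
    a <= b after 10 ns;

    stimulus : process
    begin
        b <= '1';                 -- b changes at 0 ns
        wait for 25 ns;
        -- a was scheduled to become '1' at 10 ns, so it is '1' by now.
        assert a = '1' report "a did not follow b" severity error;
        wait;                     -- suspend forever
    end process stimulus;
end architecture behavioral;
```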
Concurrent vs. Sequential
Processes
The process is the basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
This is quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) name the signals that wake the process up.
General format:
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
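The two permitted forms can be sketched side by side. Both processes below describe the same flip-flop behavior; the entity process_forms is a hypothetical example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity process_forms is
    port (
        clk : in  std_logic;
        d   : in  std_logic;
        q1  : out std_logic;
        q2  : out std_logic
    );
end entity process_forms;

architecture behavioral of process_forms is
begin
    -- Form 1: sensitivity list, no wait statement.
    with_list : process (clk)
    begin
        if rising_edge(clk) then
            q1 <= d;
        end if;
    end process with_list;

    -- Form 2: no sensitivity list, explicit wait statement.
    with_wait : process
    begin
        wait until rising_edge(clk);
        q2 <= d;
    end process with_wait;
end architecture behavioral;
```

The code in this report uses the first form throughout, with the clock and reset in the sensitivity list.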
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- numeric_std is only needed for the commented-out to_unsigned call below;
-- it declares types that clash with STD_LOGIC_ARITH, so it is left disabled.
-- use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;                       -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
    -- picture memory: 18-bit address, 1 bit per pixel
    component memory
        port (
            addr  : IN  std_logic_VECTOR(17 downto 0);
            clk   : IN  std_logic;
            dout  : OUT std_logic_VECTOR(0 downto 0);
            en    : IN  std_logic;
            sinit : IN  std_logic);
    END component;
    component sync_unit port(
        clk        : in  std_logic;
        rstn       : in  std_logic;
        hsync      : out std_logic;   -- horizontal sync pulse (row)
        vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
        v_en_reg   : out std_logic;
        h_en       : out std_logic;
        v_en       : out std_logic;
        line_count : out integer;     -- mode "out" to match the sync_unit entity
        c_hs       : out integer);
    end component;
    component pixelgen port(
        clk        : in  std_logic;
        rstn       : in  std_logic;
        hsync      : in  std_logic;
        vsync      : in  std_logic;
        h_en       : in  std_logic;
        v_en       : in  std_logic;
        v_en_reg   : in  std_logic;
        data_in    : in  std_logic_vector(0 downto 0);
        line_count : in  integer;
        c_hs       : in  integer;
        red        : out std_logic_vector(0 downto 0);
        green      : out std_logic_vector(0 downto 0);
        blue       : out std_logic_vector(0 downto 0);
        addr       : out std_logic_vector(17 downto 0);
        rd         : out std_logic);
    end component;
    signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
    signal line_count_s, c_hs_s, addr : integer;
    signal data_s : std_logic_vector(0 downto 0);
    signal addr_s : std_logic_vector(17 downto 0);
begin
    --addr_s <= std_logic_vector(to_unsigned(addr, 18));
    -- divide clk50m by two to obtain the clock used by all sub-blocks
    process(clk50m, rstn)
    begin
        if (rstn = '0') then
            clk <= '0';
        elsif (clk50m = '1' and clk50m'event) then
            clk <= not clk;
        end if;
    end process;
    hsync <= hsync_s;
    vsync <= vsync_s;
    sync : sync_unit port map(
        clk        => clk,
        rstn       => rstn,
        hsync      => hsync_s,
        vsync      => vsync_s,
        v_en_reg   => v_en_reg_s,
        line_count => line_count_s,
        h_en       => h_en_s,
        v_en       => v_en_s,
        c_hs       => c_hs_s
    );
    pixel : pixelgen port map(
        clk        => clk,
        rstn       => rstn,
        hsync      => hsync_s,
        vsync      => vsync_s,
        h_en       => h_en_s,
        v_en       => v_en_s,
        v_en_reg   => v_en_reg_s,
        data_in    => data_s,
        line_count => line_count_s,
        c_hs       => c_hs_s,
        red        => red,
        green      => green,
        blue       => blue,
        addr       => addr_s,
        rd         => rd_s
    );
    -- sinit is driven directly by rstn; check its polarity against the generated core
    mem12 : memory port map(
        addr  => addr_s,
        clk   => clk,
        dout  => data_s,
        en    => rd_s,
        sinit => rstn
    );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;   -- active-low reset
    hsync      : out std_logic;   -- horizontal sync pulse (row)
    vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;   -- v_en delayed by one clock
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
    signal h_en_s, v_en_s : std_logic;
    signal line_count_s, c_hs_s : integer;
begin
    c_hs       <= c_hs_s;
    line_count <= line_count_s;
    h_en       <= h_en_s;
    v_en       <= v_en_s;
    -- hsync counter: counts clocks within a line, 0 to 793
    process(clk, rstn)
    begin
        if (rstn = '0') then
            c_hs_s <= 0;
        else
            if (clk = '1' and clk'event) then
                if (c_hs_s = 793) then
                    c_hs_s <= 0;
                else
                    c_hs_s <= c_hs_s + 1;
                end if;
            end if;
        end if;
    end process;
    -- vertical line counter: advances on each rising edge of h_en_s, 0 to 515
    process(h_en_s, rstn)
    begin
        if (rstn = '0') then
            line_count_s <= 0;
        elsif (h_en_s = '1' and h_en_s'event) then
            line_count_s <= line_count_s + 1;
            if (line_count_s = 515) then
                line_count_s <= 0;
            end if;
        end if;
    end process;
    -- hsync pulse: low while the horizontal count is 0 to 95
    process(clk, rstn)
    begin
        if (rstn = '0') then
            hsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (c_hs_s <= 95) then
                hsync <= '0';
            else
                hsync <= '1';
            end if;
        end if;
    end process;
    -- horizontal enable (row information is sent in this period)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            h_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
                h_en_s <= '1';
            else
                h_en_s <= '0';
            end if;
        end if;
    end process;
    -- vertical enable (data is valid in this region)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            v_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((line_count_s >= 35 and line_count_s <= 515)) then
                v_en_s <= '1';
            else
                v_en_s <= '0';
            end if;
        end if;
    end process;
    -- vsync pulse: low during the first two lines of the frame
    process(clk, rstn)
    begin
        if (rstn = '0') then
            vsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (line_count_s < 2) then
                vsync <= '0';
            else
                vsync <= '1';
            end if;
        end if;
    end process;
    -- registered copy of v_en, used by pixelgen to detect the start of a frame
    process(clk)
    begin
        if (clk = '1' and clk'event) then
            v_en_reg <= v_en_s;
        end if;
    end process;
end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;   -- active-low reset
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
    signal addr_s : std_logic_vector(17 downto 0);
begin
    addr <= addr_s;
    process(clk, rstn)
    begin
        if (rstn = '0') then
            rd     <= '0';
            red    <= "0";
            green  <= "0";
            blue   <= "0";
            addr_s <= (others => '0');
        elsif (clk = '1' and clk'event) then
            -- rising edge of v_en: restart the address at the top of the frame
            if ((v_en = '1') and (v_en_reg = '0')) then
                addr_s <= (others => '0');
            end if;
            if (v_en = '1' and h_en = '1') then
                if (line_count <= 514) then
                    if (c_hs <= 655) then
                        -- picture area: show the memory bit on all three colors
                        rd     <= '1';
                        red    <= data_in;
                        green  <= data_in;
                        blue   <= data_in;
                        addr_s <= addr_s + 1;
                    else
                        -- right of the picture area: green
                        rd    <= '0';
                        red   <= "0";
                        green <= "1";
                        blue  <= "0";
                    end if;
                else
                    -- below line 514: white
                    rd    <= '0';
                    red   <= "1";
                    green <= "1";
                    blue  <= "1";
                end if;
                -- wrap the address after the last stored pixel
                if (addr_s = "111011111111111111") then
                    addr_s <= (others => '0');
                end if;
            else
                -- blanking: black
                red   <= "0";
                green <= "0";
                blue  <= "0";
            end if;
        end if;
    end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor, implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacture their final version so it can no longer be modified after the design
has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has bolstered development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the state where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a 4-8 hour wait
is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA considered here is an array or island-style FPGA. It consists of an
array of logic blocks and routing channels. Two I/O pads fit into the height of
one row or the width of one column, as shown below. All the routing channels
have the same width (number of wires).
Figure 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
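The logic block described above can be modelled behaviorally as a sketch. The entity logic_block, the INIT truth-table generic and the REGISTERED generic are assumptions made for illustration; a real device selects the output through configuration bits rather than generics.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
    generic (
        -- 16-entry truth table; x"8000" sets only entry 15, i.e. a 4-input AND
        INIT       : std_logic_vector(15 downto 0) := x"8000";
        -- true selects the flip-flop output, false the raw LUT output
        REGISTERED : boolean := false
    );
    port (
        clk : in  std_logic;
        i   : in  std_logic_vector(3 downto 0);
        o   : out std_logic
    );
end entity logic_block;

architecture behavioral of logic_block is
    signal lut_out : std_logic;
    signal ff_out  : std_logic := '0';
begin
    -- The LUT is a table lookup indexed by the four inputs.
    lut_out <= INIT(to_integer(unsigned(i)));

    -- Flip-flop capturing the LUT output on each rising clock edge.
    ff : process (clk)
    begin
        if rising_edge(clk) then
            ff_out <= lut_out;
        end if;
    end process ff;

    -- Single output: registered or unregistered LUT output.
    o <= ff_out when REGISTERED else lut_out;
end architecture behavioral;
```

Changing INIT changes the logic function without changing the structure, which is the essence of a lookup-table based block.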
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license),
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
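A test bench for the first (RTL) simulation stage might look like the sketch below. The design under test, toggle, is a hypothetical divide-by-two stage written for this example; it resembles the clock divider in the controller's top level but is not taken from it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Design under test: output toggles on every rising clock edge.
entity toggle is
    port (clk, rstn : in std_logic; q : out std_logic);
end entity toggle;

architecture rtl of toggle is
    signal q_i : std_logic;
begin
    process (clk, rstn)
    begin
        if rstn = '0' then
            q_i <= '0';
        elsif rising_edge(clk) then
            q_i <= not q_i;
        end if;
    end process;
    q <= q_i;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Self-checking test bench: no ports, drives the DUT and asserts on results.
entity toggle_tb is
end entity toggle_tb;

architecture sim of toggle_tb is
    signal clk  : std_logic := '0';
    signal rstn : std_logic := '0';
    signal q    : std_logic;
    signal done : boolean := false;
begin
    dut : entity work.toggle port map (clk => clk, rstn => rstn, q => q);

    -- 20 ns clock period; the clock stops once the test is done.
    clk <= not clk after 10 ns when not done else '0';

    stimulus : process
    begin
        wait for 25 ns;
        rstn <= '1';                    -- release reset between clock edges
        wait until rising_edge(clk);    -- first edge after reset release
        wait for 1 ns;
        assert q = '1' report "q should be '1' after first edge" severity error;
        wait until rising_edge(clk);
        wait for 1 ns;
        assert q = '0' report "q should be '0' after second edge" severity error;
        done <= true;
        wait;
    end process stimulus;
end architecture sim;
```

The same test bench can in principle be reused for the gate-level and back-annotated simulations, since it only touches the ports of the design under test.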
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and
firmware co-design has become mainstream. Prototyping SoC and ASIC
designs on FPGAs has become a good method to do this.
Reasons why prototyping is important:
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be caught.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement to the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
Fig: VGA CONTROLLER (gate level)
Fig: SYNC UNIT (gate level)
Fig: TECHNOLOGY SCHEMATIC
14 APPLICATIONS
- Motion tracking systems
- Image processing systems
- Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
The size of memory required to store a picture is determined by the total
number of viewable pixels in a line (determined by DISPSIZE_H), the total
number of viewable lines in a frame (determined by DISPSIZE_V) and the
number of pixels stored in each byte in memory space:
Memory required for picture = (PixTotal x LinTotal) / number of pixels per byte
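As a worked check of the formula, the sketch below evaluates it for a 640 x 480 picture (an assumed example resolution) with 8, 2 and 1 pixels per byte, matching the Black & White, 16 Colors and 64 Colors modes described in this section. The function name picture_bytes is hypothetical.

```vhdl
entity picture_mem_demo is
end entity picture_mem_demo;

architecture sim of picture_mem_demo is
    -- Bytes needed for one picture: (pixels per line * lines) / pixels per byte.
    function picture_bytes(pix_total, lin_total, pix_per_byte : positive)
        return natural is
    begin
        return (pix_total * lin_total) / pix_per_byte;
    end function picture_bytes;
begin
    check : process
    begin
        -- 640 * 480 = 307200 pixels in total.
        assert picture_bytes(640, 480, 8) = 38400
            report "Black & White size mismatch" severity error;
        assert picture_bytes(640, 480, 2) = 153600
            report "16 Colors size mismatch" severity error;
        assert picture_bytes(640, 480, 1) = 307200
            report "64 Colors size mismatch" severity error;
        wait;
    end process check;
end architecture sim;
```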
The address in RAM where the next pixel is stored is determined using
an internal signal provided by the Synchronization Unit, hvcnt, which
reflects the current contents of the MEMADDR register. The exact addressing
is described below.
9.2 Black & White mode
The picture memory address (the byte of data containing the next 8 pixels)
is determined by using bits 18..3 of hvcnt and right shifting the contents by
three:
ADDR_PIXEL = "00" & hvcnt[18..3]
16 Colors mode
The picture memory address (the byte of data containing the next 2 pixels)
is determined by using bits 18..1 of hvcnt and right shifting the contents by
one:
ADDR_PIXEL = "0" & hvcnt[18..1]
64 Colors mode
The picture memory address (the byte of data containing the next pixel) is
determined by using the full value of hvcnt:
ADDR_PIXEL = hvcnt[18..0]
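The three addressing rules can be sketched as a single function. This assumes hvcnt is a 19-bit pixel counter (bits 18 downto 0) and uses a plain right shift, which keeps the result 19 bits wide in every mode; the package pixel_addr_pkg, the type palette_t and the function addr_pixel are hypothetical names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package pixel_addr_pkg is
    type palette_t is (BLACK_WHITE, COLORS_16, COLORS_64);

    -- Byte address of the pixel selected by the pixel counter hvcnt.
    function addr_pixel(hvcnt : unsigned(18 downto 0); palette : palette_t)
        return unsigned;
end package pixel_addr_pkg;

package body pixel_addr_pkg is
    function addr_pixel(hvcnt : unsigned(18 downto 0); palette : palette_t)
        return unsigned is
    begin
        case palette is
            when BLACK_WHITE => return shift_right(hvcnt, 3);  -- 8 pixels per byte
            when COLORS_16   => return shift_right(hvcnt, 1);  -- 2 pixels per byte
            when COLORS_64   => return hvcnt;                  -- 1 pixel per byte
        end case;
    end function addr_pixel;
end package body pixel_addr_pkg;
```

For example, pixel number 17 lies in byte 2 in Black & White mode, byte 8 in 16 Colors mode and byte 17 in 64 Colors mode.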
The Pixel register (PIXREG)
The Pixel register is used to receive the byte of pixel data read from the
current address in memory space. The register PIXREG is reset to zero when
the VGA Controller receives an external reset signal (RST).
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex. This
internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then
    enex = 0
Else
    enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
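The condition above translates almost directly into VHDL. In this sketch the counters and limits are passed as natural numbers; the package name enex_pkg and the parameter names are assumptions that simply mirror the names used in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package enex_pkg is
    -- Returns '0' outside the viewable area (memory reads disabled), else '1'.
    function enex(hcount, vcount            : natural;
                  pix_total, lin_total      : natural;
                  line_period, frame_period : natural) return std_logic;
end package enex_pkg;

package body enex_pkg is
    function enex(hcount, vcount            : natural;
                  pix_total, lin_total      : natural;
                  line_period, frame_period : natural) return std_logic is
    begin
        if ((hcount > pix_total + 1) and (hcount < line_period)) or
           ((vcount > lin_total + 1) and (vcount < frame_period)) then
            return '0';
        else
            return '1';
        end if;
    end function enex;
end package body enex_pkg;
```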
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen: Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal, received from the Synchronization Unit,
are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit on
each rising edge of CLK.
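A minimal sketch of this load-and-shift behavior for Black & White mode follows. The entity pixreg_bw and its ports are hypothetical, and two details are assumptions not stated in the text: that RST is active high, and that a load only happens while RD is high.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity pixreg_bw is
    port (
        clk       : in  std_logic;
        rst       : in  std_logic;                     -- assumed active-high reset
        rd        : in  std_logic;                     -- read enable (RD)
        hvcnt_low : in  std_logic_vector(2 downto 0);  -- lowest three bits of hvcnt
        data      : in  std_logic_vector(7 downto 0);  -- byte read from memory
        pixel     : out std_logic                      -- active pixel
    );
end entity pixreg_bw;

architecture rtl of pixreg_bw is
    signal pixreg : std_logic_vector(7 downto 0);
begin
    process (clk, rst)
    begin
        if rst = '1' then
            pixreg <= (others => '0');
        elsif rising_edge(clk) then
            if hvcnt_low = "000" and rd = '1' then
                pixreg <= data;                        -- load the next 8 pixels
            else
                pixreg <= '0' & pixreg(7 downto 1);    -- shift right by one bit
            end if;
        end if;
    end process;

    -- The active pixel is always the lowest bit of the register.
    pixel <= pixreg(0);
end architecture rtl;
```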
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal, received from the Synchronization Unit, is a
"0".
For the currently loaded byte, the active pixel is always in the low order
nibble of the Pixel register. Remember that in this mode, each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge of
CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital to analog converter circuitry,
located on the Nano Board itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 4 x 4 x 4 = 64 different color
permutations.
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM       RAM Blocks    Total RAM   Total RAM   Total RAM
            Columns   per Column    Blocks      Bits        Kbits
XC3S100E    1         4             4           73,728      72
XC3S250E    2         6             12          221,184     216
XC3S500E    2         10            20          368,640     360
XC3S1200E   2         14            28          516,096     504
XC3S1600E   2         18            36          663,552     648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical: both ports are
interchangeable and both support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM can also act as
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
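As an illustration of the synchronous, single-port usage described above, the
following is a minimal sketch of a 16K x 1 memory written so that synthesis
tools can typically map it onto one block RAM. The entity name spram_16kx1 and
its port names are hypothetical, and whether a given tool actually infers block
RAM from this template is an assumption that should be checked in the
synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity spram_16kx1 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                      -- clock enable
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(13 downto 0);  -- 2**14 = 16K locations
    din  : in  std_logic;
    dout : out std_logic
  );
end entity spram_16kx1;

architecture rtl of spram_16kx1 is
  type ram_t is array (0 to 2**14 - 1) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read: dout changes only on a clock edge with en = '1'.
        -- During a write, this returns the old contents (read-first behaviour).
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```

Both the read and the write sit inside the clocked branch, which mirrors the
statement that read operations also require a clock edge and clock enable.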
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987 standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and high-
impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming nonstandard.
The IEEE 1164 standard was developed by the IEEE to address this.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library
Entities
A black box with interface definition
Defines the inputs/outputs of a component (define pins)
A way to represent modularity in VHDL
Similar to symbol in schematic
Entity declaration describes entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment
Each port must have a name, a direction and a type
An entity may have NO port declaration
Port directions
In: The value of a port can be read inside the component but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe design using
– Behavior
– Structure
– Dataflow
Architectures can describe design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
Configuration declaration links architecture to entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link entity declaration and architecture body together
Concept of default configuration is a bit messy in VHDL '87
– Last architecture analyzed links to entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have, e.g., a gate level architecture and a behavioral
architecture
Configurations are always optional
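A minimal sketch of a configuration declaration for the Comparator example is
given below. The configuration name Comparator_cfg is hypothetical; the entity
and architecture are repeated only so that the file can be analyzed on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- Explicitly binds architecture Comparator1 to entity Comparator,
-- instead of relying on the "last architecture analyzed" default.
configuration Comparator_cfg of Comparator is
  for Comparator1
  end for;
end configuration Comparator_cfg;
```

If a second architecture (for example a gate level one) were added later, only
the configuration would need to change in order to simulate it.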
Packages
Packages contain information common to many design units
1 Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2 Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units
A package consists of a declaration part and an optional body part
Package declaration can contain
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
Package body consists of
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
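The split between package declaration and package body can be illustrated with
a small sketch. The package name vga_sketch_pkg and everything declared in it
are hypothetical and chosen only for illustration; H_TOTAL simply reuses the
794-clock line length found in the synchronization code later in this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package vga_sketch_pkg is
  -- Ordinary constant and subtype declarations
  constant H_TOTAL : natural := 794;
  subtype pixel_t is std_logic_vector(5 downto 0);

  -- Deferred constant: declared here, given its value in the package body
  constant BLACK : pixel_t;

  -- Function declaration; the body lives in the package body
  function is_black (p : pixel_t) return boolean;
end package vga_sketch_pkg;

package body vga_sketch_pkg is
  constant BLACK : pixel_t := "000000";

  function is_black (p : pixel_t) return boolean is
  begin
    return p = BLACK;
  end function is_black;
end package body vga_sketch_pkg;
```

A design unit would make these names visible with
"use work.vga_sketch_pkg.all;" placed before its entity or architecture.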
Libraries
Collection of VHDL design units (database)
1 Packages
– package declaration
– package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -------
Dataflow ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
Circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language
– Either a lower level description
– or a behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained.
If there are no 3rd party models for registers => you must write the behavioral
description of registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
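One way to read the points above is sketched below: the register is described
behaviorally in a procedure, and the architecture then uses it in a dataflow
style through a concurrent procedure call. All names (reg_sketch_pkg, dff,
dataflow_sketch) are hypothetical, and the sketch is aimed at simulation; not
every synthesis tool accepts registers described this way.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package reg_sketch_pkg is
  -- Behavioral description of a D-type register as a procedure
  procedure dff (signal clk : in  std_logic;
                 signal d   : in  std_logic;
                 signal q   : out std_logic);
end package reg_sketch_pkg;

package body reg_sketch_pkg is
  procedure dff (signal clk : in  std_logic;
                 signal d   : in  std_logic;
                 signal q   : out std_logic) is
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end procedure dff;
end package body reg_sketch_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.reg_sketch_pkg.all;

entity dataflow_sketch is
  port (clk, a, b : in  std_logic;
        y         : out std_logic);
end entity dataflow_sketch;

architecture dataflow of dataflow_sketch is
  signal d_s, q_s : std_logic;
begin
  d_s <= a xor b;        -- combinational logic feeding the register
  dff(clk, d_s, q_s);    -- concurrent procedure call acts as the register
  y   <= not q_s;        -- combinational logic after the register
end architecture dataflow;
```

The concurrent procedure call is re-evaluated whenever clk or d_s changes, so
it behaves like a process that is sensitive to those two signals.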
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time.
Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a Unix process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside process statements are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT
both
The sensitivity list or wait statement(s) name the signals which wake the
process up
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
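The rule that a process uses either a sensitivity list or wait statements can
be illustrated with two equivalent processes. This is a minimal sketch; the
entity name process_sketch and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity process_sketch is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end entity process_sketch;

architecture behv of process_sketch is
begin
  -- Form 1: sensitivity list, no wait statement
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list, explicit wait statement at the end
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behv;
```

Both processes run once at the start of simulation and then again whenever a
or b changes, so y1 and y2 always carry the same value.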
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;   -- 50 MHz board clock
    rstn   : in  std_logic;   -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0)
  );
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic
    );
  end component;

  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;    -- mode matches the sync_unit entity
      c_hs       : out integer     -- mode matches the sync_unit entity
    );
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic
    );
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr,18));

  -- Divide the 50 MHz input clock by two to obtain the pixel clock
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- Horizontal counter: counts pixel clocks 0 to 793 within a line
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- Vertical line counter: advances on each rising edge of h_en_s, wraps after 515
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse: low for counts 0 to 95
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information will be sent in this period)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse (vertical sync, front porch, back porch): low for lines 0 and 1
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of v_en_s, used to detect the start of a frame
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
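The horizontal timing of the synchronization block can be checked with a small
self-checking test bench such as the sketch below. It assumes the sync_unit
entity above has been compiled into library work and that the divided pixel
clock has a period of 40 ns (50 MHz divided by two); the test bench name
sync_unit_tb and its internal signal names are hypothetical. Since the
horizontal counter runs from 0 to 793, one hsync period should last
794 x 40 ns = 31.76 us.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_unit_tb is
end entity sync_unit_tb;

architecture sim of sync_unit_tb is
  constant CLK_PERIOD : time := 40 ns;   -- assumed 25 MHz pixel clock
  signal clk  : std_logic := '0';
  signal rstn : std_logic := '0';
  signal hsync, vsync, v_en_reg, h_en, v_en : std_logic;
  signal line_count, c_hs : integer;
  signal done : boolean := false;
begin
  -- Clock generator; stops once the check has finished
  clk_gen : process
  begin
    while not done loop
      clk <= '0'; wait for CLK_PERIOD / 2;
      clk <= '1'; wait for CLK_PERIOD / 2;
    end loop;
    wait;
  end process clk_gen;

  rstn <= '1' after 95 ns;               -- release the active-low reset

  dut : entity work.sync_unit
    port map (clk => clk, rstn => rstn, hsync => hsync, vsync => vsync,
              v_en_reg => v_en_reg, h_en => h_en, v_en => v_en,
              line_count => line_count, c_hs => c_hs);

  -- Measure the time between two consecutive falling edges of hsync
  check : process
    variable t_first : time;
  begin
    wait until falling_edge(hsync);
    t_first := now;
    wait until falling_edge(hsync);
    assert now - t_first = 794 * CLK_PERIOD
      report "unexpected hsync period" severity failure;
    report "hsync period check finished";
    done <= true;
    wait;
  end process check;
end architecture sim;
```

The same pattern could be extended to vsync, although a full frame of 516
lines takes considerably longer to simulate.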
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= (others => '0');
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en marks the start of a frame: restart the address
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the image: read the next 1-bit pixel from memory
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the image: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the image: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last stored pixel
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two 3-
input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the mid-
1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
– IC costs are rising aggressively
– ASIC complexity has bolstered development time and costs
– R&D resources and headcount are decreasing
– Revenue losses for slow time-to-market are increasing
– Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration that lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the state where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a 4-8 hour wait
is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
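The logic block described above can be modelled behaviorally as a 16-entry
truth table followed by an optional flip-flop. The sketch below is only a
model for illustration: the entity name logic_block_sketch and the generics
LUT_INIT and REGISTERED are hypothetical, and in a real device these choices
are stored in configuration memory rather than fixed as generics.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block_sketch is
  generic (
    -- Truth table of the 4-input LUT; bit n is the output for input value n.
    -- x"8000" sets only bit 15, i.e. a 4-input AND.
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- true selects the registered output, false the unregistered LUT output
    REGISTERED : boolean := false
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block_sketch;

architecture rtl of logic_block_sketch is
  signal lut_out, ff_out : std_logic;
begin
  -- Lookup table: the four inputs select one bit of the truth table
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- Flip-flop capturing the LUT output
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single output: either registered or unregistered
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT to x"6996", for instance, would turn the same block into a
4-input XOR (odd parity), which shows how one structure covers any function of
four inputs.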
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
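As a concrete illustration of the first simulation stage, a minimal
self-checking test bench for the Comparator example from the VHDL chapter
could look like the sketch below. The test bench name Comparator_tb and the
stimulus values are arbitrary choices made for illustration; the Comparator
itself is repeated so that the file can be analyzed on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

library ieee;
use ieee.std_logic_1164.all;

entity Comparator_tb is
end entity Comparator_tb;

architecture sim of Comparator_tb is
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  dut : entity work.Comparator
    port map (A => A, B => B, EQ => EQ);

  stimulus : process
  begin
    -- Equal operands: EQ is expected to be '1'
    A <= x"3C";
    B <= x"3C";
    wait for 10 ns;
    assert EQ = '1'
      report "EQ should be '1' for equal inputs" severity error;

    -- Different operands: EQ is expected to be '0'
    B <= x"3D";
    wait for 10 ns;
    assert EQ = '0'
      report "EQ should be '0' for different inputs" severity error;

    report "Comparator test bench finished";
    wait;  -- suspend forever so that the simulation can end
  end process stimulus;
end architecture sim;
```

The same test bench can in principle be rerun against the post-synthesis gate
level netlist and the back-annotated netlist, which is what makes the repeated
simulation stages comparable.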
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification together with early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA has
become a good method for doing this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed
and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be
caught early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototypes have the added advantage of serving as demo platforms for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
Fig. Gate-level view of the VGA controller
Fig. Gate-level view of the sync unit
Fig. Technology schematic
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to correctly display the picture on a connected monitor.
The register is updated on each rising edge of the CLK signal. Data can be
read from the memory space as long as the RD signal is active (High). The RD
signal is itself controlled by the external line display enable signal, enex.
This internally generated signal is defined as follows:
If ((HCOUNT > (PixTotal + 1)) and (HCOUNT < Line Period)) or ((VCOUNT >
(LinTotal + 1)) and (VCOUNT < Frame Period)) then enex = 0
Else
enex = 1
When enex is Low, read access from memory is disabled (RD = 0).
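The enex condition above can be checked with a small software model. This is only a sketch in Python, not part of the controller itself; the function and parameter names are chosen for illustration, and the 640 x 480 figures with an 800-clock line and 525-line frame used in the example are assumed values, not taken from the text.

```python
def enex(hcount, vcount, pix_total, lin_total, line_period, frame_period):
    """Return 0 while either counter is inside its blanking window, else 1."""
    in_h_blank = (pix_total + 1) < hcount < line_period
    in_v_blank = (lin_total + 1) < vcount < frame_period
    return 0 if (in_h_blank or in_v_blank) else 1


def rd_allowed(hcount, vcount, **timing):
    """RD may only be active while enex is High."""
    return enex(hcount, vcount, **timing) == 1


# Assumed example timing (hypothetical values for illustration only)
EXAMPLE = dict(pix_total=640, lin_total=480, line_period=800, frame_period=525)

visible = enex(10, 10, **EXAMPLE)        # inside the displayed area
h_blank = enex(700, 10, **EXAMPLE)       # horizontal blanking
v_blank = enex(10, 500, **EXAMPLE)       # vertical blanking
```

Note that both comparisons are strict, so a counter value equal to PixTotal + 1 or LinTotal + 1 still leaves enex High.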
The point at which data is loaded from memory into PIXREG depends on the
particular color palette that is chosen - Black & White, 16 Colors or 64
Colors.
Black & White mode
The next byte of data will be loaded into the register whenever the
lowest three bits of the hvcnt signal - received from the Synchronization
Unit - are "000".
For the currently loaded byte, the active pixel is always in the lowest bit
position of the Pixel register. Each pixel in the data byte is moved into this
active pixel position by shifting the contents of the register right by one bit
on each rising edge of CLK.
16 Colors mode
The next byte of data will be loaded into the register whenever the
lowest bit of the hvcnt signal - received from the Synchronization Unit - is
"0".
For the currently loaded byte, the active pixel is always in the low-order
nibble of the Pixel register. Remember that in this mode each byte of data
contains two pixels. The second pixel is moved into this active pixel position
by shifting the contents of the register right by four bits on the rising edge
of CLK.
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel
data does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
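The three load-and-shift schemes can be illustrated with a short Python model that lists, in display order, the active pixel values obtained from one byte in PIXREG. It is a sketch only: the function name and mode labels are hypothetical, and for 64 Colors mode it assumes that the six pixel bits are the low six bits of the byte, which the text does not state.

```python
def pixels_from_byte(byte, mode):
    """Return the successive active-pixel values produced from one loaded byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be an 8-bit value")
    if mode == "bw":
        # Eight 1-bit pixels: shift right by one bit per clock, read bit 0
        return [(byte >> shift) & 0x1 for shift in range(8)]
    if mode == "16":
        # Two 4-bit pixels: low nibble first, then shift right by four bits
        return [byte & 0xF, (byte >> 4) & 0xF]
    if mode == "64":
        # One pixel per byte (assumed to occupy the low six bits)
        return [byte & 0x3F]
    raise ValueError("mode must be 'bw', '16' or '64'")


bw_pixels = pixels_from_byte(0b10110001, "bw")
nibble_pixels = pixels_from_byte(0xA5, "16")
```

The number of pixels per byte (8, 2 or 1) is also why a new byte is needed every eighth clock, every second clock, or every clock respectively.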
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the Nano Board itself, as shown in Figure 9.1.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
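The arithmetic behind the four levels and the 64 colors can be sketched in Python as follows. Two things here are assumptions rather than statements from the text: that the four levels are evenly spaced between 0 V and 0.7 V, and that the six-bit code is laid out as red in bits 5..4, green in bits 3..2 and blue in bits 1..0.

```python
V_MAX = 0.7  # volts at maximum brightness, per the text


def dac_level(code):
    """Map a 2-bit code (0..3) to a voltage, assuming evenly spaced steps."""
    if not 0 <= code <= 3:
        raise ValueError("code must be a 2-bit value")
    return V_MAX * code / 3


def split_rgb(rgb6):
    """Split a 6-bit RGB code into (red, green, blue) 2-bit fields.

    Assumed layout: bits 5..4 = red, bits 3..2 = green, bits 1..0 = blue.
    """
    return (rgb6 >> 4) & 0x3, (rgb6 >> 2) & 0x3, rgb6 & 0x3


levels = [round(dac_level(code), 3) for code in range(4)]
distinct_colors = len({split_rgb(value) for value in range(64)})
```

Three channels with four levels each give 4 x 4 x 4 = 64 combinations, which is the figure quoted above.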
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application
note describes the features and capabilities of block SelectRAM and illustrates
how to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits.
Physically, the block RAM memory has two completely independent access ports,
labeled Port A and Port B. The structure is fully symmetrical, and both ports
are interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
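The totals in the block RAM table follow directly from the 18,432 bits per block. The short Python check below recomputes them from the column and blocks-per-column figures; the variable and function names are illustrative only.

```python
BLOCK_BITS = 18_432  # bits per block RAM, as stated in the text

# (device, RAM columns, RAM blocks per column) as listed in the table
DEVICES = [
    ("XC3S100E", 1, 4),
    ("XC3S250E", 2, 6),
    ("XC3S500E", 2, 10),
    ("XC3S1200E", 2, 14),
    ("XC3S1600E", 2, 18),
]


def totals(columns, blocks_per_column):
    """Return (total blocks, total bits, total Kbits) for one device."""
    blocks = columns * blocks_per_column
    bits = blocks * BLOCK_BITS
    return blocks, bits, bits // 1024


table = {name: totals(cols, per_col) for name, cols, per_col in DEVICES}

for name, (blocks, bits, kbits) in table.items():
    print(f"{name:10s} {blocks:3d} blocks {bits:8,d} bits {kbits:4d} Kbits")
```

Each block contributes 18 Kbits in total, of which 16 Kbits are plain data bits, so a memory that uses only the data bits gets 16,384 bits per block.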
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defence.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987 standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
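For reference, the nine values of std_logic can be written out as a small Python lookup table. This is only an illustrative listing in a general-purpose language; in VHDL the type itself comes from the std_logic_1164 package, and the helper function below is a hypothetical name.

```python
# The nine values of the IEEE 1164 std_logic type and their meanings
STD_LOGIC_VALUES = {
    "U": "uninitialized",
    "X": "forcing unknown",
    "0": "forcing 0",
    "1": "forcing 1",
    "Z": "high impedance",
    "W": "weak unknown",
    "L": "weak 0",
    "H": "weak 1",
    "-": "don't care",
}


def is_two_valued(value):
    """True only for the strong '0' and '1' values that plain bit logic covers."""
    if value not in STD_LOGIC_VALUES:
        raise ValueError(f"not a std_logic value: {value!r}")
    return value in ("0", "1")


extra_states = [v for v in STD_LOGIC_VALUES if not is_two_valued(v)]
```

The seven extra states are what allow a simulator to represent the unknown and high-impedance conditions mentioned above.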
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes an entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component, but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Are always optional.
Packages
Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries
A collection of VHDL design units (database):
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in the UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here in
order of increasing level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a
high-level description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers
in the system.
The combinational logic is described at a relatively high level; the
placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the
behavioral description of registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is
  port(
    clk50m : in  std_logic;  -- input clock
    rstn   : in  std_logic;  -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
  -- Picture memory: 18-bit address, 1-bit data
  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;
  -- Port modes match the sync_unit entity (line_count and c_hs are outputs)
  component sync_unit
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer);
  end component;
  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;
  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);
begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));
  -- Divide the input clock by two to obtain the pixel clock
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
    --end if;
  end process;
  hsync <= hsync_s;
  vsync <= vsync_s;
  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );
  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );
  -- Note: rstn is active low; check the polarity that sinit expects
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;
  -- hsync counter: counts pixel clocks 0..793 within a line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;
  -- vsync counter
  -- vertical line counter: advances on each rising edge of h_en_s, 0..515
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
    --end if;
  end process;
  -- hsync pulse: low for counts 0..95
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
    --end if;
  end process;
  -- Horizontal enable (row information will be sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;
  -- vsync pulse
  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;
  -- Vertical sync, front porch, back porch
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
    --end if;
  end process;
  -- v_en delayed by one clock, used by pixelgen to detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;
end arch_sync_unit;
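The constants in the synchronization block fix the line and frame rates of the generated signal. The Python sketch below works them out. It assumes that the controller's input clock is 50 MHz (suggested only by the port name clk50m) and therefore that the divided pixel clock is 25 MHz; all constant names are illustrative.

```python
PIXEL_CLOCK_HZ = 50_000_000 // 2   # assumption: 50 MHz input divided by two

CLOCKS_PER_LINE = 793 + 1          # c_hs counts 0..793
LINES_PER_FRAME = 515 + 1          # line_count counts 0..515
HSYNC_CLOCKS = 95 + 1              # hsync is low while c_hs <= 95
VSYNC_LINES = 2                    # vsync is low while line_count < 2
ACTIVE_CLOCKS = 778 - 143 + 1      # h_en is high for 143 <= c_hs <= 778
ACTIVE_LINES = 515 - 35 + 1        # v_en is high for 35 <= line_count <= 515

line_rate_hz = PIXEL_CLOCK_HZ / CLOCKS_PER_LINE
frame_rate_hz = line_rate_hz / LINES_PER_FRAME
hsync_width_us = HSYNC_CLOCKS / PIXEL_CLOCK_HZ * 1e6

print(f"line rate   : {line_rate_hz / 1000:.2f} kHz")
print(f"frame rate  : {frame_rate_hz:.2f} Hz")
print(f"hsync width : {hsync_width_us:.2f} us")
print(f"active area : {ACTIVE_CLOCKS} clocks x {ACTIVE_LINES} lines")
```

Under these assumptions the design produces a line rate of about 31.5 kHz and a frame rate of about 61 Hz, with an enable window of 636 clocks by 481 lines.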
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;
architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;
  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart the read address at the top of the frame
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Picture area: all three outputs follow the 1-bit memory data
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last location of the picture memory
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: drive black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
    -- end if;
    --end if;
  end process;
end behv;
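The address constants in the pixel generator determine how much memory the picture occupies. The following Python sketch derives the figures from the wrap value used in the code. The 16,384-bit figure is the data capacity of one block RAM stated earlier in this report; treating the memory as built from such blocks in a 1-bit-wide arrangement is an assumption, since the listing does not show how the memory component is implemented.

```python
ADDR_BITS = 18
WRAP_ADDR = int("111011111111111111", 2)  # last address before the wrap, from the code
DATA_BITS_PER_BLOCK = 16 * 1024           # data bits per block RAM, per the text

depth = WRAP_ADDR + 1                     # addresses 0..WRAP_ADDR are used
# Assumption: 1-bit-wide memory built only from block RAM data bits
blocks_needed = -(-depth // DATA_BITS_PER_BLOCK)   # ceiling division

full_frame_pixels = 640 * 480
full_frame_addr_bits = (full_frame_pixels - 1).bit_length()

print(f"wrap address      : {WRAP_ADDR} (0x{WRAP_ADDR:05X})")
print(f"memory depth      : {depth} one-bit pixels")
print(f"block RAMs needed : {blocks_needed}")
print(f"640x480 would need {full_frame_addr_bits} address bits")
```

The stored picture is therefore 245,760 one-bit pixels, which fits the 18-bit address, whereas a full 640 x 480 frame at one bit per pixel would need a 19-bit address.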
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985 - the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts
itself to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them) but also far more
complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the state
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
are managed separately from other signals.
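The logic block described above can be modelled behaviourally in a few lines of VHDL. This is only a sketch for illustration: the entity name logic_block, the generics LUT_INIT and REGISTERED, and the port names are hypothetical and do not correspond to any vendor primitive. The 16-bit constant plays the role of the LUT configuration bits, and the boolean generic stands in for the configuration bit that selects the registered or unregistered output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic logic block: a 4-input LUT followed by
-- a D flip-flop, with a selector for the registered or unregistered output.
entity logic_block is
  generic (
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000"; -- truth table (here: 4-input AND)
    REGISTERED : boolean := true                            -- true selects the flip-flop output
  );
  port (
    clk : in  std_logic;                     -- clock input of the block
    i   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    o   : out std_logic                      -- the single block output
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The LUT is a 16-entry table addressed by the four inputs.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- The flip-flop captures the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output selection between registered and combinational LUT output.
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

With the default LUT_INIT only table entry 15 is '1', so the block computes the AND of its four inputs; any other 4-input function is obtained by changing the constant.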
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher-
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to the actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
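To make the first step of this flow concrete, the following is a minimal sketch of an RTL test bench. The design under test is a small 8-bit equality comparator included in the same listing so that the example is self-contained; the names comparator, comparator_tb, dut and stimulus, the stimulus values and the 10 ns settling time are all assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical design under test: an 8-bit equality comparator.
entity comparator is
  port (a, b : in  std_logic_vector(7 downto 0);
        eq   : out std_logic);
end entity comparator;

architecture rtl of comparator is
begin
  eq <= '1' when a = b else '0';
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- A test bench has no ports: it generates stimulus and checks results.
entity comparator_tb is
end entity comparator_tb;

architecture sim of comparator_tb is
  signal a, b : std_logic_vector(7 downto 0) := (others => '0');
  signal eq   : std_logic;
begin
  dut : entity work.comparator
    port map (a => a, b => b, eq => eq);

  stimulus : process
  begin
    -- Equal operands must assert eq.
    a <= x"3C"; b <= x"3C";
    wait for 10 ns;
    assert eq = '1' report "equal inputs should assert eq" severity error;

    -- Operands differing in one bit must clear eq.
    a <= x"3C"; b <= x"3D";
    wait for 10 ns;
    assert eq = '0' report "different inputs should clear eq" severity error;

    report "comparator_tb finished" severity note;
    wait;  -- suspend forever so the simulation can end
  end process stimulus;
end architecture sim;
```

The same test bench can in principle be reused for the gate-level and back-annotated simulations by binding dut to the netlist produced by the tools instead of the RTL architecture.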
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; its Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld-
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA
for hardware verification and early software development.
Verifying hardware designs together with early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on
an FPGA has become a good method to do this.
Reasons why Prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct, compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear
because of today's array of operating systems, applications and hardware, to
be found early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion Tracking Systems
Image Processing Systems
Display Systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller"
3. Hinner, "VGA Interface and video signal documents"
64 Colors mode
The next byte of data will be loaded into the register on the rising edge
of the external system clock signal (CLK). In this mode, the read of pixel data
does not depend on the status of the hvcnt signal received from the
Synchronization Unit.
9.3 The RGB register (RGBREG)
The RGB register is used to store the six bits that are required for
driving the red, green and blue color guns of the target monitor. When the
chosen color palette is either Black & White or 16 Colors, these six bits are
obtained by mapping the value of the active pixel to a predefined RGB code.
When the chosen color palette is 64 Colors, the actual pixel value is used
directly.
The register RGBREG is reset to zero (000000) when the VGA
Controller receives an external reset signal (RST). This RGB code represents
black.
The register is updated on each rising edge of the CLK signal, and the
value loaded is dependent on the state of the line display enable signal, en.
When en is Low, blanking is required and RGBREG is loaded with the code
for black (000000).
Table 2 illustrates the mapping of a 1-bit pixel (Black & White mode)
and a 4-bit pixel (16 Colors mode) into the required RGB color code.
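The register behaviour described above can be sketched in VHDL as follows. Only the 64 Colors case, where the pixel value is used directly, is shown. Several details are assumptions rather than facts from the text: RST is taken to be active high and asynchronous, the six bits are assumed to be ordered R1 R0 G1 G0 B1 B0, and the entity and port names (rgb_register, pixel, r, g, b) are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch of RGBREG for the 64 Colors mode only.
entity rgb_register is
  port (
    clk   : in  std_logic;                     -- CLK
    rst   : in  std_logic;                     -- RST (assumed active high)
    en    : in  std_logic;                     -- line display enable
    pixel : in  std_logic_vector(5 downto 0);  -- 6-bit pixel value
    r     : out std_logic_vector(1 downto 0);  -- R1, R0
    g     : out std_logic_vector(1 downto 0);  -- G1, G0
    b     : out std_logic_vector(1 downto 0)   -- B1, B0
  );
end entity rgb_register;

architecture rtl of rgb_register is
  signal rgbreg : std_logic_vector(5 downto 0);
begin
  process (clk, rst)
  begin
    if rst = '1' then
      rgbreg <= (others => '0');      -- reset value 000000 is black
    elsif rising_edge(clk) then
      if en = '0' then
        rgbreg <= (others => '0');    -- blanking: load the code for black
      else
        rgbreg <= pixel;              -- 64 Colors: pixel value used directly
      end if;
    end if;
  end process;

  -- Split the register into the three 2-bit colour outputs
  -- (bit ordering is an assumption).
  r <= rgbreg(5 downto 4);
  g <= rgbreg(3 downto 2);
  b <= rgbreg(1 downto 0);
end architecture rtl;
```

For the Black & White and 16 Colors palettes, the assignment of pixel to rgbreg would be replaced by a lookup implementing the mapping of Table 2.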
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the Nano Board itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V
(maximum brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
combinations (4 x 4 x 4).
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application
note describes the features and capabilities of block SelectRAM and illustrates
how to specify the various options using the Xilinx CORE Generator™ system
or via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM      RAM Blocks   Total RAM  Total RAM  Total RAM
            Columns  per Column   Blocks     Bits       Kbits
XC3S100E    1        4            4          73,728     72
XC3S250E    2        6            12         221,184    216
XC3S500E    2        10           20         368,640    360
XC3S1200E   2        14           28         516,096    504
XC3S1600E   2        18           36         663,552    648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM can behave
as a single-port memory in an application, as shown in Figure 1. Furthermore,
each block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Multiple block RAMs can be
cascaded to create deeper and wider memory organizations, with a minimal
timing penalty incurred through specialized routing resources.
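As an alternative to CORE Generator or primitive instantiation, a synchronous memory can also be described behaviourally. The sketch below shows a single-port 16K x 1 memory with clock enable and write enable, matching the synchronous read and write behaviour described above. Whether a given synthesis tool maps this description onto a block RAM depends on the tool and its settings, so that mapping is an assumption; the names sync_ram_16kx1 and ram_t are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural single-port synchronous memory, 16K words of 1 bit
-- (the data capacity of one block RAM without parity bits).
entity sync_ram_16kx1 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                       -- clock enable
    we   : in  std_logic;                       -- write enable
    addr : in  std_logic_vector(13 downto 0);   -- 2**14 = 16384 locations
    din  : in  std_logic;
    dout : out std_logic
  );
end entity sync_ram_16kx1;

architecture rtl of sync_ram_16kx1 is
  type ram_t is array (0 to 2**14 - 1) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read: data appears one clock edge after the address.
        -- On a simultaneous write the old contents are returned.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```

The one-cycle read latency is the reason a design reading pixels from such a memory has to present each address one clock before the data is needed.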
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987: standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE. IEEE 1164 contains
definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports: Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
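A common consequence of these rules is that a value which must be both driven to the outside and read internally is kept in an internal signal and copied to the out port. The sketch below illustrates this with a hypothetical 4-bit counter; the names mode_demo and count_i are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-bit counter used only to illustrate port modes.
entity mode_demo is
  port (
    clk   : in  std_logic;                    -- in: may be read, never assigned
    count : out std_logic_vector(3 downto 0)  -- out: assigned, not read back
  );
end entity mode_demo;

architecture rtl of mode_demo is
  -- Internal copy of the counter value: unlike the out port, it can be read.
  signal count_i : unsigned(3 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      count_i <= count_i + 1;  -- reads count_i, which an out port would not allow
    end if;
  end process;

  -- Single concurrent assignment drives the out port.
  count <= std_logic_vector(count_i);
end architecture rtl;
```

This pattern avoids the buffer mode and works the same way in every revision of the language.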
Architectures: Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior - Structure - Dataflow
Architectures can describe a design on many levels: - Gate level - RTL
(Register Transfer Level) - Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations: Link an entity declaration and an architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- the last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Are always optional.
Packages: Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally
among several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries: Collection of VHDL design units (database)
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -------
Dataflow   ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. a transistor-level description) to a
high-level description (e.g. a block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the
behavioral description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
Process keyword in VHDL.
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
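The two alternatives in the general format can be illustrated with a pair of equivalent processes. This is a minimal sketch; the entity process_forms and its ports are hypothetical. A process with a sensitivity list behaves like a process without one that ends in a wait on statement naming the same signals.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Two equivalent descriptions of an AND gate, one per process form.
entity process_forms is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end entity process_forms;

architecture rtl of process_forms is
begin
  -- Form 1: sensitivity list. The process runs once at start-up and
  -- again whenever a or b changes.
  and_sens : process (a, b)
  begin
    y1 <= a and b;
  end process and_sens;

  -- Form 2: no sensitivity list, explicit wait statement at the end.
  -- The process suspends at the wait until a or b changes.
  and_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process and_wait;
end architecture rtl;
```

The sensitivity-list form is the one synthesis tools most commonly accept; wait statements are more typical of test benches.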
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
  clk50m : in  std_logic;
  rstn   : in  std_logic;
  hsync  : out std_logic;
  vsync  : out std_logic;
  red    : out std_logic_vector(0 downto 0);
  green  : out std_logic_vector(0 downto 0);
  blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
  -- image memory: 1 bit per pixel, 18-bit address
  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic);
  END component;
  component sync_unit port(
    clk        : in    std_logic;
    rstn       : in    std_logic;
    hsync      : out   std_logic;  -- horizontal sync pulse (row)
    vsync      : out   std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out   std_logic;
    h_en       : out   std_logic;
    v_en       : out   std_logic;
    line_count : inout integer;
    c_hs       : inout integer);
  end component;
  component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
  end component;
  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);
begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));
  -- divide the board clock by two to obtain the pixel clock
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;
  hsync <= hsync_s;
  vsync <= vsync_s;
  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );
  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;  -- horizontal sync pulse (row)
  vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;
  c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs <= c_hs_s;
  line_count <= line_count_s;
  h_en <= h_en_s;
  v_en <= v_en_s;
  -- hsync counter: counts pixel clocks 0 to 793 within a line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;
  -- vsync counter
  -- vertical line counter: advances on each rising edge of h_en_s
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;
  -- hsync pulse: low during counts 0 to 95
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;
  -- Horizontal Enable (row information will be sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;
  -- Vertical Enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;
  -- Vertical Sync (front porch, back porch): low during lines 0 and 1
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;
  -- v_en delayed by one clock, used to detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;
end arch_sync_unit;
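The counter limits in the synchronization block fix the line and frame rates. The short derivation below works them out under one assumption that is not stated in the code: that the input clk50m runs at 50 MHz, as its name suggests, so that the divided clock clk runs at 25 MHz. The symbols N_h and N_v are introduced here for the number of clocks per line and lines per frame.

```latex
\begin{align*}
f_{clk} &= \frac{50\ \text{MHz}}{2} = 25\ \text{MHz}, \qquad T_{clk} = 40\ \text{ns} \\
N_h &= 794 \ (\text{c\_hs\_s counts } 0 \ldots 793)
  &&\Rightarrow\ T_{line} = 794 \times 40\ \text{ns} = 31.76\ \mu\text{s},
  \quad f_{line} \approx 31.49\ \text{kHz} \\
T_{hsync} &= 96 \times 40\ \text{ns} = 3.84\ \mu\text{s}
  && (\text{hsync low for counts } 0 \ldots 95) \\
N_v &= 516 \ (\text{line\_count\_s counts } 0 \ldots 515)
  &&\Rightarrow\ T_{frame} = 516 \times 31.76\ \mu\text{s} \approx 16.39\ \text{ms},
  \quad f_{frame} \approx 61.0\ \text{Hz}
\end{align*}
```

For comparison, the standard 640 x 480 at 60 Hz timing uses a 25.175 MHz pixel clock with 800 clocks per line and 525 lines per frame, which gives about 31.47 kHz and 59.94 Hz. The values above are close to these figures.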
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;
  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd <= '0';
      red <= "0";
      green <= "0";
      blue <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- rising edge of v_en: restart the address at the top of the frame
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- inside the image: output the memory bit on all three colors
            rd <= '1';
            red <= data_in;
            green <= data_in;
            blue <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- right of the image: green
            rd <= '0';
            red <= "0";
            green <= "1";
            blue <= "0";
          end if;
        else
          -- below the image: white
          rd <= '0';
          red <= "1";
          green <= "1";
          blue <= "1";
        end if;
        -- wrap the address after the last image location
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- blanking: black
        red <= "0";
        green <= "0";
        blue <= "0";
      end if;
    end if;
  end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, the programmable logic was hard-wired
between logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was inducted
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4 to
8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA considered here is an array or island-style FPGA. It consists of an
array of logic blocks and routing channels. Two I/O pads fit into the height of
one row or the width of one column, as shown below. All the routing channels
have the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
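The logic block described above can be modelled in a few lines of VHDL. The following is a minimal sketch, not a vendor primitive: the entity name logic_block, the generics LUT_INIT and REGISTERED, and the port names are all assumptions made for illustration. The LUT is treated as a 16 x 1 truth table addressed by the four inputs, and a generic selects between the registered and the unregistered output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic logic block: one 4-input LUT plus one flip-flop.
entity logic_block is
  generic (
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000"; -- truth table; x"8000" is a 4-input AND
    REGISTERED : boolean := true                           -- true: flip-flop output, false: LUT output
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    o   : out std_logic                      -- the single block output
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_q    : std_logic := '0';
begin
  -- The LUT is a 16-entry table: the inputs form the address of the stored bit.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- D flip-flop that registers the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_q <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output.
  o <= ff_q when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the structure, which is the sense in which the block is configurable.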
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA
for hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream. Prototyping SoC and ASIC designs on an FPGA has
become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time in the area of software development and testing at the
software-hardware integration stage. This allows many unforeseen software
bugs, which appear due to today's array of operating systems, applications and
hardware, to be found early.
3. Prototyping also allows the developer to ensure that all IP technologies in the
system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
Fig: VGA controller (gate level)
Fig: Sync unit (gate level)
Fig: Technology schematic
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, with the design implemented on an FPGA, the Controller takes a
processor-generated (pixelated) picture from memory space and provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller"
3. Hinner, "VGA Interface and video signal documents"
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry
located on the Nano Board itself, as shown in Figure 4.
Figure 9.1: Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations (4 x 4 x 4 = 64).
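The arithmetic behind the four levels and the 64 colors can be written down as a small VHDL package. This is a simulation-oriented sketch only: the package name vga_dac_model and the function level_mv are hypothetical, and the even spacing of the two intermediate levels is an assumption, since the text gives only the end points of 0 V and 0.7 V.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of the 2-bit-per-channel digital-to-analog conversion.
package vga_dac_model is
  constant BITS_PER_CHANNEL : natural := 2;
  constant LEVELS           : natural := 2 ** BITS_PER_CHANNEL;  -- 4 levels per color
  constant COLORS           : natural := LEVELS ** 3;            -- 4 x 4 x 4 = 64 colors
  constant FULL_SCALE_MV    : natural := 700;                    -- 0.7 V = maximum brightness

  -- Returns the nominal output level in millivolts for a 2-bit color code.
  function level_mv (code : std_logic_vector(1 downto 0)) return natural;
end package vga_dac_model;

package body vga_dac_model is
  function level_mv (code : std_logic_vector(1 downto 0)) return natural is
  begin
    -- Assumption: the levels are evenly spaced between 0 mV and FULL_SCALE_MV.
    -- "00" gives 0 mV and "11" gives 700 mV; integer division truncates
    -- the two intermediate levels.
    return (to_integer(unsigned(code)) * FULL_SCALE_MV) / (LEVELS - 1);
  end function level_mv;
end package body vga_dac_model;
```

The real levels depend on the resistor network on the board, so the intermediate values here should be read as nominal.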
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large, on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers, and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2: Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which is allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
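The single-port behaviour described above (synchronous write, synchronous read, clock enable) can also be described behaviorally. The sketch below is an assumption-laden illustration rather than the vendor's instantiation template: the entity name sync_ram, its generics and its port names are hypothetical, and whether a given synthesis tool maps this description onto a block RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM with synchronous write and synchronous read.
entity sync_ram is
  generic (
    ADDR_WIDTH : positive := 14;  -- 2**14 = 16K locations
    DATA_WIDTH : positive := 1    -- 16K x 1 uses the 16K data bits of one block
  );
  port (
    clk  : in  std_logic;
    en   : in  std_logic;   -- clock enable
    we   : in  std_logic;   -- write enable
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity sync_ram;

architecture rtl of sync_ram is
  type ram_t is array (0 to 2 ** ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;   -- synchronous write
        end if;
        -- Synchronous read: needs a clock edge and the clock enable.
        -- On a write cycle this returns the old contents of the location.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```

With the default generics the array holds 16,384 bits, which matches the data portion of one 18,432-bit block.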
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes the entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
– Behavior
– Structure
– Dataflow
Architectures can describe a design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
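The Comparator entity and architecture above can be exercised with a small self-checking test bench. The listing below is a sketch that repeats the entity and architecture with the library clause they need, so that it can be compiled on its own; the test bench name Comparator_tb and the chosen stimulus values are assumptions for illustration. The test bench is also an example of an entity with no port declaration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test bench: an entity without ports.
entity Comparator_tb is
end entity Comparator_tb;

architecture sim of Comparator_tb is
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  -- Direct instantiation selects the architecture explicitly.
  dut : entity work.Comparator(Comparator1)
    port map (A => A, B => B, EQ => EQ);

  stimulus : process
  begin
    A <= x"3C"; B <= x"3C";
    wait for 10 ns;
    assert EQ = '1' report "equal inputs should give EQ = 1" severity error;

    A <= x"3C"; B <= x"3D";
    wait for 10 ns;
    assert EQ = '0' report "different inputs should give EQ = 0" severity error;

    report "Comparator test bench finished";
    wait;  -- suspend forever so the simulation can end
  end process stimulus;
end architecture sim;
```

Naming the architecture in the instantiation is one way to choose between several architectures of the same entity without writing a separate configuration declaration.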
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
– the last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Are always optional.
Packages
Packages contain information common to many design units.
1. Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2. Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
A package body consists of
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
Libraries
Collection of VHDL design units (database):
1. Packages
package declaration
package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural (lowest level of abstraction)
Dataflow
Behavioral (highest level of abstraction)
Structural VHDL description
Circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
– either a lower-level description
– or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral
description of registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
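As an illustration of a register description supplied as a subprogram, the sketch below places a D flip-flop in a procedure and uses it through concurrent procedure calls in a dataflow-style architecture. The names reg_pkg, dff and two_stage are hypothetical, and support for synthesizing registers written this way varies between tools, so this should be read as a simulation-level example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package holding a behavioral register description.
package reg_pkg is
  procedure dff (signal clk : in  std_logic;
                 signal d   : in  std_logic;
                 signal q   : out std_logic);
end package reg_pkg;

package body reg_pkg is
  procedure dff (signal clk : in  std_logic;
                 signal d   : in  std_logic;
                 signal q   : out std_logic) is
  begin
    -- q follows d only on the rising edge of clk.
    if rising_edge(clk) then
      q <= d;
    end if;
  end procedure dff;
end package body reg_pkg;

library ieee;
use ieee.std_logic_1164.all;
use work.reg_pkg.all;

entity two_stage is
  port (clk, a, b : in  std_logic;
        y         : out std_logic);
end entity two_stage;

architecture dataflow of two_stage is
  signal a_r, b_r, y_d : std_logic;
begin
  -- Registers are placed explicitly with concurrent procedure calls.
  ra : dff(clk, a, a_r);
  rb : dff(clk, b, b_r);
  -- Combinational logic between the registers.
  y_d <= a_r and b_r;
  ry : dff(clk, y_d, y);
end architecture dataflow;
```

Each concurrent call behaves like a process that is sensitive to its input signals, so the flow of data between registers is visible directly in the architecture body.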
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time.
Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
Basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
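The two permitted forms of a process, one with a sensitivity list and one with a wait statement, can be compared side by side. In this sketch the entity name process_styles and its ports are hypothetical; both processes describe the same rising-edge flip-flop.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity showing the two process styles.
entity process_styles is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q1  : out std_logic;
    q2  : out std_logic
  );
end entity process_styles;

architecture rtl of process_styles is
begin
  -- Style 1: sensitivity list. The process wakes up on any event on clk
  -- and must not contain a wait statement.
  with_list : process (clk)
  begin
    if rising_edge(clk) then
      q1 <= d;
    end if;
  end process with_list;

  -- Style 2: wait statement. There is no sensitivity list; the process
  -- suspends at the wait until the next rising edge of clk.
  with_wait : process
  begin
    wait until rising_edge(clk);
    q2 <= d;
  end process with_wait;
end architecture rtl;
```

The two processes run concurrently with each other, while the statements inside each one execute in sequence.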
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
-- image memory (1 bit per pixel, 18-bit address)
component memory
  port (
    addr  : IN  std_logic_VECTOR(17 downto 0);
    clk   : IN  std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN  std_logic;
    sinit : IN  std_logic);
END component;
component sync_unit port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  ---horizontal synch pulse(row)
    vsync      : out std_logic;  --vertical sync pulse(frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;    -- mode out, matching the sync_unit entity
    c_hs       : out integer);   -- mode out, matching the sync_unit entity
end component;
component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr,18));
-- divide the board clock by two to obtain the pixel clock
process(clk50m, rstn)
begin
  if (rstn = '0') then
    clk <= '0';
  elsif (clk50m = '1' and clk50m'event) then
    clk <= not clk;
  end if;
  --end if;
end process;
hsync <= hsync_s;
vsync <= vsync_s;
sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
    );
pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
    );
mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
    );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  ---horizontal synch pulse(row)
    vsync      : out std_logic;  --vertical sync pulse(frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin
c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;
-- hsync counter: counts pixel clocks within one line (0 to 793)
process(clk, rstn)
begin
  if (rstn = '0') then
    c_hs_s <= 0;
  else
    if (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end if;
end process;
--vsync counter
--vertical line counter: advances on each rising edge of h_en_s (0 to 515)
process(h_en_s, rstn)
begin
  if (rstn = '0') then
    line_count_s <= 0;
  elsif (h_en_s = '1' and h_en_s'event) then
    line_count_s <= line_count_s + 1;
    if (line_count_s = 515) then
      line_count_s <= 0;
    end if;
  end if;
  --end if;
end process;
--hsync pulse
process(clk, rstn)
begin
  if (rstn = '0') then
    hsync <= '1';
  elsif (clk = '1' and clk'event) then
    if (c_hs_s <= 95) then
      hsync <= '0';
    else
      hsync <= '1';
    end if;
  end if;
  --end if;
end process;
process(clk, rstn) ----Horizontal Enable (Row Information Will be Sent in this period)
begin
  if (rstn = '0') then
    h_en_s <= '0';
  elsif (clk = '1' and clk'event) then
    if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
      h_en_s <= '1';
    else
      h_en_s <= '0';
    end if;
  end if;
  --end if;
end process;
---vsync pulse
process(clk, rstn) -- Vertical Enable (Data is Valid in this Region)
begin
  if (rstn = '0') then
    v_en_s <= '0';
  elsif (clk = '1' and clk'event) then
    if ((line_count_s >= 35 and line_count_s <= 515)) then
      v_en_s <= '1';
    else
      v_en_s <= '0';
    end if;
  end if;
  --end if;
end process;
process(clk, rstn) --Vertical Sync, Front Porch, Back Porch
begin
  if (rstn = '0') then
    vsync <= '1';
  elsif (clk = '1' and clk'event) then
    if (line_count_s < 2) then
      vsync <= '0';
    else
      vsync <= '1';
    end if;
  end if;
  --end if;
end process;
-- v_en delayed by one clock, used by pixelgen to detect the start of a frame
process(clk)
begin
  if (clk = '1' and clk'event) then
    v_en_reg <= v_en_s;
  end if;
end process;
end arch_sync_unit;
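The counter limits used in sync_unit determine the line and frame rates. The simulation-only sketch below derives them from the constants that appear in the listing (793, 95, 143, 778, 515 and 2). It rests on two assumptions that are not stated in the code: that clk50m is a 50 MHz clock, so the divided pixel clock is 25 MHz, and that the entity name vga_timing_check is free to use. Under these assumptions a line lasts 794 clocks and a frame 516 lines, which gives a line rate of roughly 31.5 kHz and a frame rate of roughly 61 Hz.

```vhdl
-- Hypothetical simulation-only check of the timing implied by sync_unit.
entity vga_timing_check is
end entity vga_timing_check;

architecture sim of vga_timing_check is
  -- Assumption: clk50m is 50 MHz, so the divided pixel clock is 25 MHz.
  constant PIXEL_CLK_HZ : natural := 25_000_000;

  -- Values read from the comparisons in the sync_unit listing.
  constant H_TOTAL  : natural := 793 + 1;        -- c_hs_s counts 0 to 793
  constant H_SYNC   : natural := 95 + 1;         -- hsync low while c_hs_s <= 95
  constant H_ACTIVE : natural := 778 - 143 + 1;  -- h_en high from 143 to 778
  constant V_TOTAL  : natural := 515 + 1;        -- line_count_s counts 0 to 515
  constant V_SYNC   : natural := 2;              -- vsync low while line_count_s < 2

  constant LINE_RATE_HZ  : natural := PIXEL_CLK_HZ / H_TOTAL;
  constant FRAME_RATE_HZ : natural := LINE_RATE_HZ / V_TOTAL;
begin
  process
  begin
    report "clocks per line      = " & integer'image(H_TOTAL);
    report "hsync pulse (clocks) = " & integer'image(H_SYNC);
    report "h_en window (clocks) = " & integer'image(H_ACTIVE);
    report "lines per frame      = " & integer'image(V_TOTAL);
    report "line rate (Hz)       = " & integer'image(LINE_RATE_HZ);
    report "frame rate (Hz)      = " & integer'image(FRAME_RATE_HZ);

    -- Sanity checks: the pulses and the enable window must fit in the totals.
    assert H_SYNC + H_ACTIVE <= H_TOTAL
      report "horizontal sync and enable window exceed the line length"
      severity failure;
    assert V_SYNC < V_TOTAL
      report "vertical sync exceeds the frame length"
      severity failure;
    wait;  -- suspend forever so the simulation can end
  end process;
end architecture sim;
```

These totals are slightly shorter than the commonly quoted 800 clocks per line and 525 lines per frame for 640 x 480 at 60 Hz, which is worth keeping in mind when comparing the listing against a monitor's timing tolerances.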
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
    );
end pixelgen;
architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;
process(clk, rstn)
begin
  if (rstn = '0') then
    rd     <= '0';
    red    <= "0";
    green  <= "0";
    blue   <= "0";
    addr_s <= x"0000" & "00";
  elsif (clk = '1' and clk'event) then
    -- rising edge of v_en: restart the memory address at the top of the frame
    if ((v_en = '1') and (v_en_reg = '0')) then
      addr_s <= (others => '0');
    end if;
    if (v_en = '1' and h_en = '1') then
      if (line_count <= 514) then
        if (c_hs <= 655) then
          -- inside the image: read the next pixel from memory
          rd     <= '1';
          red    <= data_in;
          green  <= data_in;
          blue   <= data_in;
          addr_s <= addr_s + "000000000000000001";
        else
          rd    <= '0';
          red   <= "0";
          green <= "1";
          blue  <= "0";
        end if;
      else
        rd    <= '0';
        red   <= "1";
        green <= "1";
        blue  <= "1";
      end if;
      -- wrap the address at the last pixel of the stored image
      if (addr_s = "111011111111111111") then
        addr_s <= (others => '0');
      end if;
    else
      -- blanking: drive black
      red   <= "0";
      green <= "0";
      blue  <= "0";
    end if;
  end if;
  -- end if;
  --end if;
end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was inducted
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural approach a
step further by combining the logic blocks and interconnects of traditional
FPGAs with embedded microprocessors and related peripherals to form a
complete "system on a programmable chip". This work mirrors the architecture
by Ron Perlof and Hana Potash of Burroughs Advanced Systems Group, which
combined a reconfigurable CPU architecture on a single chip called the SB24.
That work was done in 1982. Examples of such hybrid technologies can be found
in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more
PowerPC processors embedded within the FPGA's logic fabric. The Atmel FPSLIC
is another such device, which uses an AVR processor in combination with
Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs can be reprogrammed at run time, and this is leading to
the idea of reconfigurable computing or reconfigurable systems: CPUs that
reconfigure themselves to suit the task at hand. The Mitrion Virtual
Processor from Mitrionics is an example of a reconfigurable soft processor
implemented on FPGAs. However, it does not support dynamic reconfiguration
at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can
also take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has increased development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design
starts (see History).
The primary differences between CPLDs and FPGAs are architectural. A CPLD
has a somewhat restrictive structure consisting of one or more programmable
sum-of-products logic arrays feeding a relatively small number of clocked
registers. The result is less flexibility, with the advantage of more
predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This makes
them far more flexible (in terms of the range of designs that are practical
for implementation within them) but also far more complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4
to 8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a programmable
chip is more affordable than the development resources spent on creating an
ASIC. Today, new cost and performance dynamics have broadened the range of
viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads
may fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even among
designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived
from Rent's rule or by experiments with existing designs.
The example FPGA described here is an array (island-style) FPGA. It consists
of an array of logic blocks and routing channels. Two I/O pads fit into the
height of one row or the width of one column, as shown below. All the routing
channels have the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
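The logic block described above can be modelled in a few lines of VHDL. The
following is a minimal sketch, not vendor code: the entity name logic_block
and the generics INIT and USE_REG are hypothetical stand-ins for the
configuration bits that a real FPGA loads at power-up.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical model of one logic block: a 4-input LUT followed by an
-- optional flip-flop. INIT and USE_REG stand in for configuration bits.
entity logic_block is
  generic (
    INIT    : std_logic_vector(15 downto 0) := x"8000"; -- truth table (x"8000" = 4-input AND)
    USE_REG : boolean := true                           -- true selects the registered output
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    o   : out std_logic                      -- the single block output
  );
end logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The four inputs form an address into the 16-entry truth table.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- D flip-flop that captures the LUT output on each rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output.
  o <= ff_out when USE_REG else lut_out;
end rtl;
```

Changing INIT changes the logic function without changing the structure,
which is the sense in which a LUT is "programmable".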
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the
horizontal channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning
on some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch
box. In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this
switch box topology, a wire in track number one connects only to wires in
track number one in adjacent channel segments, wires in track number 2
connect only to other wires in track number 2, and so on. The figure below
illustrates the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device like
an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to reduce
the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
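As an illustration of the first simulation stage, the sketch below shows a
small self-checking test bench. The device under test (and_gate) and the
test bench (and_gate_tb) are hypothetical examples chosen for brevity; they
are not part of the VGA controller design.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Device under test: a two-input AND gate (illustrative only).
entity and_gate is
  port (a, b : in std_logic; y : out std_logic);
end and_gate;

architecture rtl of and_gate is
begin
  y <= a and b;
end rtl;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Test bench: it has no ports; it drives the inputs and checks the output.
entity and_gate_tb is
end and_gate_tb;

architecture sim of and_gate_tb is
  signal a, b, y : std_logic := '0';
begin
  dut : entity work.and_gate port map (a => a, b => b, y => y);

  stimulus : process
  begin
    a <= '0'; b <= '0'; wait for 10 ns;
    assert y = '0' report "0 and 0 should give 0" severity error;
    a <= '0'; b <= '1'; wait for 10 ns;
    assert y = '0' report "0 and 1 should give 0" severity error;
    a <= '1'; b <= '0'; wait for 10 ns;
    assert y = '0' report "1 and 0 should give 0" severity error;
    a <= '1'; b <= '1'; wait for 10 ns;
    assert y = '1' report "1 and 1 should give 1" severity error;
    report "and_gate_tb finished";
    wait;  -- suspend forever so the simulation can end
  end process stimulus;
end sim;
```

The same test bench can in principle be rerun against the gate-level netlist
after synthesis, which is how the repeated simulations described above are
usually kept comparable.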
Basic Process Technology Types
- SRAM: based on static memory technology. In-system programmable and
  re-programmable. Requires external boot devices. CMOS.
- Antifuse: One-time programmable. CMOS.
- EPROM: Erasable Programmable Read-Only Memory technology. Usually one-time
  programmable in production because of plastic packaging. Windowed devices
  can be erased with ultraviolet (UV) light. CMOS.
- EEPROM: Electrically Erasable Programmable Read-Only Memory technology. Can
  be erased, even in plastic packages. Some, but not all, EEPROM devices can
  be in-system programmed. CMOS.
- Flash: Flash-erase EPROM technology. Can be erased, even in plastic
  packages. Some, but not all, flash devices can be in-system programmed.
  Usually, a flash cell is smaller than an equivalent EEPROM cell and is
  therefore less expensive to manufacture. CMOS.
- Fuse: One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld-focused
CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on FPGAs for
hardware verification and early software development.
Verification of hardware designs alongside early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on
FPGAs has become a good method of doing this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
   it is functionally correct. This is compared to designers relying only on
   software simulations to verify that their hardware design is sound.
   Simulation speed and modeling accuracy limitations hinder this
   development.
2. Due to time constraints, many projects cannot wait until the silicon is
   back from the foundry to start on software tests. FPGA prototyping allows
   for much more time in the area of software development and testing at the
   software-hardware integration stage. This allows many unforeseen software
   bugs, which appear due to today's array of operating systems, applications
   and hardware, to be found early.
3. Prototyping also allows the developer to ensure that all IP technologies
   on the system work well together off the simulation stage and in actual
   form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
   clients, bringing in interest early. This speeds up the overall
   development cycle and allows for more enhancement or improvement of the
   chip features than would otherwise have been possible.
13 SIMULATION RESULTS
Fig. Gate level: VGA controller
Fig. Gate level: sync unit
Fig. Technology schematic
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to correctly display the picture on a connected monitor.
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun
control circuits. These are produced by 2-bit digital-to-analog converter
circuitry located on the Nano Board itself, as shown in Figure 9.1.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller is
converted into one of 4 distinct analog levels. These levels specify the
intensity of each of the three primary colors to use when displaying the
pixel on the monitor's screen. The levels range from 0 V (total darkness) to
0.7 V (maximum brightness).
With each analog input being one of four possible levels, the monitor can
display each pixel on the screen in one of 64 different colors (4 x 4 x 4
combinations).
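The level and color arithmetic can be written down as a small VHDL package.
This is only a sketch: the package name rgb_dac_pkg and the function
level_mv are hypothetical, and the evenly spaced (linear) levels are an
assumption, since the text gives only the two end points of 0 V and 0.7 V.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

package rgb_dac_pkg is
  -- Full-scale output: 0.7 V corresponds to maximum brightness.
  constant FULL_SCALE_MV : natural := 700;
  -- 4 levels on each of R, G and B give 4 * 4 * 4 = 64 colors.
  constant NUM_COLORS    : natural := 4 * 4 * 4;
  -- Ideal output in millivolts for a 2-bit color code.
  -- Assumption: the DAC is linear, so the levels are evenly spaced.
  function level_mv (code : std_logic_vector(1 downto 0)) return natural;
end rgb_dac_pkg;

package body rgb_dac_pkg is
  function level_mv (code : std_logic_vector(1 downto 0)) return natural is
  begin
    -- Codes "00", "01", "10" and "11" map to 0, 233, 466 and 700 mV
    -- (integer division truncates 466.67 to 466).
    return (to_integer(unsigned(code)) * FULL_SCALE_MV) / 3;
  end level_mv;
end rgb_dac_pkg;
```

The actual voltages depend on the resistor network on the board, so the
intermediate values should be read as nominal.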
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3 Generation
FPGAs provide plentiful, efficient SelectRAM™ memory blocks. Using various
configuration options, SelectRAM blocks create RAM, ROM, FIFOs, large
look-up tables, data width converters, circular buffers and shift registers,
each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates
how to specify the various options using the Xilinx CORE Generator™ system
or via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM      RAM Blocks  Total RAM  Total RAM  Total RAM
            Columns  per Column  Blocks     Bits       Kbits
XC3S100E    1        4           4          73,728     72
XC3S250E    2        6           12         221,184    216
XC3S500E    2        10          20         368,640    360
XC3S1200E   2        14          28         516,096    504
XC3S1600E   2        18          36         663,552    648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of which are
allocated to data storage and, in some memory configurations, an additional
2K bits allocated to parity or additional "plus" data bits (16,384 + 2,048 =
18,432). Physically, the block RAM memory has two completely independent
access ports, labeled Port A and Port B. The structure is fully symmetrical,
and both ports are interchangeable and support data read and write
operations. Each memory port is synchronous, with its own clock, clock enable
and write enable. Read operations are also synchronous and require a clock
edge and clock enable. Though physically a dual-port memory, block RAM
simulates single-port memory in an application, as shown in Figure 1.
Furthermore, each block memory supports multiple configurations or aspect
ratios. Table 3 summarizes the essential SelectRAM features. Cascade multiple
block RAMs to create deeper and wider memory organizations, with a minimal
timing penalty incurred through specialized routing resources.
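A synchronous single-port memory of this kind can be described behaviorally
in VHDL. The sketch below models a 16K x 1 organization, which is one of the
aspect ratios that fits in the 16K data bits of a block. The entity name
ram_16k_x1 and its port names are hypothetical, and whether a given synthesis
tool maps this description onto block RAM is tool-dependent.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical single-port synchronous RAM, 16384 words of 1 bit.
entity ram_16k_x1 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                      -- clock enable
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(13 downto 0);  -- 2**14 = 16384 locations
    din  : in  std_logic_vector(0 downto 0);
    dout : out std_logic_vector(0 downto 0)
  );
end ram_16k_x1;

architecture rtl of ram_16k_x1 is
  type ram_t is array (0 to 2**14 - 1) of std_logic_vector(0 downto 0);
  signal ram : ram_t := (others => "0");
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read: dout changes only on an enabled clock edge.
        -- On a write, dout shows the old contents of the location.
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end rtl;
```

The 18-bit address used by the memory component in the controller code spans
2**18 one-bit locations, which corresponds to sixteen such 16K x 1 blocks
cascaded.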
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
- Interoperability
- Technology independence
- Design reuse
- Several levels of abstraction
- Readability
- Standard language
- Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
Integrated Circuit)
- Design specification language
- Design entry language
- Design simulation language
- Design documentation language
- An alternative to schematics
10.1 Brief History
- VHDL was developed in the early 1980s for managing design problems that
  involved large circuits and multiple teams of engineers.
- It was funded by the US Department of Defense.
- The first publicly available version was released in 1985.
- In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.)
  was presented with a proposal to standardize VHDL.
- In 1987: standardization => IEEE 1076-1987.
- An improved version of the language was released in 1994 => IEEE standard
  1076-1993.
10.2 Related Standards
- IEEE 1076 doesn't support simulation conditions such as unknown and
  high-impedance.
- Soon after IEEE 1076-1987 was released, simulator companies began using
  their own, non-standard types => VHDL was becoming non-standard.
- The IEEE 1164 standard was developed by the IEEE.
- IEEE 1164 contains definitions for a nine-valued data type, std_logic.
- IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they
  relate to actual hardware.
- It defines, e.g., two numeric types: signed and unsigned.
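The short sketch below uses types from both standards together. The nine
std_logic values are 'U', 'X', '0', '1', 'Z', 'W', 'L', 'H' and '-'; the
example uses 'Z' (high impedance) for a tri-stated output and the unsigned
type for arithmetic. The entity name types_demo and its ports are
hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;  -- std_logic, std_logic_vector (IEEE 1164)
use IEEE.NUMERIC_STD.ALL;     -- signed, unsigned (IEEE 1076.3)

entity types_demo is
  port (
    oe      : in  std_logic;                    -- output enable
    a, b    : in  unsigned(7 downto 0);
    bus_out : out std_logic_vector(8 downto 0)
  );
end types_demo;

architecture rtl of types_demo is
  signal sum : unsigned(8 downto 0);
begin
  -- resize widens both operands to 9 bits so the carry is kept.
  sum <= resize(a, 9) + resize(b, 9);

  -- Drive the sum when enabled, otherwise release the bus with 'Z'.
  bus_out <= std_logic_vector(sum) when oe = '1' else (others => 'Z');
end rtl;
```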
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
- A black box with interface definition
- Defines the inputs/outputs of a component (define pins)
- A way to represent modularity in VHDL
- Similar to a symbol in a schematic
- An entity declaration describes the entity
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
- Provide channels of communication between the component and its
  environment
- Each port must have a name, a direction and a type
- An entity may have NO port declaration
Port directions
- In: A value of a port can be read inside the component, but cannot be
  assigned. Multiple reads of the port are allowed.
- Out: Assignments can be made to a port, but data from a port cannot be
  read. Multiple assignments are allowed.
- In out: Bi-directional; assignments can be made and data can be read.
  Multiple assignments are allowed.
- Buffer: An out port with read capability. May have at most one assignment
  (not recommended).
Architectures
- Every entity has at least one architecture
- One entity can have several architectures
- Architectures can describe a design using:
  - Behavior
  - Structure
  - Dataflow
- Architectures can describe a design on many levels:
  - Gate level
  - RTL (Register Transfer Level)
  - Behavioral level
- A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
- Link entity declaration and architecture body together
- The concept of a default configuration is a bit messy in VHDL '87
  - The last architecture analyzed links to the entity
- Can be used to change simulation behavior without re-analyzing the VHDL
  source
- Complex configuration declarations are ignored in synthesis
- Some entities can have, e.g., a gate-level architecture and a behavioral
  architecture
- Are always optional
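A minimal sketch of a configuration declaration is given below. One entity
has two architectures, and the configuration selects one of them explicitly
instead of relying on the default. The names inverter, behavioral, ideal and
inverter_cfg are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity inverter is
  port (a : in std_logic; y : out std_logic);
end inverter;

-- Behavioral architecture with an explicit delay (simulation only).
architecture behavioral of inverter is
begin
  y <= not a after 2 ns;
end behavioral;

-- Zero-delay architecture of the same entity.
architecture ideal of inverter is
begin
  y <= not a;
end ideal;

-- Configuration declaration: binds entity inverter to architecture ideal.
configuration inverter_cfg of inverter is
  for ideal
  end for;
end inverter_cfg;
```

Simulating inverter_cfg rather than the bare entity picks the ideal
architecture regardless of which architecture was analyzed last.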
Packages
Packages contain information common to many design units.
1. Package declaration
   - constant declarations
   - type and subtype declarations
   - function and procedure declarations
   - global signal declarations
   - file declarations
   - component declarations
2. Package body
   - is not necessarily needed
   - function bodies
   - procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
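The split between declaration and body can be sketched as follows. The
package name vga_util_pkg and everything it declares are hypothetical
examples; BLACK is a deferred constant, meaning its value is given only in
the package body.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Package declaration: the part visible to other design units.
package vga_util_pkg is
  subtype pixel_t is std_logic_vector(0 downto 0);
  constant H_VISIBLE : natural := 640;  -- ordinary constant
  constant BLACK     : pixel_t;         -- deferred constant
  function invert (p : pixel_t) return pixel_t;
end vga_util_pkg;

-- Package body: values of deferred constants and subprogram bodies.
package body vga_util_pkg is
  constant BLACK : pixel_t := "0";

  function invert (p : pixel_t) return pixel_t is
  begin
    return not p;
  end invert;
end vga_util_pkg;
```

A design unit would make these names visible with
"use work.vga_util_pkg.all;", assuming the package is compiled into the
work library.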
Libraries
A library is a collection of VHDL design units (a database):
1. Packages
   - package declaration
   - package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
A library is usually a directory in a UNIX file system, but it can also be
any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here in
order of increasing level of abstraction:
- Structural
- Dataflow
- Behavioral
Structural VHDL description
- The circuit is described in terms of its components.
- This ranges from a low-level description (e.g. transistor-level
  description) to a high-level description (e.g. block diagram).
- For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
- The circuit is described in terms of how data moves through the system.
- In the dataflow style you describe how information flows between registers
  in the system.
- The combinational logic is described at a relatively high level; the
  placement and operation of registers is specified quite precisely.
- The behavior of the system over time is defined by registers.
- There are no built-in registers in the VHDL language:
  - either a lower-level description
  - or a behavioral description of sequential elements is needed.
- The lower-level register descriptions must be created or obtained.
- If there are no third-party models for registers => you must write the
  behavioral description of the registers.
- The behavioral description can be provided in the form of subprograms
  (functions or procedures).
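Since the language has no built-in register, a register is typically written
as a clocked process. The following is a minimal sketch of such a behavioral
register description; the entity name reg8 and its ports are hypothetical,
and the active-low asynchronous reset simply mirrors the style of the rstn
input used elsewhere in this report.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 8-bit register with enable and active-low asynchronous reset.
entity reg8 is
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;
    en   : in  std_logic;
    d    : in  std_logic_vector(7 downto 0);
    q    : out std_logic_vector(7 downto 0)
  );
end reg8;

architecture behv of reg8 is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');      -- reset takes priority over the clock
    elsif rising_edge(clk) then
      if en = '1' then
        q <= d;                  -- capture the input on an enabled edge
      end if;
    end if;
  end process;
end behv;
```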
10.4 Behavioral VHDL Description
- The circuit is described in terms of its operation over time.
- The representation might include, e.g., state diagrams, timing diagrams
  and algorithmic descriptions.
- The concept of time may be expressed precisely using delays (e.g.
  A <= B after 10 ns).
- If no actual delay is used, the order of sequential operations is defined.
- In the lower levels of abstraction (e.g. RTL), synthesis tools ignore
  detailed timing specifications.
- The actual timing results depend on the implementation technology and the
  efficiency of the synthesis tool.
- There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
- Basic simulation concept in VHDL
- A VHDL description can always be broken up into interconnected processes
- Quite similar to a Unix process
- Process is a keyword in VHDL
- A process statement is a concurrent statement
- Statements inside process statements are sequential statements
- A process must contain either a sensitivity list or wait statement(s), but
  NOT both
- The sensitivity list or wait statement(s) contain the signals which wake
  the process up
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
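The two permitted forms can be compared side by side. In the sketch below
both processes describe the same XOR function; the entity name process_demo
and its ports are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity process_demo is
  port (a, b : in std_logic; y1, y2 : out std_logic);
end process_demo;

architecture behv of process_demo is
begin
  -- Form 1: sensitivity list. The process runs whenever a or b changes.
  with_list : process (a, b)
  begin
    y1 <= a xor b;
  end process with_list;

  -- Form 2: no sensitivity list. The wait statement suspends the process
  -- until a or b changes, then execution restarts from the top.
  with_wait : process
  begin
    y2 <= a xor b;
    wait on a, b;
  end process with_wait;
end behv;
```

A sensitivity list behaves like a "wait on" statement placed at the end of
the process, which is why the two forms simulate identically here.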
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic);
  END component;

  -- Port modes match the sync_unit entity declaration.
  component sync_unit
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer);
  end component;

  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the input clock clk50m by two to obtain the pixel clock clk.
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  -- Sync unit: generates the sync pulses, enables and counters.
  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  -- Pixel generator: reads the memory and drives the color outputs.
  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- Image memory: one bit per pixel, addressed by the pixel generator.
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE
use IEEESTD_LOGIC_1164ALL
use IEEESTD_LOGIC_ARITHALL
use IEEESTD_LOGIC_UNSIGNEDALL
entity sync_unit is port(
clkin std_logic
rstnin std_logic
hsyncout std_logic ---horizontal synch pulse(row)
vsyncout std_logic --vertical sync
pulse(frame) 111110100000000000
v_en_regout std_logic
h_enout std_logic
v_enout std_logic
line_countout integer
c_hsout integer )
end sync_unit
architecture arch_sync_unit of sync_unit is
signal h_en_sv_en_sstd_logic
signal line_count_sc_hs_sinteger
begin
c_hs lt= c_hs_s
line_count lt= line_count_s
h_en lt= h_en_s
v_en lt= v_en_s
-- hsync counter
process(clkrstn)
begin
if(rstn = 0)then
46
c_hs_s lt= 0
else
if(clk = 1 and clkevent)then
if(c_hs_s=793)then
c_hs_s lt= 0
else
c_hs_s lt= c_hs_s +1
end if
end if
end if
end process
--vsync counter
--vertical line counter
process(h_en_srstn)
begin
if(rstn = 0)then
line_count_s lt= 0
elsif(h_en_s = 1 and h_en_sevent)then
line_count_s lt= line_count_s + 1
if (line_count_s=515)then
line_count_s lt= 0
end if
end if
--end if
end process
--hysnc pulse
process(clkrstn)
begin
if(rstn = 0)then
47
hsync lt= 1
elsif(clk = 1 and clkevent)then
if(c_hs_s lt=95)then
hsync lt= 0
else
hsync lt= 1
end if
end if
--end if
end process
    -- Horizontal enable (row information is sent in this period)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            h_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
                h_en_s <= '1';
            else
                h_en_s <= '0';
            end if;
        end if;
    end process;

    -- Vertical enable (data is valid in this region)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            v_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((line_count_s >= 35 and line_count_s <= 515)) then
                v_en_s <= '1';
            else
                v_en_s <= '0';
            end if;
        end if;
    end process;

    -- vsync pulse: active low during the first two lines of the frame
    process(clk, rstn)
    begin
        if (rstn = '0') then
            vsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (line_count_s < 2) then
                vsync <= '0';
            else
                vsync <= '1';
            end if;
        end if;
    end process;

    -- v_en delayed by one clock, so a rising edge of v_en can be detected
    process(clk)
    begin
        if (clk = '1' and clk'event) then
            v_en_reg <= v_en_s;
        end if;
    end process;
end arch_sync_unit;
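As a rough check of the counters above, the line and frame rates they imply can be worked out by hand. This is a sketch under two assumptions: the pixel clock is the 25 MHz clock that the top-level controller derives by halving the 50 MHz board clock, and the counter limits are read literally (c_hs_s runs from 0 to 793, line_count_s from 0 to 515, hsync is low for counts 0 to 95).

```latex
\begin{align*}
T_{\text{line}}  &= \frac{794}{25\ \text{MHz}} = 31.76\ \mu\text{s}, &
f_{\text{line}}  &= \frac{25\ \text{MHz}}{794} \approx 31.49\ \text{kHz},\\[4pt]
T_{\text{hsync}} &= \frac{96}{25\ \text{MHz}} = 3.84\ \mu\text{s}, &
f_{\text{frame}} &= \frac{25\ \text{MHz}}{794 \times 516} \approx 61.0\ \text{Hz}.
\end{align*}
```

These figures are close to, but not identical with, the nominal 640x480 VGA timing of 800 clocks per line and 525 lines per frame at 25.175 MHz (about 31.47 kHz and 59.94 Hz).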
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
    signal addr_s : std_logic_vector(17 downto 0);
begin
    addr <= addr_s;

    process(clk, rstn)
    begin
        if (rstn = '0') then
            rd     <= '0';
            red    <= "0";
            green  <= "0";
            blue   <= "0";
            addr_s <= x"0000" & "00";
        elsif (clk = '1' and clk'event) then
            -- Rising edge of v_en: restart the read address for a new frame
            if ((v_en = '1') and (v_en_reg = '0')) then
                addr_s <= (others => '0');
            end if;
            if (v_en = '1' and h_en = '1') then
                if (line_count <= 514) then
                    if (c_hs <= 655) then
                        -- Inside the image window: show the memory bit
                        rd     <= '1';
                        red    <= data_in;
                        green  <= data_in;
                        blue   <= data_in;
                        addr_s <= addr_s + "000000000000000001";
                    else
                        -- Right of the image window: green border
                        rd    <= '0';
                        red   <= "0";
                        green <= "1";
                        blue  <= "0";
                    end if;
                else
                    -- Below the image window: white
                    rd    <= '0';
                    red   <= "1";
                    green <= "1";
                    blue  <= "1";
                end if;
                -- Wrap the read address at the end of the stored image
                if (addr_s = "111011111111111111") then
                    addr_s <= (others => '0');
                end if;
            else
                -- Blanking interval: drive black
                red   <= "0";
                green <= "0";
                blue  <= "0";
            end if;
        end if;
    end process;
end behv;
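The constants in the pixel generator hint at the size of the stored image. The following is a hedged estimate, not a simulation result: it assumes that h_en is first seen high at c_hs = 144 (because h_en is registered one clock after c_hs reaches 143), that pixels are fetched while c_hs <= 655, and that the active lines are 35 to 514.

```latex
\begin{align*}
N_{\text{pixels per line}} &= 655 - 144 + 1 = 512,\\
N_{\text{lines}}           &= 514 - 35 + 1 = 480,\\
N_{\text{pixels}}          &= 512 \times 480 = 245760 = 2^{18} - 2^{14}.
\end{align*}
```

The last address of such an image is 245759, which is 111011111111111111 in binary: the value at which addr_s wraps to zero. The exact counts at the edges of the window depend on register delays.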
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
  1987         9,000 gates (Xilinx)
  1992         600,000 gates (Naval Surface Warfare Department)
  Early 2000s  Millions
Market size
  1985         First commercial FPGA technology invented by Xilinx
  1987         $14 million
  ~1993        >$385 million
  2005         $1.9 billion
  2010         $2.75 billion (estimate)
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has increased development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative to ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a 4-8 hour wait
is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are managed separately.
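The logic block just described can be sketched behaviourally in VHDL. This is an illustrative model only, not a vendor primitive: the entity name logic_block and the generics INIT (the 16-entry truth table) and USE_REG (selects the registered output) are hypothetical names chosen for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of one logic block: a 4-input LUT followed by a
-- flip-flop, with a single output that is either registered or not.
entity logic_block is
  generic (
    INIT    : std_logic_vector(15 downto 0) := x"8000";  -- x"8000" = 4-input AND
    USE_REG : boolean := false                           -- true: registered output
  );
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    y   : out std_logic                      -- the single block output
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal lut_reg : std_logic := '0';
begin
  -- The LUT is a 16-entry table indexed by the input vector.
  lut_out <= INIT(to_integer(unsigned(a)));

  -- The flip-flop samples the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      lut_reg <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output.
  y <= lut_reg when USE_REG else lut_out;
end architecture rtl;
```

Changing INIT changes the logic function without changing the structure, which is the sense in which the block is "configured" rather than rewired.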
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right of and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license) and
other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
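As a small illustration of the first simulation stage, the sketch below is a self-checking test bench for a divide-by-two clock stage like the one that produces the pixel clock in the VGA controller of this report. The test bench name tb_clk_div, the constant T50 and the signal done are hypothetical, and the divider is written inline so that the example is self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_clk_div is
end entity tb_clk_div;

architecture sim of tb_clk_div is
  constant T50  : time := 20 ns;        -- period of the 50 MHz input clock
  signal clk50m : std_logic := '0';
  signal rstn   : std_logic := '0';
  signal clk    : std_logic := '0';
  signal done   : boolean   := false;
begin
  -- Free-running 50 MHz clock; stops once the stimulus process has finished.
  clk_gen : process
  begin
    while not done loop
      clk50m <= '0';
      wait for T50 / 2;
      clk50m <= '1';
      wait for T50 / 2;
    end loop;
    wait;
  end process clk_gen;

  -- Circuit under test: divide-by-two with active-low asynchronous reset.
  divider : process (clk50m, rstn)
  begin
    if rstn = '0' then
      clk <= '0';
    elsif rising_edge(clk50m) then
      clk <= not clk;
    end if;
  end process divider;

  -- Stimulus and check: release reset, then measure one output period.
  stimulus : process
    variable t_first : time;
  begin
    wait for 2 * T50;
    rstn <= '1';
    wait until rising_edge(clk);
    t_first := now;
    wait until rising_edge(clk);
    assert now - t_first = 2 * T50
      report "divided clock period is not 40 ns" severity error;
    report "test bench finished";
    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

The same test bench can in principle be reused against the gate-level netlist after synthesis, which is the repetition of simulation that the design flow above describes.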
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; its Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM-based, 1.5 GHz fabric speed) and QuickLogic (handheld-focused
CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping an SoC or ASIC design on an FPGA for
hardware verification and early software development.
Verification of hardware designs, together with early software and
firmware co-design, has become mainstream. Prototyping SoC and ASIC
designs on FPGAs has become a good method of doing this.
Reasons why prototyping is important:
1 Running an SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be found early.
3 Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC clients,
bringing in interest early. This speeds up the overall development cycle and
allows for more enhancement or improvement of the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
Fig. Gate level: VGA CONTROLLER
Fig. Gate level: SYNC UNIT
Fig. TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1 http://en.wikipedia.org/wiki/Video_Graphics_Array
2 Eduardo Sanchez, "A VGA display controller"
3 Hinner, "VGA Interface and video signal documents"
The RGB color code stored in the RGB register is output from the VGA
Controller as separate 2-bit R, G and B values (outputs R0, R1, G0, G1, B0
and B1).
The monitor itself expects analog signals as inputs to its electron gun control
circuits. This is achieved by using 2-bit digital-to-analog converter circuitry,
located on the Nano Board itself, as shown in Figure 4.
Figure 9.1 Digital to Analog RGB conversion
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 64 different color
permutations.
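The count of 64 follows directly from the bit widths. The voltage levels in the second line are a sketch that assumes the converter divides the 0 V to 0.7 V range into equal steps, which the text does not state:

```latex
\begin{align*}
N_{\text{levels}} &= 2^{2} = 4, \qquad
N_{\text{colors}} = N_{\text{levels}}^{3} = 4^{3} = 64,\\
V_k &= \frac{k}{3} \times 0.7\ \text{V}, \quad k = 0, 1, 2, 3
\;\Rightarrow\; V_k \approx 0,\ 0.23,\ 0.47,\ 0.70\ \text{V}.
\end{align*}
```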
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan-3
Generation FPGAs provide plentiful, efficient SelectRAM memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM      RAM Blocks  Total RAM  Total RAM  Total RAM
            Columns  per Column  Blocks     Bits       Kbits
XC3S100E    1        4           4          73,728     72
XC3S250E    2        6           12         221,184    216
XC3S500E    2        10          20         368,640    360
XC3S1200E   2        14          28         516,096    504
XC3S1600E   2        18          36         663,552    648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and both ports support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
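As a sizing illustration, consider a 1-bit-per-pixel frame buffer of 512 x 480 pixels. That size is an assumption inferred from the 18-bit address and its wrap value in the pixel generator code of this report, not a figure stated in the text. Using only the 16K (16,384) data bits of each block:

```latex
\[
N_{\text{blocks}}
  = \left\lceil \frac{512 \times 480}{16384} \right\rceil
  = \left\lceil \frac{245760}{16384} \right\rceil
  = 15 .
\]
```

By Table 2, such a buffer would fit in an XC3S500E (20 blocks) or a larger device, but not in an XC3S250E (12 blocks).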
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the U.S. Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own, non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes the entity.
E.g.
entity Comparator is
    port (A, B : in  std_logic_vector(7 downto 0);
          EQ   : out std_logic);
end Comparator;
Ports: Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions:
In: The value of a port can be read inside the component, but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
In out: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures: Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
    EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations: Link an entity declaration and an architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
Packages: Packages contain information common to many design units.
1 Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2 Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
Libraries: A collection of VHDL design units (a database).
1 Packages
  package declaration
  package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural-------
Dataflow ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral
description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the process up.
General format:
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
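A concrete instance of this format, sketched here for illustration, is a D flip-flop with an active-low asynchronous reset. It uses a sensitivity list and therefore contains no wait statement. The entity name dff and its port names are chosen for this example only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;   -- active-low asynchronous reset
    d    : in  std_logic;
    q    : out std_logic
  );
end entity dff;

architecture rtl of dff is
begin
  -- The process wakes up whenever clk or rstn changes; the statements
  -- between begin and end process are then executed in order.
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= '0';
    elsif rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture rtl;
```

The processes in the synchronization block of this report follow the same pattern, with the older form clk = '1' and clk'event in place of rising_edge(clk).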
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
component memory
port (
addr IN std_logic_VECTOR(17 downto 0)
clk IN std_logic
dout OUT std_logic_VECTOR(0 downto 0)
en IN std_logic
sinit IN std_logic)
END component
42
component sync_unit port(
clkin std_logic
rstnin std_logic
hsyncout std_logic
---horizontal synch pulse(row)
vsyncout std_logic
--vertical sync pulse(frame) 111110100000000000
v_en_regout std_logic
h_enout std_logic
v_enout std_logic
line_countinout integer
c_hsinout integer)
end component
component pixelgen port(
clkin std_logic
rstnin std_logic
hsyncin std_logic
vsyncin std_logic
h_enin std_logic
v_enin std_logic
v_en_reg in std_logic
data_inin std_logic_vector(0 downto 0)
line_countin integer
c_hsin integer
red out std_logic_vector(0 downto 0)
greenout std_logic_vector(0 downto 0)
blueout std_logic_vector(0 downto 0)
addrout std_logic_vector(17 downto 0)
rdout std_logic)
end component
43
signal clkhsync_svsync_sh_en_sv_en_sv_en_reg_srd_sstd_logic
signal line_count_sc_hs_saddrinteger
signal data_sstd_logic_vector(0 downto 0)
signal addr_sstd_logic_vector(17 downto 0)
begin
--addr_s lt= std_logic_vector(to_unsigned(addr18))
process(clk50mrstn)
begin
if(rstn = 0)then
clk lt= 0
elsif(clk50m = 1 and clk50mevent)then
clk lt= not clk
end if
--end if
end process
hsync lt= hsync_s
vsync lt= vsync_s
sync sync_unit port map(
clk =gt clk
rstn =gt rstn
hsync =gt hsync_s
vsync =gt vsync_s
v_en_reg =gt v_en_reg_s
line_count =gt line_count_s
h_en =gt h_en_s
v_en =gt v_en_s
c_hs =gt c_hs_s
)
44
pixel pixelgen port map(
clk =gt clk
rstn =gt rstn
hsync =gt hsync_s
vsync =gt vsync_s
h_en =gt h_en_s
v_en =gt v_en_s
v_en_reg =gt v_en_reg_s
data_in =gt data_s
line_count=gt line_count_s
c_hs =gt c_hs_s
red =gt red
green =gt green
blue =gt blue
addr =gt addr_s
rd =gt rd_s
)
mem12 memory port map(
addr =gt addr_s
clk =gt clk
dout =gt data_s
en =gt rd_s
sinit=gt rstn
)
end Behavioral
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s       : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- Horizontal counter: counts pixel clocks 0..793 within one line.
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    elsif (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end process;

  -- Vertical line counter: advances on each rising edge of h_en_s and
  -- wraps to 0 after line 515.
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      if (line_count_s = 515) then
        line_count_s <= 0;
      else
        line_count_s <= line_count_s + 1;
      end if;
    end if;
  end process;

  -- hsync pulse: low while the horizontal counter is in 0..95.
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period).
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region).
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35) and (line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse: low during the first two lines of the frame.
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- v_en delayed by one clock; pixelgen uses it to detect the start of a frame.
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
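A simple way to observe the sync signals is a testbench that only drives the clock and reset. The sketch below is an assumption-laden example, not part of the original project: the name sync_unit_tb is hypothetical, the 40 ns period assumes the 25 MHz clock obtained by dividing clk50m by two, and the run length of two frames is derived from the counter limits in the listing above (794 clocks per line, 516 lines per frame).

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical testbench for sync_unit; it has no ports of its own.
entity sync_unit_tb is
end sync_unit_tb;

architecture sim of sync_unit_tb is
  -- Assumed pixel clock: 50 MHz divided by two = 25 MHz, i.e. 40 ns period.
  constant CLK_PERIOD : time := 40 ns;

  signal clk        : std_logic := '0';
  signal rstn       : std_logic := '0';
  signal hsync      : std_logic;
  signal vsync      : std_logic;
  signal v_en_reg   : std_logic;
  signal h_en       : std_logic;
  signal v_en       : std_logic;
  signal line_count : integer;
  signal c_hs       : integer;
begin
  -- Free-running clock.
  clk <= not clk after CLK_PERIOD / 2;

  stimulus : process
  begin
    rstn <= '0';                              -- hold reset for a few clocks
    wait for 5 * CLK_PERIOD;
    rstn <= '1';
    -- Run for two frames: 794 clocks per line, 516 lines per frame.
    wait for 2 * 516 * 794 * CLK_PERIOD;
    assert false report "Simulation finished" severity failure;
    wait;
  end process stimulus;

  dut : entity work.sync_unit
    port map (
      clk        => clk,
      rstn       => rstn,
      hsync      => hsync,
      vsync      => vsync,
      v_en_reg   => v_en_reg,
      h_en       => h_en,
      v_en       => v_en,
      line_count => line_count,
      c_hs       => c_hs
    );
end sim;
```

Under these assumptions one line lasts 794 x 40 ns = 31.76 us and one frame 516 lines, about 16.4 ms, so the waveform should show roughly 61 vsync pulses per second.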
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;  -- provides "+" on std_logic_vector

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en (start of a frame): restart the memory address.
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: read the next pixel from memory.
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the picture: green.
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: white.
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last pixel of the picture.
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking interval: black.
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: first commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the field
to fix bugs, and lower non-recurring engineering costs. Vendors can also take a
middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has bolstered development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design
starts.
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
Fig 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
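The logic block described above can be modeled behaviorally in a few lines. The sketch below is a simplified illustration, not vendor code: the entity logic_block, the generics LUT_INIT and REGISTERED, and the port names are all assumptions. The LUT is a 16-bit constant indexed by the four inputs, and a generic selects the registered or unregistered output.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical model of a classic logic block: 4-input LUT plus flip-flop.
entity logic_block is
  generic (
    -- LUT contents: bit i is the output for input value i.
    -- x"8000" sets only bit 15, i.e. a 4-input AND.
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- true selects the flip-flop output, false the raw LUT output.
    REGISTERED : boolean := false
  );
  port (
    clk   : in  std_logic;
    lut_i : in  std_logic_vector(3 downto 0);
    o     : out std_logic
  );
end logic_block;

architecture behv of logic_block is
  signal lut_o : std_logic;
  signal ff_q  : std_logic := '0';
begin
  -- Look up the output bit addressed by the four inputs.
  lut_o <= LUT_INIT(to_integer(unsigned(lut_i)));

  -- Flip-flop on the LUT output.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_q <= lut_o;
    end if;
  end process;

  -- Single output: registered or unregistered, chosen at configuration time.
  o <= ff_q when REGISTERED else lut_o;
end behv;
```

Changing LUT_INIT changes the implemented function without touching the structure, which is the essence of how an FPGA is configured.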
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to the actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and
firmware co-design has become mainstream. Prototyping SoC and ASIC
designs on an FPGA has become a good method to do this.
Reasons why prototyping is important
1 Running a SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows for much more
time in the area of software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be caught.
3 Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement to the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig VGA CONTROLLER
GATE LEVEL
Fig SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
For each color, the 2-bit digital signal from the VGA Controller can be
converted into 4 distinct analog levels. These levels specify the intensity of
each of the three primary colors to use when displaying the pixel on the
monitor's screen. The levels range from 0 V (total darkness) to 0.7 V (maximum
brightness).
With each analog input being one of four possible levels, the monitor
can display each pixel on the screen with one of 4 x 4 x 4 = 64 different color
permutations.
Using Block RAM in Spartan-3E Generation FPGAs
For applications requiring large on-chip memories, Spartan™-3
Generation FPGAs provide plentiful, efficient SelectRAM™ memory blocks.
Using various configuration options, SelectRAM blocks create RAM, ROM,
FIFOs, large look-up tables, data width converters, circular buffers and shift
registers, each supporting various data widths and depths. This application note
describes the features and capabilities of block SelectRAM and illustrates how
to specify the various options using the Xilinx CORE Generator™ system or
via VHDL or Verilog instantiation.
Table 2 Block RAM Available in Spartan-3E Devices
Device      RAM Columns   RAM Blocks per Column   Total RAM Blocks   Total RAM Bits   Total RAM Kbits
XC3S100E    1             4                       4                  73,728           72
XC3S250E    2             6                       12                 221,184          216
XC3S500E    2             10                      20                 368,640          360
XC3S1200E   2             14                      28                 516,096          504
XC3S1600E   2             18                      36                 663,552          648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits. Physically,
the block RAM memory has two completely independent access ports, labeled
Port A and Port B. The structure is fully symmetrical, and both ports are
interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
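Besides the CORE Generator system and direct instantiation, a synchronous single-port memory can also be described behaviorally. The sketch below is a generic example under stated assumptions: the entity bram_16kx1 and its port names are hypothetical, the 16K x 1 organization is chosen to fill the 16K data bits of one block, and whether a given synthesis tool maps the description onto block RAM depends on the tool and its settings.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical single-port synchronous memory, 16K words of 1 bit.
entity bram_16kx1 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;                      -- clock enable
    we   : in  std_logic;                      -- write enable
    addr : in  std_logic_vector(13 downto 0);
    din  : in  std_logic_vector(0 downto 0);
    dout : out std_logic_vector(0 downto 0)
  );
end bram_16kx1;

architecture rtl of bram_16kx1 is
  type ram_t is array (0 to 2**14 - 1) of std_logic_vector(0 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read: the old contents appear on a write (read-first).
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end rtl;
```

Both the read and the write need a clock edge and the enable, which mirrors the synchronous port behavior described above.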
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
IC)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
Defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87
- the last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Are always optional.
PackagesPackages contain information common to many design units
1 Package declaration
--constant declarations
ndashtype and subtype declarations
ndashfunction and procedure declarations
ndashglobal signal declarations
ndashfile declarations
ndashcomponent declarations
2 Package body
ndashis not necessary needed
ndashfunction bodies
ndashprocedure bodies
Packages are meant for encapsuling data which can be shared globally among
Several design units
Consists of declaration part and optional body part
A package declaration can contain
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
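The split between package declaration and package body can be sketched as follows. This is only an illustration: the package name vga_pkg, the subtype pixel_t, the constant H_TOTAL and the function to_pixel are hypothetical names that do not appear in the project code, and the value 794 is simply the number of clocks per line counted by the sync unit later in this report.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Package declaration: the part that other design units can see.
package vga_pkg is
    subtype pixel_t is std_logic_vector(0 downto 0);
    constant H_TOTAL : integer;                        -- deferred constant
    function to_pixel (b : std_logic) return pixel_t;  -- subprogram declaration
end vga_pkg;

-- Package body: value of the deferred constant and the subprogram body.
package body vga_pkg is
    constant H_TOTAL : integer := 794;

    function to_pixel (b : std_logic) return pixel_t is
        variable p : pixel_t;
    begin
        p(0) := b;
        return p;
    end to_pixel;
end vga_pkg;
```

A design unit would make these declarations visible with "use work.vga_pkg.all;", assuming the package has been compiled into the library work.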
Libraries
A library is a collection of VHDL design units (a database):
1. Packages
- package declaration
- package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
A library is usually a directory in the UNIX file system, but it can also be
any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here from
the lowest to the highest level of abstraction:
- Structural
- Dataflow
- Behavioral
Structural VHDL description
The circuit is described in terms of its components.
The description can range from low level (e.g., a transistor-level
description) to high level (e.g., a block diagram).
For large circuits, a low-level description quickly becomes impractical.
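A structural description can be sketched with a half adder built from two gate components. The entities and2, xor2 and half_adder are hypothetical and are not part of the project; they are defined here only so that the example is complete.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity and2 is
    port (a, b : in std_logic; y : out std_logic);
end and2;

architecture rtl of and2 is
begin
    y <= a and b;
end rtl;

library IEEE;
use IEEE.std_logic_1164.all;

entity xor2 is
    port (a, b : in std_logic; y : out std_logic);
end xor2;

architecture rtl of xor2 is
begin
    y <= a xor b;
end rtl;

library IEEE;
use IEEE.std_logic_1164.all;

entity half_adder is
    port (a, b : in std_logic; sum, carry : out std_logic);
end half_adder;

-- Structural architecture: only component instances and their wiring.
architecture structural of half_adder is
    component and2
        port (a, b : in std_logic; y : out std_logic);
    end component;
    component xor2
        port (a, b : in std_logic; y : out std_logic);
    end component;
begin
    u_sum   : xor2 port map (a => a, b => b, y => sum);
    u_carry : and2 port map (a => a, b => b, y => carry);
end structural;
```

The top-level vga_controller in chapter 11 follows the same pattern: it declares the memory, sync_unit and pixelgen components and connects them with port maps.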
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level, while the
placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the
behavioral description of the registers yourself.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
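Since the language has no built-in register, a behavioral description of one has to be written. The sketch below shows one common way to do it, using a clocked process rather than a subprogram; the entity name reg8 and its port names are hypothetical, and the active-low asynchronous reset is chosen only to match the rstn convention used in the project code.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity reg8 is
    port (clk  : in  std_logic;
          rstn : in  std_logic;                      -- active-low asynchronous reset
          d    : in  std_logic_vector(7 downto 0);
          q    : out std_logic_vector(7 downto 0));
end reg8;

architecture behv of reg8 is
begin
    process (clk, rstn)
    begin
        if rstn = '0' then
            q <= (others => '0');
        elsif rising_edge(clk) then
            q <= d;                                  -- capture d on each rising edge
        end if;
    end process;
end behv;
```

Synthesis tools generally recognize this process template and map it to flip-flops.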
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g., A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
At the lower levels of abstraction (e.g., RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The process is the basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
A process is quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
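The two alternatives in the general format can be compared side by side. In the sketch below, both processes describe the same AND function: the first uses a sensitivity list, the second an equivalent wait statement at the end of the process. The entity and_gate and the signal names are hypothetical and serve only as an illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity and_gate is
    port (a, b   : in  std_logic;
          y1, y2 : out std_logic);
end and_gate;

architecture behv of and_gate is
begin
    -- Form 1: the sensitivity list wakes the process when a or b changes.
    with_list : process (a, b)
    begin
        y1 <= a and b;
    end process with_list;

    -- Form 2: no sensitivity list; the process suspends at the wait
    -- statement until a or b changes, then runs again from the top.
    with_wait : process
    begin
        y2 <= a and b;
        wait on a, b;
    end process with_wait;
end behv;
```

Both processes are concurrent with each other, while the statements inside each one execute sequentially.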
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is port(
    clk50m : in  std_logic;   -- 50 MHz board clock
    rstn   : in  std_logic;   -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

component memory
port (
    addr  : IN  std_logic_VECTOR(17 downto 0);
    clk   : IN  std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN  std_logic;
    sinit : IN  std_logic);
END component;

component sync_unit port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;    -- mode out, matching the sync_unit entity
    c_hs       : out integer);
end component;

component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end component;

signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);

begin

--addr_s <= std_logic_vector(to_unsigned(addr, 18));

-- Divide the 50 MHz input clock by two to obtain the 25 MHz pixel clock.
process(clk50m, rstn)
begin
    if (rstn = '0') then
        clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
        clk <= not clk;
    end if;
end process;

hsync <= hsync_s;
vsync <= vsync_s;

sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
);

pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
);

-- Note: rstn is active low. If the memory core's sinit input is active
-- high, it would need the inverted reset instead.
mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
);

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is

signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;

begin

c_hs       <= c_hs_s;
line_count <= line_count_s;
h_en       <= h_en_s;
v_en       <= v_en_s;

-- Horizontal counter: counts pixel clocks 0 to 793 within each line.
process(clk, rstn)
begin
    if (rstn = '0') then
        c_hs_s <= 0;
    else
        if (clk = '1' and clk'event) then
            if (c_hs_s = 793) then
                c_hs_s <= 0;
            else
                c_hs_s <= c_hs_s + 1;
            end if;
        end if;
    end if;
end process;

-- Vertical line counter: advances on each rising edge of h_en_s and
-- wraps after line 515.
process(h_en_s, rstn)
begin
    if (rstn = '0') then
        line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
        line_count_s <= line_count_s + 1;
        if (line_count_s = 515) then
            line_count_s <= 0;
        end if;
    end if;
end process;

-- hsync pulse: low while the horizontal counter is 0 to 95.
process(clk, rstn)
begin
    if (rstn = '0') then
        hsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (c_hs_s <= 95) then
            hsync <= '0';
        else
            hsync <= '1';
        end if;
    end if;
end process;

-- Horizontal enable (row information is sent in this period).
process(clk, rstn)
begin
    if (rstn = '0') then
        h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
            h_en_s <= '1';
        else
            h_en_s <= '0';
        end if;
    end if;
end process;

-- Vertical enable (data is valid in this region).
process(clk, rstn)
begin
    if (rstn = '0') then
        v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((line_count_s >= 35 and line_count_s <= 515)) then
            v_en_s <= '1';
        else
            v_en_s <= '0';
        end if;
    end if;
end process;

-- vsync pulse: low during the first two lines of the frame.
process(clk, rstn)
begin
    if (rstn = '0') then
        vsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (line_count_s < 2) then
            vsync <= '0';
        else
            vsync <= '1';
        end if;
    end if;
end process;

-- One-clock delayed copy of v_en, used to detect the start of a frame.
process(clk)
begin
    if (clk = '1' and clk'event) then
        v_en_reg <= v_en_s;
    end if;
end process;

end arch_sync_unit;
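
The counter limits in the synchronization block determine the scan rates. As a rough check, assuming the 25 MHz pixel clock produced by dividing clk50m by two in the top level, a line of 794 clocks (horizontal counter 0 to 793), a frame of 516 lines (line counter 0 to 515), and hsync low for 96 clocks (counter values 0 to 95), the rates work out as follows:

```latex
\begin{align*}
f_{\text{line}}  &= \frac{25\,\text{MHz}}{794} \approx 31.49\,\text{kHz} \\
f_{\text{frame}} &= \frac{f_{\text{line}}}{516} \approx 61.0\,\text{Hz} \\
t_{\text{hsync}} &= \frac{96}{25\,\text{MHz}} = 3.84\,\mu\text{s}
\end{align*}
```

These figures are close to the nominal 640 x 480 at 60 Hz timing, which uses 800 clocks per line and 525 lines per frame at a 25.175 MHz pixel clock.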
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is

signal addr_s : std_logic_vector(17 downto 0);

begin

addr <= addr_s;

process(clk, rstn)
begin
    if (rstn = '0') then
        rd     <= '0';
        red    <= "0";
        green  <= "0";
        blue   <= "0";
        addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
        -- Rising edge of v_en (start of a frame): restart the read address.
        if ((v_en = '1') and (v_en_reg = '0')) then
            addr_s <= (others => '0');
        end if;
        if (v_en = '1' and h_en = '1') then
            if (line_count <= 514) then
                if (c_hs <= 655) then
                    -- Inside the picture: read memory and drive all three
                    -- color outputs with the same 1-bit pixel value.
                    rd     <= '1';
                    red    <= data_in;
                    green  <= data_in;
                    blue   <= data_in;
                    addr_s <= addr_s + "000000000000000001";
                else
                    -- Right of the picture: green.
                    rd    <= '0';
                    red   <= "0";
                    green <= "1";
                    blue  <= "0";
                end if;
            else
                -- Below line 514: white.
                rd    <= '0';
                red   <= "1";
                green <= "1";
                blue  <= "1";
            end if;
            -- Wrap the address after the last pixel stored in memory.
            if (addr_s = "111011111111111111") then
                addr_s <= (others => '0');
            end if;
        else
            -- Blanking interval: all colors off.
            red   <= "0";
            green <= "0";
            blue  <= "0";
        end if;
    end if;
end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was inducted
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs can be reprogrammed at run time, and this is leading to
the idea of reconfigurable computing or reconfigurable systems: CPUs that
reconfigure themselves to suit the task at hand. The Mitrion Virtual
Processor from Mitrionics is an example of a reconfigurable soft processor
implemented on FPGAs. However, it does not support dynamic reconfiguration
at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: first commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively.
- ASIC complexity has increased development time and costs.
- R&D resources and headcount are decreasing.
- Revenue losses for slow time-to-market are increasing.
- Financial constraints in a poor economy are driving low-cost technologies.
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of
4-8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
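The logic block described above can be modelled behaviorally in a few lines of VHDL. This is only a simulation sketch of the idea, not a description of any vendor's primitive: the entity name logic_block, the generics INIT and REGISTERED, and the port names are all hypothetical. INIT holds the 16-entry truth table of the LUT, and REGISTERED selects between the flip-flop output and the direct LUT output.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity logic_block is
    generic (
        -- Truth table: bit n is the output for input value n.
        -- The default sets only bit 15, i.e. a 4-input AND.
        INIT       : std_logic_vector(15 downto 0) := "1000000000000000";
        REGISTERED : boolean := false);
    port (clk : in  std_logic;
          i   : in  std_logic_vector(3 downto 0);   -- four LUT inputs
          o   : out std_logic);                      -- single output
end logic_block;

architecture behv of logic_block is
    signal lut_out, ff_out : std_logic;
begin
    -- Lookup table: the inputs select one bit of the truth table.
    lut_out <= INIT(to_integer(unsigned(i)));

    -- Flip-flop on the LUT output.
    process (clk)
    begin
        if rising_edge(clk) then
            ff_out <= lut_out;
        end if;
    end process;

    -- Output multiplexer: registered or unregistered LUT output.
    o <= ff_out when REGISTERED else lut_out;
end behv;
```

Changing INIT changes the logic function without changing the structure, which is essentially what loading a configuration into a real LUT does.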
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher-level
functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to the actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at
which point propagation delays can be added and the simulation run again with
these values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verifying hardware designs alongside early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA has
become a good method of doing this.
Reasons why prototyping is important:
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound.
Simulation speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is
back from the foundry to start on software tests. FPGA prototyping allows
much more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be
caught earlier.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, away from the simulation stage and in actual
form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
Fig. VGA CONTROLLER (gate level)
Fig. SYNC UNIT (gate level)
Fig. TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Implemented on an FPGA, the Controller takes a processor-generated picture
(pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller".
3. Hinner, "VGA Interface and video signal documents".
Table 2: Block RAM Available in Spartan-3E Devices
Device      RAM      RAM Blocks   Total RAM   Total RAM   Total RAM
            Columns  per Column   Blocks      Bits        Kbits
XC3S100E    1        4            4           73,728      72
XC3S250E    2        6            12          221,184     216
XC3S500E    2        10           20          368,640     360
XC3S1200E   2        14           28          516,096     504
XC3S1600E   2        18           36          663,552     648
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits.
Physically, the block RAM memory has two completely independent access ports,
labeled Port A and Port B. The structure is fully symmetrical, and both ports
are interchangeable and support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM simulates
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations or aspect ratios. Table 3
summarizes the essential SelectRAM features. Cascade multiple block RAMs
to create deeper and wider memory organizations, with a minimal timing
penalty incurred through specialized routing resources.
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
It was funded by the U.S. Department of Defense.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own, non-standard types => VHDL was becoming nonstandard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
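The unsigned type from IEEE 1076.3 (package numeric_std) can be sketched with a small counter. The entity counter10 and its ports are hypothetical; the terminal value 793 is borrowed from the horizontal counter of the sync unit in this report, which itself uses an integer signal rather than unsigned.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity counter10 is
    port (clk, rstn : in  std_logic;
          count     : out std_logic_vector(9 downto 0));
end counter10;

architecture rtl of counter10 is
    -- unsigned gives the vector a numeric meaning and a fixed width of 10 bits.
    signal cnt : unsigned(9 downto 0);
begin
    process (clk, rstn)
    begin
        if rstn = '0' then
            cnt <= (others => '0');
        elsif rising_edge(clk) then
            if cnt = 793 then
                cnt <= (others => '0');   -- wrap after 793
            else
                cnt <= cnt + 1;
            end if;
        end if;
    end process;

    -- Convert back to std_logic_vector at the port boundary.
    count <= std_logic_vector(cnt);
end rtl;
```

Compared with an unconstrained integer, the unsigned signal states the register width explicitly, so the synthesis tool does not have to infer it.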
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes an entity.
E.g.
entity Comparator is
    port (A, B : in  std_logic_vector(7 downto 0);
          EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of the port can be read inside the component but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be
read. Multiple assignments are allowed.
Inout: Bi-directional. Assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
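Because an out port cannot be read back (under the VHDL-87/93 rules described here), a common workaround is to keep an internal signal and copy it to the port. The sketch below illustrates this under that assumption; the entity counter4 and its signal names are hypothetical and not part of the project.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity counter4 is
  port (clk, rstn : in  std_logic;
        count     : out std_logic_vector(3 downto 0));
end entity counter4;

architecture rtl of counter4 is
  -- Internal copy of the counter value: unlike the out port, it may be read.
  signal count_s : unsigned(3 downto 0);
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      count_s <= (others => '0');
    elsif rising_edge(clk) then
      count_s <= count_s + 1;        -- reads count_s, not the port
    end if;
  end process;

  count <= std_logic_vector(count_s);  -- the port is only ever assigned
end architecture rtl;
```

The project code later in this report follows the same pattern with its _s suffixed signals.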
Architectures
Every entity has at least one architecture
One entity can have several architectures
Architectures can describe a design using
– Behavior
– Structure
– Dataflow
Architectures can describe a design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
A configuration declaration links an architecture to an entity
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link an entity declaration and an architecture body together
The concept of a default configuration is a bit messy in VHDL '87
– The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL
source
Complex configuration declarations are ignored in synthesis
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture
Configurations are always optional
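As an illustration of the points above, the sketch below gives one entity two architectures and uses a configuration declaration to pick one of them. It is a minimal example with hypothetical names (comparator_cfg, dataflow, behavioral), written in VHDL-93 syntax.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity comparator is
  port (a, b : in  std_logic_vector(7 downto 0);
        eq   : out std_logic);
end entity comparator;

-- First architecture: a single concurrent assignment.
architecture dataflow of comparator is
begin
  eq <= '1' when a = b else '0';
end architecture dataflow;

-- Second architecture: the same function written as a process.
architecture behavioral of comparator is
begin
  process (a, b)
  begin
    if a = b then
      eq <= '1';
    else
      eq <= '0';
    end if;
  end process;
end architecture behavioral;

-- The configuration selects which architecture is bound to the entity.
configuration comparator_cfg of comparator is
  for behavioral
  end for;
end configuration comparator_cfg;
```

Changing "for behavioral" to "for dataflow" switches the simulated implementation without touching either architecture.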
Packages
Packages contain information common to many design units
1 Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2 Package body
– is not always needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data that can be shared globally among
several design units
A package consists of a declaration part and an optional body part
A package declaration can contain
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
A package body consists of
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
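A small package makes the split between declaration and body concrete. This is only a sketch: the package name vga_types, the subtype pixel_t, the constant H_TOTAL and the function parity are hypothetical names chosen for illustration. The constant is deferred, so its value appears only in the body.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package vga_types is
  subtype pixel_t is std_logic_vector(0 downto 0);           -- subtype declaration
  constant H_TOTAL : natural;                                -- deferred constant
  function parity (v : std_logic_vector) return std_logic;   -- subprogram declaration
end package vga_types;

package body vga_types is
  -- Value of the deferred constant (illustrative: clocks per line).
  constant H_TOTAL : natural := 794;

  -- XOR of all bits of v.
  function parity (v : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end function parity;
end package body vga_types;
```

A design unit would make these names visible with "use work.vga_types.all;".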
Libraries
A collection of VHDL design units (a database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system
Can also be any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ
primarily in how closely they relate to the HW
It is possible to describe a circuit in a number of ways, listed here from the
lowest to the highest level of abstraction
Structural
Dataflow
Behavioral
Structural VHDL description
The circuit is described in terms of its components
Ranges from a low-level description (e.g. a transistor-level description) to a
high-level description (e.g. a block diagram)
For large circuits, a low-level description quickly becomes impractical
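A structural description can be sketched with a full adder built from two half adders. The example is illustrative only and uses hypothetical names. The half adder is described once, and the full adder only instantiates and wires components.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity half_adder is
  port (a, b : in std_logic; s, c : out std_logic);
end entity half_adder;

architecture dataflow of half_adder is
begin
  s <= a xor b;
  c <= a and b;
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

entity full_adder is
  port (a, b, cin : in std_logic; sum, cout : out std_logic);
end entity full_adder;

-- Structural style: the architecture lists components and their connections.
architecture structural of full_adder is
  component half_adder
    port (a, b : in std_logic; s, c : out std_logic);
  end component;
  signal s1, c1, c2 : std_logic;
begin
  ha1 : half_adder port map (a => a,  b => b,   s => s1,  c => c1);
  ha2 : half_adder port map (a => s1, b => cin, s => sum, c => c2);
  cout <= c1 or c2;
end architecture structural;
```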
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in
the system
The combinational logic is described at a relatively high level, while the
placement and operation of registers is specified quite precisely
The behavior of the system over time is defined by registers
There are no built-in registers in the VHDL language
– Either a lower-level description
– or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained
If there are no third-party models for registers, you must write the behavioral
description of the registers
The behavioral description can be provided in the form of
subprograms (functions or procedures)
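Since the language has no built-in register, one has to be described. The following is a minimal behavioral sketch of an 8-bit register with an asynchronous active-low reset, written as a clocked process rather than as a subprogram. The entity name reg8 and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg8 is
  port (clk, rstn : in  std_logic;
        d         : in  std_logic_vector(7 downto 0);
        q         : out std_logic_vector(7 downto 0));
end entity reg8;

architecture behavioral of reg8 is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');     -- asynchronous reset
    elsif rising_edge(clk) then
      q <= d;                   -- capture the input on the rising clock edge
    end if;
  end process;
end architecture behavioral;
```

Dataflow code then describes the combinational logic between instances of such registers.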
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns)
If no actual delay is used, only the order of sequential operations is defined
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool
There are a few tools for behavioral synthesis
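The delay notation mentioned above can be tried in a self-checking simulation. This sketch uses hypothetical names (delay_demo, a, b) and is meant for simulation only, because synthesis tools ignore the after clause.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_demo is
end entity delay_demo;

architecture behavioral of delay_demo is
  signal a, b : std_logic := '0';
begin
  a <= b after 10 ns;   -- a follows b with a 10 ns delay

  stimulus : process
  begin
    b <= '1';
    wait for 5 ns;      -- 5 ns after the change: a must still be '0'
    assert a = '0' report "a changed too early" severity error;
    wait for 10 ns;     -- 15 ns after the change: a must now be '1'
    assert a = '1' report "a did not follow b" severity error;
    wait;
  end process stimulus;
end architecture behavioral;
```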
Concurrent vs Sequential
Processes
Basic simulation concept in VHDL
A VHDL description can always be broken up into interconnected processes
Quite similar to a Unix process
Process keyword in VHDL
A process statement is a concurrent statement
Statements inside a process statement are sequential statements
A process must contain either a sensitivity list or wait statement(s), but NOT
both
The sensitivity list or wait statement(s) contain the signals that wake the
process up
General format
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
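The two ways of suspending a process can be compared side by side. In this sketch (hypothetical entity and_gate with two outputs), both processes describe the same AND function: one with a sensitivity list, the other with an equivalent wait statement at the end.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_gate is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end entity and_gate;

architecture behavioral of and_gate is
begin
  -- Form 1: sensitivity list, no wait statement.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list, explicit wait as the last statement.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behavioral;
```

Both processes run once at the start of simulation and then again whenever a or b changes.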
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
-- Note: STD_LOGIC_ARITH and numeric_std both declare signed/unsigned, so
-- those names clash if used directly; this unit only needs STD_LOGIC_1164.
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is
  port (
    clk50m : in  std_logic;   -- board clock, divided by two below
    rstn   : in  std_logic;   -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
  -- Picture memory (one bit per pixel)
  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;
  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;   -- horizontal sync pulse (row)
      vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;     -- mode matches the sync_unit entity
      c_hs       : out integer);
  end component;
  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;
  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);
begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));
  -- Divide the input clock by two to obtain the pixel clock
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;
  hsync <= hsync_s;
  vsync <= vsync_s;
  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );
  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );
  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;   -- horizontal sync pulse (row)
    vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;
  -- Horizontal counter: counts pixel clocks 0 .. 793 within a line
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;
  -- Vertical line counter: advances on each rising edge of h_en_s, 0 .. 515
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;   -- the later assignment overrides the increment
      end if;
    end if;
  end process;
  -- hsync pulse: low during the first 96 clocks of each line
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;
  -- Horizontal enable (row information is sent in this period)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;
  -- Vertical enable (data is valid in this region)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;
  -- vsync pulse: low during the first two lines of each frame
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;
  -- v_en delayed by one clock, used to detect the start of the active region
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;
end arch_sync_unit;
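The numeric limits in the synchronization block can be collected in one place to see the timing they imply. The package below is only a sketch: every name in it is hypothetical, and the 25 MHz pixel clock assumes that clk50m is a 50 MHz clock, which the port name suggests but the listing does not state. For comparison, the standard 640x480 at 60 Hz mode uses 800 clocks per line and 525 lines per frame at 25.175 MHz.

```vhdl
package vga_timing is
  -- Horizontal values, in pixel clocks (taken from the comparisons in sync_unit)
  constant H_TOTAL      : natural := 794;  -- c_hs counts 0 .. 793
  constant H_SYNC_WIDTH : natural := 96;   -- hsync is low while c_hs <= 95
  constant H_EN_FIRST   : natural := 143;  -- first count with h_en active
  constant H_EN_LAST    : natural := 778;  -- last count with h_en active
  constant H_EN_WIDTH   : natural := H_EN_LAST - H_EN_FIRST + 1;  -- 636 clocks

  -- Vertical values, in lines
  constant V_TOTAL      : natural := 516;  -- line_count counts 0 .. 515
  constant V_SYNC_WIDTH : natural := 2;    -- vsync is low while line_count < 2
  constant V_EN_FIRST   : natural := 35;
  constant V_EN_LAST    : natural := 515;
  constant V_EN_HEIGHT  : natural := V_EN_LAST - V_EN_FIRST + 1;  -- 481 lines

  -- Assumed pixel clock: a 50 MHz input divided by two (assumption, see above)
  constant PIXEL_CLK_HZ : natural := 25_000_000;

  -- Frames per second, rounded down by integer division (about 61 Hz)
  constant FRAME_RATE_HZ : natural := PIXEL_CLK_HZ / (H_TOTAL * V_TOTAL);
end package vga_timing;
```

Using named constants like these in the comparisons would make it easier to adjust the design towards the standard timing.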
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;
architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;
  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart reading at the first memory address
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: output the memory bit on all three colors
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the picture: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap around after the last pixel address
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Outside the enabled region: black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first
commercially viable field-programmable gate array in 1985, the XC2064. The
XC2064 had programmable gates and programmable interconnects between gates,
the beginnings of a new technology and market. The XC2064 boasted a mere 64
configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs).
More than 20 years later, Freeman was entered into the National Inventors
Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By 1993,
Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication and in
volume of production. In the early 1990s FPGAs were primarily used in
telecommunications and networking. By the end of the decade FPGAs had found
their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural approach a
step further by combining the logic blocks and interconnects of traditional
FPGAs with embedded microprocessors and related peripherals to form a
complete "system on a programmable chip". This work mirrors the architecture
by Ron Perlof and Hana Potash of Burroughs Advanced Systems Group, which
combined a reconfigurable CPU architecture on a single chip called the SB24.
That work was done in 1982. Examples of such hybrid technologies can be found
in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more
PowerPC processors embedded within the FPGA's logic fabric. The Atmel FPSLIC
is another such device, which uses an AVR processor in combination with
Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs can be reprogrammed at run time, and this is leading to the
idea of reconfigurable computing or reconfigurable systems: CPUs that
reconfigure themselves to suit the task at hand. The Mitrion Virtual
Processor from Mitrionics is an example of a reconfigurable soft processor
implemented on FPGAs. However, it does not support dynamic reconfiguration at
run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This makes
them far more flexible (in terms of the range of designs that are practical
to implement within them) but also far more complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived
from Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or
the width of one column, as shown below. All the routing channels have the
same width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
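The logic block described above can be modelled in a few lines of VHDL. This is an illustrative sketch only, not a vendor primitive: the entity name logic_block and the generics INIT and REGISTERED are hypothetical. The 16-bit INIT value is the LUT truth table, and the default x"8000" makes the LUT a 4-input AND.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    INIT       : std_logic_vector(15 downto 0) := x"8000";  -- truth table: 4-input AND
    REGISTERED : boolean := false);                         -- select the flip-flop output
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);   -- the four LUT inputs
    o   : out std_logic);                     -- the single block output
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out, ff_out : std_logic;
begin
  -- The LUT is a 16-entry table indexed by the four inputs.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- The flip-flop registers the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output.
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

In a real device the truth table and the multiplexer setting come from the configuration bitstream rather than from generics.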
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free and released under the GPL, BSD or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at
which point propagation delays can be added and the simulation run again with
these values back-annotated onto the netlist.
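The first simulation stage, an RTL test bench, can be sketched as follows. The example is self-contained and illustrative: it re-declares a small comparator as the design under test, and the test bench name comparator_tb and the stimulus values are hypothetical. The assert statements make the bench self-checking, so results do not have to be read off a waveform.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Design under test: an 8-bit equality comparator.
entity comparator is
  port (a, b : in  std_logic_vector(7 downto 0);
        eq   : out std_logic);
end entity comparator;

architecture dataflow of comparator is
begin
  eq <= '1' when a = b else '0';
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- A test bench has no ports: it generates stimulus and checks responses.
entity comparator_tb is
end entity comparator_tb;

architecture test of comparator_tb is
  signal a, b : std_logic_vector(7 downto 0) := (others => '0');
  signal eq   : std_logic;
begin
  dut : entity work.comparator port map (a => a, b => b, eq => eq);

  stimulus : process
  begin
    a <= x"3C"; b <= x"3C";
    wait for 10 ns;
    assert eq = '1' report "equal inputs not detected" severity error;

    b <= x"3D";
    wait for 10 ns;
    assert eq = '0' report "unequal inputs reported equal" severity error;

    report "test bench finished";
    wait;  -- end of stimulus
  end process stimulus;
end architecture test;
```

The same bench can be reused against the gate-level netlist after synthesis by binding dut to the translated description.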
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream. Prototyping SoC and ASIC designs on FPGAs
has become a good method for doing this.
Reasons why prototyping is important
1 Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This compares favorably with designers relying
only on software simulations to verify that their hardware design is sound.
Simulation speed and modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be found
early.
3 Prototyping also allows the developer to ensure that all IP technologies in
the system work well together, off the simulation stage and in actual form.
4 Prototypes have the added advantage of serving as demo platforms for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig VGA CONTROLLER
GATE LEVEL
Fig SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to correctly display the picture on a connected monitor.
Each block RAM contains 18,432 bits of fast static RAM, 16K bits of
which are allocated to data storage and, in some memory configurations, an
additional 2K bits allocated to parity or additional "plus" data bits.
Physically, the block RAM memory has two completely independent access ports,
labeled Port A and Port B. The structure is fully symmetrical: the two ports
are interchangeable, and both support data read and write operations. Each
memory port is synchronous, with its own clock, clock enable, and write
enable. Read operations are also synchronous and require a clock edge and
clock enable. Though physically a dual-port memory, block RAM can simulate
single-port memory in an application, as shown in Figure 1. Furthermore, each
block memory supports multiple configurations, or aspect ratios. Table 3
summarizes the essential SelectRAM features. Multiple block RAMs can be
cascaded to create deeper and wider memory organizations, with a minimal
timing penalty incurred through specialized routing resources.
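The synchronous read and write behaviour described above can be sketched in
VHDL as a single-port memory. This is a minimal sketch, not the memory core
used in the project: the entity name sp_ram, its generics and its port names
are assumptions chosen for illustration, and whether a synthesis tool maps
such a description onto block RAM depends on the tool and the device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port synchronous RAM (all names are illustrative).
entity sp_ram is
  generic (
    ADDR_WIDTH : positive := 10;  -- 2**10 = 1024 locations
    DATA_WIDTH : positive := 16   -- 1024 x 16 = 16K data bits
  );
  port (
    clk  : in  std_logic;
    en   : in  std_logic;  -- clock enable
    we   : in  std_logic;  -- write enable
    addr : in  std_logic_vector(ADDR_WIDTH - 1 downto 0);
    din  : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity sp_ram;

architecture rtl of sp_ram is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        if we = '1' then
          ram(to_integer(unsigned(addr))) <= din;
        end if;
        -- Synchronous read: dout changes only on an enabled clock edge.
        -- On a write, dout returns the previous contents (read-first).
        dout <= ram(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```

With the default generics the array holds 16K data bits, the data portion of
one block RAM as described above.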
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
It was funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.)
was presented with a proposal to standardize VHDL.
In 1987 standardization was completed => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
Standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE in response.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
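The two related standards can be seen side by side in a short sketch. The
entity below is hypothetical (the name types_demo and its ports are
assumptions for illustration only): it uses the high-impedance value 'Z' from
IEEE 1164 and the unsigned type from IEEE 1076.3.

```vhdl
library ieee;
use ieee.std_logic_1164.all;  -- IEEE 1164: std_logic, std_logic_vector
use ieee.numeric_std.all;     -- IEEE 1076.3: signed, unsigned

-- Hypothetical example entity; all names are illustrative.
entity types_demo is
  port (
    oe      : in  std_logic;  -- output enable
    a, b    : in  std_logic_vector(7 downto 0);
    sum     : out std_logic_vector(7 downto 0);
    bus_pin : out std_logic
  );
end entity types_demo;

architecture rtl of types_demo is
begin
  -- 'Z' (high impedance) is one of the nine std_logic values:
  -- 'U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-'.
  bus_pin <= a(0) when oe = '1' else 'Z';

  -- unsigned gives the bit vectors an arithmetic interpretation;
  -- the result wraps around modulo 2**8.
  sum <= std_logic_vector(unsigned(a) + unsigned(b));
end architecture rtl;
```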
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of a port can be read inside the component, but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from a port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using
– Behavior
– Structure
– Dataflow
Architectures can describe a design on many levels
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87
– The last architecture analyzed links to the entity
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
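As an illustration of these points, the sketch below gives the Comparator
entity two architectures and uses a configuration declaration to choose
between them. The architecture names, the harness entity comparator_harness
and the configuration name are assumptions made for this example; they do not
appear in the project code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Behavioral architecture: one conditional assignment.
architecture behavioral of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture behavioral;

-- Gate-level style architecture: bitwise XNOR followed by an AND reduction.
architecture gate_level of Comparator is
  signal same : std_logic_vector(7 downto 0);
begin
  same <= A xnor B;
  EQ <= same(7) and same(6) and same(5) and same(4) and
        same(3) and same(2) and same(1) and same(0);
end architecture gate_level;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical harness that instantiates the comparator as a component.
entity comparator_harness is
end entity comparator_harness;

architecture test of comparator_harness is
  component Comparator
    port (A, B : in  std_logic_vector(7 downto 0);
          EQ   : out std_logic);
  end component;
  signal a_s, b_s : std_logic_vector(7 downto 0) := (others => '0');
  signal eq_s     : std_logic;
begin
  dut : Comparator port map (A => a_s, B => b_s, EQ => eq_s);
end architecture test;

-- The configuration selects which architecture is bound to the instance.
configuration comparator_gate_cfg of comparator_harness is
  for test
    for dut : Comparator
      use entity work.Comparator(gate_level);
    end for;
  end for;
end configuration comparator_gate_cfg;
```

Simulating comparator_gate_cfg runs the gate-level version; a second
configuration naming behavioral would switch the binding without touching the
harness source.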
Packages
Packages contain information common to many design units.
1. Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2. Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
A package body consists of
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
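A small package makes the split between declaration and body concrete. The
package below is a hypothetical example: the name demo_pkg and everything
declared in it are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package; the names and values are illustrative only.
package demo_pkg is
  -- Constant declaration.
  constant DATA_WIDTH : positive := 8;
  -- Subtype declaration.
  subtype byte_t is std_logic_vector(DATA_WIDTH - 1 downto 0);
  -- Deferred constant: the value is given in the package body.
  constant ALL_ONES : byte_t;
  -- Function declaration: the body is given in the package body.
  function parity (v : byte_t) return std_logic;
end package demo_pkg;

package body demo_pkg is
  constant ALL_ONES : byte_t := (others => '1');

  -- Returns '1' when v contains an odd number of '1' bits.
  function parity (v : byte_t) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in v'range loop
      p := p xor v(i);
    end loop;
    return p;
  end function parity;
end package body demo_pkg;
```

A design unit compiled into the same library would make these names visible
with "use work.demo_pkg.all;".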
Libraries
A library is a collection of VHDL design units (a database).
1. Packages
– package declaration
– package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
A library is usually a directory in a UNIX file system.
It can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here in
order of increasing level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
The circuit is described in terms of its components.
This can range from a low-level description (e.g. transistor-level
description) to a high-level description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level, while the
placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language
– either a lower-level description
– or a behavioral description of sequential elements is needed
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the
behavioral description of registers.
The behavioral description can be provided in the form of subprograms
(functions or procedures).
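The sketch below shows the dataflow style on a small accumulator: the
combinational logic is a concurrent assignment, and the register is described
behaviorally because the language has no built-in register. The entity name
accumulator and its ports are assumptions for illustration, and the register
is written as a process rather than as a subprogram to keep the example
short.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit accumulator; all names are illustrative.
entity accumulator is
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;  -- active-low asynchronous reset
    din  : in  std_logic_vector(7 downto 0);
    acc  : out std_logic_vector(7 downto 0)
  );
end entity accumulator;

architecture dataflow of accumulator is
  signal acc_reg, acc_next : unsigned(7 downto 0);
begin
  -- Combinational logic between registers.
  acc_next <= acc_reg + unsigned(din);

  -- Behavioral description of the register.
  reg : process (clk, rstn)
  begin
    if rstn = '0' then
      acc_reg <= (others => '0');
    elsif rising_edge(clk) then
      acc_reg <= acc_next;
    end if;
  end process reg;

  acc <= std_logic_vector(acc_reg);
end architecture dataflow;
```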
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but
NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
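The two alternatives in the general format can be compared directly. In the
sketch below, both processes describe the same AND gate, one with a
sensitivity list and one with a wait statement. The entity name and_gate_demo
and its ports are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; all names are illustrative.
entity and_gate_demo is
  port (a, b : in  std_logic;
        y1   : out std_logic;
        y2   : out std_logic);
end entity and_gate_demo;

architecture behavioral of and_gate_demo is
begin
  -- Form 1: sensitivity list. The process runs whenever a or b changes.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list. The explicit wait at the end suspends
  -- the process until a or b changes.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behavioral;
```

In simulation y1 and y2 behave identically, since a sensitivity list is
equivalent to a wait on the same signals placed at the end of the process.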
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;  -- 50 MHz input clock
    rstn   : in  std_logic;  -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;

  -- The counter outputs are declared "out" to match the sync_unit entity.
  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame)
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer);
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  -- Divide clk50m by two to generate the clock used by the sync unit,
  -- the pixel generator and the memory.
  process (clk50m, rstn)
  begin
    if rstn = '0' then
      clk <= '0';
    elsif rising_edge(clk50m) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s);

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s);

  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn);

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- Horizontal counter: counts clock cycles 0 to 793 within each line.
  process (clk, rstn)
  begin
    if rstn = '0' then
      c_hs_s <= 0;
    elsif rising_edge(clk) then
      if c_hs_s = 793 then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end process;

  -- Vertical line counter: advances on each rising edge of h_en_s and
  -- counts lines 0 to 515.
  process (h_en_s, rstn)
  begin
    if rstn = '0' then
      line_count_s <= 0;
    elsif rising_edge(h_en_s) then
      if line_count_s = 515 then
        line_count_s <= 0;
      else
        line_count_s <= line_count_s + 1;
      end if;
    end if;
  end process;

  -- hsync pulse: low while the horizontal counter is 0 to 95.
  process (clk, rstn)
  begin
    if rstn = '0' then
      hsync <= '1';
    elsif rising_edge(clk) then
      if c_hs_s <= 95 then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period).
  process (clk, rstn)
  begin
    if rstn = '0' then
      h_en_s <= '0';
    elsif rising_edge(clk) then
      if (c_hs_s >= 143) and (c_hs_s <= 778) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region).
  process (clk, rstn)
  begin
    if rstn = '0' then
      v_en_s <= '0';
    elsif rising_edge(clk) then
      if (line_count_s >= 35) and (line_count_s <= 515) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse: low during lines 0 and 1 (the remaining lines cover the
  -- front porch, back porch and visible region).
  process (clk, rstn)
  begin
    if rstn = '0' then
      vsync <= '1';
    elsif rising_edge(clk) then
      if line_count_s < 2 then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- v_en delayed by one clock, used to detect the start of the frame.
  process (clk)
  begin
    if rising_edge(clk) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
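For comparison with the counter limits used in the synchronization block, the
commonly published timing for 640 x 480 at 60 Hz can be written as a package
of constants. This is a sketch only: the package name vga_timing_pkg and the
constant names are assumptions and are not part of the project code.

```vhdl
-- Hypothetical package; names are illustrative and not used by the project.
package vga_timing_pkg is
  -- Horizontal timing, in pixel clocks (25.175 MHz nominal pixel clock).
  constant H_VISIBLE : natural := 640;
  constant H_FRONT   : natural := 16;
  constant H_SYNC    : natural := 96;
  constant H_BACK    : natural := 48;
  constant H_TOTAL   : natural := H_VISIBLE + H_FRONT + H_SYNC + H_BACK;  -- 800

  -- Vertical timing, in lines.
  constant V_VISIBLE : natural := 480;
  constant V_FRONT   : natural := 10;
  constant V_SYNC    : natural := 2;
  constant V_BACK    : natural := 33;
  constant V_TOTAL   : natural := V_VISIBLE + V_FRONT + V_SYNC + V_BACK;  -- 525
end package vga_timing_pkg;
```

The sync pulse widths in the synchronization block (96 clocks for hsync, 2
lines for vsync) agree with these values, while its horizontal counter wraps
at 793 and its line counter at 515, giving 794 clocks per line and 516 lines
per frame rather than the nominal 800 and 525. Whether a given monitor
accepts that deviation is not something this sketch can confirm.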
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if rstn = '0' then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= (others => '0');
    elsif rising_edge(clk) then
      -- Restart the address on the rising edge of the vertical enable.
      if (v_en = '1') and (v_en_reg = '0') then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1') and (h_en = '1') then
        if line_count <= 514 then
          if c_hs <= 655 then
            -- Visible region: read the next pixel from memory.
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= std_logic_vector(unsigned(addr_s) + 1);
          else
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last memory location in use.
        if addr_s = "111011111111111111" then
          addr_s <= (others => '0');
        end if;
      else
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable interconnects
between gates, the beginnings of a new technology and market. The XC2064
boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup
tables (LUTs). More than 20 years later, Freeman was entered into the
National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Center funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share.
By 1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication and in
volume of production. In the early 1990s, FPGAs were primarily used in
telecommunications and networking. By the end of the decade, FPGAs had found
their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perloff and Hanan Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative approach to using hard-macro processors is to make use of
"soft" processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be reprogrammed at "run time", and
this is leading to the idea of reconfigurable computing or reconfigurable
systems: CPUs that reconfigure themselves to suit the task at hand. The
Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable
soft processor implemented on FPGAs. However, it does not support dynamic
reconfiguration at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Center
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design
starts.
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them) but also far more
complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4
to 8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived
from Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or
the width of one column, as shown below. All the routing channels have the
same width (number of wires).
Fig 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Fig Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
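The logic block just described can be modelled in a few lines of VHDL. This
is a behavioral sketch for illustration, not vendor code: the entity name
logic_block, the generics LUT_INIT and USE_REG, and the port names are all
assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic logic block: 4-input LUT plus flip-flop.
entity logic_block is
  generic (
    -- Truth table of the LUT; x"8000" implements a 4-input AND.
    LUT_INIT : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) output.
    USE_REG  : boolean := false
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The LUT is a 16 x 1 truth table addressed by the four inputs.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- The flip-flop registers the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer, fixed when the block is configured.
  o <= ff_out when USE_REG else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the structure,
which is the sense in which the block is programmable.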
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Fig Logic block pin locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Fig Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar
license), and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
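The first simulation stage, an RTL test bench, can be sketched as follows. The
design under test is the 8-bit Comparator used as an example in the VHDL
chapter; the test bench name Comparator_tb, its signal names and the stimulus
values are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test bench; a test bench entity has no ports.
entity Comparator_tb is
end entity Comparator_tb;

architecture sim of Comparator_tb is
  signal a_s, b_s : std_logic_vector(7 downto 0) := (others => '0');
  signal eq_s     : std_logic;
begin
  dut : entity work.Comparator(Comparator1)
    port map (A => a_s, B => b_s, EQ => eq_s);

  stimulus : process
  begin
    -- Equal inputs: EQ is expected to be '1'.
    a_s <= x"3C"; b_s <= x"3C";
    wait for 10 ns;
    assert eq_s = '1'
      report "EQ should be '1' for equal inputs" severity error;

    -- Different inputs: EQ is expected to be '0'.
    a_s <= x"3C"; b_s <= x"3D";
    wait for 10 ns;
    assert eq_s = '0'
      report "EQ should be '0' for different inputs" severity error;

    report "Comparator test bench finished";
    wait;  -- suspend the stimulus process for the rest of the simulation
  end process stimulus;
end architecture sim;
```

The same test bench can, in principle, be rerun against the gate-level
netlist and the back-annotated design, so that the three stages are compared
on identical stimulus.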
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA
for hardware verification and early software development.
Verification of hardware designs, together with early software and
firmware co-design, has become mainstream. Prototyping SoC and ASIC
designs on an FPGA has become a good method of doing this.
Reasons why prototyping is important:
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound.
Simulation speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is
back from the foundry to start on software tests. FPGA prototyping allows
much more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear
due to today's array of operating systems, applications and hardware, to be
found early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of providing demo platforms to SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
65
13 SIMULATION RESULTS
66
GATE LEVEL
FigVGA CONTROLLER
67
GATE LEVEL
Fig SYNC UNIT
68
TECHNOLOGY SCHEMATIC
69
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
141 CONCLUTION
Hence by implementing FPGA taking a processor-generated picture
(pixilated) from memory space the Controller provides digital RGB values for
each pixel as well as horizontal and vertical synchronization signals in order
to correctly display the picture on a connected monitor
70
REFERENCES
1 httpenwikipediaorgwikiVideo_Graphics_Array
2 A VGA display controller by Eduardo Sanchez
3 HinnerrdquoVGA Interface and video signal documentsrdquo
71
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s to manage design problems that
involved large circuits and multiple teams of engineers.
It was funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987 standardization was completed => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
Standard 1076-1993.
10.2 Related Standards
IEEE 1076 doesn't support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines its pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports: Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions:
In: The value of the port can be read inside the component but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be
read. Multiple assignments are allowed.
Inout: Bi-directional. Assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures: Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  -- EQ is '1' only while both 8-bit inputs are equal
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations: Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
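As a minimal sketch of how a configuration declaration selects one of several architectures, the Comparator entity is repeated below with two architectures. The second architecture (Comparator2) and the configuration name Comparator_cfg are illustrative assumptions and do not appear in the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Dataflow-style architecture, as in the example above.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- A second, process-based architecture for the same entity (illustrative).
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end architecture Comparator2;

-- Explicitly binds the entity to Comparator2 instead of relying on the
-- default (last analyzed) architecture.
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end configuration Comparator_cfg;
```

Simulating Comparator_cfg rather than the bare entity makes the choice of architecture explicit, which is the main practical use of a configuration in a small design.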
Packages: Packages contain information common to many design units.
1. Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2. Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data that can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
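The split between a package declaration and a package body can be sketched as follows. The package name vga_timing_pkg, the constant names and the function in_window are hypothetical; only the numeric limits (793, 143, 778) are taken from the synchronization block code in section 11.

```vhdl
-- Package declaration: the part that other design units can see.
package vga_timing_pkg is
  constant H_LAST_COUNT   : natural := 793;  -- last value of the horizontal counter
  constant H_ACTIVE_FIRST : natural := 143;  -- first count of the horizontal enable window
  constant H_ACTIVE_LAST  : natural := 778;  -- last count of the horizontal enable window

  -- Only the function interface is declared here.
  function in_window (count, first, last : natural) return boolean;
end package vga_timing_pkg;

-- Package body: holds the implementation of the declared function.
package body vga_timing_pkg is
  function in_window (count, first, last : natural) return boolean is
  begin
    return (count >= first) and (count <= last);
  end function in_window;
end package body vga_timing_pkg;
```

A design unit compiled into the same library would make these names visible with `use work.vga_timing_pkg.all;`.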
Libraries: A collection of VHDL design units (a database).
1. Packages
   package declaration
   package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
A library is usually a directory in the UNIX file system.
It can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural  -------
Dataflow    -------  Higher level of abstraction
Behavioral  -------
Structural VHDL Description
The circuit is described in terms of its components.
The description can range from low level (e.g. a transistor-level description)
to high level (e.g. a block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level, while the
placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by the registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of the sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the
behavioral description of the registers yourself.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
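Because the language has no built-in register, a register is normally written as a small clocked process. The following is a minimal sketch of an 8-bit register with an active-low asynchronous reset; the entity name dff8 and its port names are assumptions chosen for illustration, not part of the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff8 is
  port (clk  : in  std_logic;
        rstn : in  std_logic;                      -- active-low asynchronous reset
        d    : in  std_logic_vector(7 downto 0);
        q    : out std_logic_vector(7 downto 0));
end entity dff8;

architecture rtl of dff8 is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');       -- reset has priority over the clock
    elsif rising_edge(clk) then
      q <= d;                     -- capture the input on the rising edge
    end if;
  end process;
end architecture rtl;
```

The same reset-then-clock-edge pattern is what the processes in the controller code of section 11 use, written there as `clk = '1' and clk'event`.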
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The process is the basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
A process is quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or the wait statement(s) contain the signals that wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
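The two ways of waking a process up can be compared side by side. In this sketch the entity and_gate_demo and its ports are hypothetical; the two processes are intended to behave identically, one using a sensitivity list and the other an explicit wait statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_gate_demo is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end entity and_gate_demo;

architecture demo of and_gate_demo is
begin
  -- Form 1: the sensitivity list names the signals that wake the process.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list; the process suspends at the wait statement
  -- until a or b changes, then runs again from the top.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process with_wait;
end architecture demo;
```

Both processes are concurrent statements in the architecture, while the assignments inside each of them execute sequentially.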
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  -- Picture memory: 18-bit address, 1-bit data
  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;

  -- Sync generator; counter ports are mode out, matching the sync_unit entity
  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame)
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer);
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the input clock clk50m by two to obtain the internal clock clk
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s);

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s);

  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn);

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- Horizontal counter: counts clocks 0 to 793 within one line
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- Vertical line counter: advances on each rising edge of h_en_s, 0 to 515
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse: low while the horizontal counter is 0 to 95
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35) and (line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse: low during lines 0 and 1 (the remaining lines cover the
  -- front porch, active video and back porch)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of v_en_s (used by pixelgen to detect its rising edge)
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
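The counter limits in the synchronization block fix the display timing. As a rough check, assume that clk50m is a 50 MHz oscillator (suggested by the port name, but not stated in this report), so that the divided clock clk runs at 25 MHz. The horizontal counter has 794 states (0 to 793), the line counter has 516 states (0 to 515), and hsync is low for the 96 clocks in which c_hs_s is at most 95:

```latex
\begin{align*}
f_{\text{clk}}   &= \frac{50\ \text{MHz}}{2} = 25\ \text{MHz} \\
T_{\text{line}}  &= \frac{794}{25\ \text{MHz}} = 31.76\ \mu\text{s}
  \quad\Rightarrow\quad f_{\text{line}} \approx 31.49\ \text{kHz} \\
T_{\text{frame}} &= 516 \times 31.76\ \mu\text{s} \approx 16.39\ \text{ms}
  \quad\Rightarrow\quad f_{\text{frame}} \approx 61.0\ \text{Hz} \\
T_{\text{hsync}} &= \frac{96}{25\ \text{MHz}} = 3.84\ \mu\text{s}
\end{align*}
```

For comparison, the standard 640 x 480 mode at 60 Hz uses 800 clocks per line and 525 lines per frame with a 25.175 MHz pixel clock, so under the stated assumption the figures above are close to, but not identical with, the standard timing.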
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en (start of a frame): restart the memory address
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Picture area: read the memory and drive all three colours
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + 1;
          else
            -- Right of the picture area: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture area: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last picture location
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: all colour outputs low
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
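The address handling in the pixel generator amounts to a wrap-around counter whose last value, "111011111111111111", is 245759 in decimal. As a sketch of the same idea using only the standard numeric_std package, the counter could be written as below. The entity addr_counter, the enable port en and the constant LAST_ADDR are assumptions for illustration and are not part of the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity addr_counter is
  port (clk  : in  std_logic;
        rstn : in  std_logic;                        -- active-low asynchronous reset
        en   : in  std_logic;                        -- count only while high
        addr : out std_logic_vector(17 downto 0));
end entity addr_counter;

architecture rtl of addr_counter is
  -- 245759 = "111011111111111111", the terminal value used in pixelgen
  constant LAST_ADDR : unsigned(17 downto 0) := to_unsigned(245759, 18);
  signal addr_u : unsigned(17 downto 0);
begin
  addr <= std_logic_vector(addr_u);

  process (clk, rstn)
  begin
    if rstn = '0' then
      addr_u <= (others => '0');
    elsif rising_edge(clk) then
      if en = '1' then
        if addr_u = LAST_ADDR then
          addr_u <= (others => '0');   -- wrap after the last location
        else
          addr_u <= addr_u + 1;
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

Keeping the count in an unsigned signal and converting only at the port avoids relying on the non-standard STD_LOGIC_UNSIGNED package for arithmetic on std_logic_vector.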
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field-programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was inducted
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s FPGAs were primarily used
in telecommunications and networking. By the end of the decade FPGAs had
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic (see Soft
processors below).
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design
starts (see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the state where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
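The logic block described above can be modelled behaviorally in a few lines. This is only a sketch of the concept, not a vendor primitive: the entity name logic_block and the generics INIT (the 16-entry truth table) and REGISTERED (selects the flip-flop output) are assumptions introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    -- Truth table of the LUT: bit n is the output for input value n.
    -- x"8000" has only bit 15 set, which gives a 4-input AND.
    INIT       : std_logic_vector(15 downto 0) := x"8000";
    -- true: output comes from the flip-flop; false: directly from the LUT
    REGISTERED : boolean := false);
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic);
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out, ff_out : std_logic;
begin
  -- 4-input LUT: the inputs select one bit of the truth table
  lut_out <= INIT(to_integer(unsigned(i)));

  -- Flip-flop that registers the LUT output
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single output: registered or unregistered LUT value
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing INIT changes the logic function without changing the structure, which is how a real device implements arbitrary 4-input functions by loading configuration bits.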
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: The source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
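The RTL simulation step mentioned above relies on a test bench: an entity without ports that instantiates the design, drives its inputs and checks its outputs. The following is a minimal sketch for the Comparator example from section 10, which is repeated here so that the file compiles on its own. The test bench name Comparator_tb, the signal names and the stimulus values are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Design under test (same as the Comparator example in section 10)
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

library ieee;
use ieee.std_logic_1164.all;

-- A test bench has no ports: it is the top level of the simulation.
entity Comparator_tb is
end entity Comparator_tb;

architecture sim of Comparator_tb is
  signal a_s, b_s : std_logic_vector(7 downto 0) := (others => '0');
  signal eq_s     : std_logic;
begin
  dut : entity work.Comparator(Comparator1)
    port map (A => a_s, B => b_s, EQ => eq_s);

  stimulus : process
  begin
    -- Equal inputs: EQ is expected to be '1'
    a_s <= x"3C";
    b_s <= x"3C";
    wait for 10 ns;
    assert eq_s = '1'
      report "EQ should be '1' for equal inputs" severity error;

    -- Different inputs: EQ is expected to be '0'
    b_s <= x"3D";
    wait for 10 ns;
    assert eq_s = '0'
      report "EQ should be '0' for different inputs" severity error;

    wait;  -- suspend forever so the simulation can end
  end process stimulus;
end architecture sim;
```

The same test bench can in principle be rerun against the post-synthesis gate-level description, which is the repeated simulation step the paragraph above refers to.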
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verifying a hardware design while co-designing its software and firmware
early has become mainstream practice. Prototyping SoC and ASIC
designs on an FPGA has become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. The alternative is for designers to rely only on software
simulations to verify that their hardware design is sound; simulation speed and
modeling accuracy limitations hinder that approach.
2. Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be found early.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC clients,
bringing in interest early. This speeds up the overall development cycle and
allows more enhancement or improvement of the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Implemented on an FPGA, the Controller takes a processor-generated picture
(pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
10 HARDWARE DESCRIPTION LANGUAGE (VHDL)
Why (V)HDL?
Interoperability
Technology independence
Design reuse
Several levels of abstraction
Readability
Standard language
Widely supported
What is VHDL?
VHDL = VHSIC Hardware Description Language (VHSIC = Very High-Speed
Integrated Circuit)
Design specification language
Design entry language
Design simulation language
Design documentation language
An alternative to schematics
10.1 Brief History
VHDL was developed in the early 1980s for managing design problems that
involved large circuits and multiple teams of engineers.
Funded by the US Department of Defense.
The first publicly available version was released in 1985.
In 1986 the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, standardization => IEEE 1076-1987.
An improved version of the language was released in 1994 => IEEE
Standard 1076-1993.
10.2 Related Standards
IEEE 1076 does not support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types => VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE to address this.
IEEE 1164 contains definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (Numeric or Synthesis Standard) defines data types as they relate
to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
An entity declaration describes an entity.
E.g.:
entity Comparator is
  port (A, B : in std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports: Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions:
In: The value of the port can be read inside the component but cannot be assigned.
Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
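The restriction on reading an out port is usually worked around with an internal signal, which is the same pattern the sync_unit listing in chapter 11 uses for h_en_s and v_en_s. The following is only a minimal sketch; the entity name toggle and the signal name q_s are hypothetical and do not appear in the project sources.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical example: a flip-flop that toggles on every clock edge.
entity toggle is
  port (clk : in  std_logic;
        q   : out std_logic);
end toggle;

architecture behv of toggle is
  signal q_s : std_logic := '0';  -- internal copy that may be read
begin
  q <= q_s;                       -- drive the out port from the internal signal

  process (clk)
  begin
    if rising_edge(clk) then
      -- Reading q_s is legal; reading the out port q directly would be
      -- rejected by tools that follow the rule stated above.
      q_s <= not q_s;
    end if;
  end process;
end behv;
```

This keeps the port mode as out and avoids the buffer mode that the list above advises against.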
Architectures: Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
– Behavior – Structure – Dataflow
Architectures can describe a design on many levels:
– Gate level – RTL (Register Transfer Level) – Behavioral level
A configuration declaration links an architecture to an entity.
E.g.:
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations: Link an entity declaration and an architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
– The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
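As an illustration of the points above, a configuration declaration for the Comparator example might look like the sketch below. The configuration name cfg_comparator is an assumption made for this example; only Comparator and Comparator1 come from the text.

```vhdl
-- Minimal configuration declaration (hypothetical name cfg_comparator).
-- It selects architecture Comparator1 for entity Comparator, so a simulator
-- can elaborate cfg_comparator instead of relying on the default binding.
configuration cfg_comparator of Comparator is
  for Comparator1
  end for;
end cfg_comparator;
```

If a second architecture (for example a gate-level one) were added later, only this unit would need to change to switch between them.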
Packages: Packages contain information common to many design units.
1. Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2. Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
A package body consists of:
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
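The split between package declaration and package body can be sketched as follows. The package name demo_pkg and everything inside it are hypothetical and serve only to show where each kind of item goes.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Package declaration: what other design units are allowed to see.
package demo_pkg is
  constant DATA_WIDTH : integer := 8;                           -- shared constant
  subtype byte_t is std_logic_vector(DATA_WIDTH - 1 downto 0);  -- shared subtype
  function is_equal(a, b : byte_t) return std_logic;            -- declaration only
end demo_pkg;

-- Package body: needed here only because the function requires a body.
package body demo_pkg is
  function is_equal(a, b : byte_t) return std_logic is
  begin
    if a = b then
      return '1';
    else
      return '0';
    end if;
  end is_equal;
end demo_pkg;
```

A design unit compiled into the same library would make these names visible with `use work.demo_pkg.all;`.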
Libraries: A collection of VHDL design units (a database).
1. Packages
– package declaration
– package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways:
Structural -> Dataflow -> Behavioral (towards a higher level of abstraction)
Structural VHDL description
The circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
– either a lower-level description
– or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers => you must write the behavioral
description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
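Because the language has no built-in register, a behavioral description such as the following sketch is what usually stands in for one. The entity name reg8, its width and its active-low asynchronous reset are assumptions chosen to resemble the style of the listings in chapter 11; it is written as a process rather than a subprogram for brevity.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 8-bit register with asynchronous, active-low reset.
entity reg8 is
  port (clk  : in  std_logic;
        rstn : in  std_logic;
        d    : in  std_logic_vector(7 downto 0);
        q    : out std_logic_vector(7 downto 0));
end reg8;

architecture behv of reg8 is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');      -- reset clears the register
    elsif rising_edge(clk) then
      q <= d;                    -- capture the input on each rising clock edge
    end if;
  end process;
end behv;
```

Synthesis tools generally recognize this template and map it onto flip-flops.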
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
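The two ways of waking a process up can be compared side by side. In the sketch below, which uses a hypothetical entity and_demo, both processes describe the same AND gate: the first uses a sensitivity list, the second an explicit wait statement.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical entity used only to contrast the two process forms.
entity and_demo is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end and_demo;

architecture behv of and_demo is
begin
  -- Form 1: sensitivity list; the process wakes up when a or b changes.
  p_list : process (a, b)
  begin
    y1 <= a and b;
  end process p_list;

  -- Form 2: no sensitivity list; the wait statement at the end suspends
  -- the process until a or b changes.
  p_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process p_wait;
end behv;
```

A sensitivity list is equivalent to a single `wait on` at the end of the process, which is why the two forms may not be combined.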
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
  clk50m : in  std_logic;
  rstn   : in  std_logic;
  hsync  : out std_logic;
  vsync  : out std_logic;
  red    : out std_logic_vector(0 downto 0);
  green  : out std_logic_vector(0 downto 0);
  blue   : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
component memory
  port (
    addr  : IN  std_logic_VECTOR(17 downto 0);
    clk   : IN  std_logic;
    dout  : OUT std_logic_VECTOR(0 downto 0);
    en    : IN  std_logic;
    sinit : IN  std_logic);
END component;
component sync_unit port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;  ---horizontal synch pulse(row)
  vsync      : out std_logic;  --vertical sync pulse(frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;    -- mode matches the sync_unit entity
  c_hs       : out integer);
end component;
component pixelgen port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr,18));
-- Divide clk50m by two to obtain the clock used by all other blocks.
process(clk50m, rstn)
begin
  if (rstn = '0') then
    clk <= '0';
  elsif (clk50m = '1' and clk50m'event) then
    clk <= not clk;
  end if;
end process;
hsync <= hsync_s;
vsync <= vsync_s;
-- Sync generator: counters, sync pulses and enable windows.
sync : sync_unit port map(
  clk        => clk,
  rstn       => rstn,
  hsync      => hsync_s,
  vsync      => vsync_s,
  v_en_reg   => v_en_reg_s,
  line_count => line_count_s,
  h_en       => h_en_s,
  v_en       => v_en_s,
  c_hs       => c_hs_s
);
-- Pixel generator: produces memory addresses and RGB outputs.
pixel : pixelgen port map(
  clk        => clk,
  rstn       => rstn,
  hsync      => hsync_s,
  vsync      => vsync_s,
  h_en       => h_en_s,
  v_en       => v_en_s,
  v_en_reg   => v_en_reg_s,
  data_in    => data_s,
  line_count => line_count_s,
  c_hs       => c_hs_s,
  red        => red,
  green      => green,
  blue       => blue,
  addr       => addr_s,
  rd         => rd_s
);
-- Image memory: one bit per pixel, 18-bit address.
mem12 : memory port map(
  addr  => addr_s,
  clk   => clk,
  dout  => data_s,
  en    => rd_s,
  sinit => rstn
);
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;  ---horizontal synch pulse(row)
  vsync      : out std_logic;  --vertical sync pulse(frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;
  c_hs       : out integer );
end sync_unit;
architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin
c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;
-- hsync counter: counts clock cycles within a line, 0 to 793
process(clk, rstn)
begin
  if (rstn = '0') then
    c_hs_s <= 0;
  else
    if (clk = '1' and clk'event) then
      if (c_hs_s = 793) then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s + 1;
      end if;
    end if;
  end if;
end process;
--vsync counter
--vertical line counter: advances on each rising edge of h_en_s
process(h_en_s, rstn)
begin
  if (rstn = '0') then
    line_count_s <= 0;
  elsif (h_en_s = '1' and h_en_s'event) then
    line_count_s <= line_count_s + 1;
    if (line_count_s = 515) then
      line_count_s <= 0;
    end if;
  end if;
end process;
--hsync pulse
process(clk, rstn)
begin
  if (rstn = '0') then
    hsync <= '1';
  elsif (clk = '1' and clk'event) then
    if (c_hs_s <= 95) then
      hsync <= '0';
    else
      hsync <= '1';
    end if;
  end if;
end process;
process(clk, rstn) ----Horizontal Enable (Row Information Will be Sent in this period)
begin
  if (rstn = '0') then
    h_en_s <= '0';
  elsif (clk = '1' and clk'event) then
    if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
      h_en_s <= '1';
    else
      h_en_s <= '0';
    end if;
  end if;
end process;
---vsync pulse
process(clk, rstn) -- Vertical Enable (Data is Valid in this Region)
begin
  if (rstn = '0') then
    v_en_s <= '0';
  elsif (clk = '1' and clk'event) then
    if ((line_count_s >= 35 and line_count_s <= 515)) then
      v_en_s <= '1';
    else
      v_en_s <= '0';
    end if;
  end if;
end process;
process(clk, rstn) --Vertical Sync, Front Porch, Back Porch
begin
  if (rstn = '0') then
    vsync <= '1';
  elsif (clk = '1' and clk'event) then
    if (line_count_s < 2) then
      vsync <= '0';
    else
      vsync <= '1';
    end if;
  end if;
end process;
-- v_en delayed by one clock; pixelgen uses it to detect the rising edge of v_en
process(clk)
begin
  if (clk = '1' and clk'event) then
    v_en_reg <= v_en_s;
  end if;
end process;
end arch_sync_unit;
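A small test bench is one way to exercise the synchronization block on its own. The sketch below is not part of the project; the test bench name tb_sync_unit, the 40 ns clock period (assumed from dividing a 50 MHz clk50m by two) and the 100 ns reset time are all assumptions. It only checks that the horizontal counter stays within the 0 to 793 range used in the listing.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity tb_sync_unit is
end tb_sync_unit;

architecture sim of tb_sync_unit is
  signal clk  : std_logic := '0';
  signal rstn : std_logic := '0';
  signal hsync, vsync, v_en_reg, h_en, v_en : std_logic;
  signal line_count, c_hs : integer;
begin
  -- Free-running clock with an assumed 40 ns period.
  clk <= not clk after 20 ns;

  -- Hold reset low for 100 ns, then release it.
  rstn <= '0', '1' after 100 ns;

  -- Device under test, assumed to be compiled into library work.
  dut : entity work.sync_unit
    port map (clk => clk, rstn => rstn, hsync => hsync, vsync => vsync,
              v_en_reg => v_en_reg, h_en => h_en, v_en => v_en,
              line_count => line_count, c_hs => c_hs);

  -- After reset, the horizontal counter must stay between 0 and 793.
  check : process (clk)
  begin
    if rising_edge(clk) and rstn = '1' then
      assert (c_hs >= 0 and c_hs <= 793)
        report "c_hs out of range" severity error;
    end if;
  end process check;
end sim;
```

Running the simulation for at least one full frame (516 lines of 794 clocks each) also lets hsync, vsync and the enable windows be inspected in the waveform viewer.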
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;
process(clk, rstn)
begin
  if (rstn = '0') then
    rd <= '0';
    red <= "0";
    green <= "0";
    blue <= "0";
    addr_s <= x"0000" & "00";
  elsif (clk = '1' and clk'event) then
    -- Rising edge of v_en (start of a frame): restart at address 0.
    if ((v_en = '1') and (v_en_reg = '0')) then
      addr_s <= (others => '0');
    end if;
    if (v_en = '1' and h_en = '1') then
      if (line_count <= 514) then
        if (c_hs <= 655) then
          -- Inside the image area: read memory and show the pixel.
          rd <= '1';
          red <= data_in;
          green <= data_in;
          blue <= data_in;
          addr_s <= addr_s + "000000000000000001";
        else
          -- Right of the image area: green.
          rd <= '0';
          red <= "0";
          green <= "1";
          blue <= "0";
        end if;
      else
        -- Below the image area: white.
        rd <= '0';
        red <= "1";
        green <= "1";
        blue <= "1";
      end if;
      -- Wrap the address after the last pixel of the image.
      if (addr_s = "111011111111111111") then
        addr_s <= (others => '0');
      end if;
    else
      -- Outside the enable windows: black.
      red <= "0";
      green <= "0";
      blue <= "0";
    end if;
  end if;
end process;
end behv;
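The numeric limits that appear in the sync_unit and pixelgen listings could be collected in one package so that they are defined in a single place. The following is only a sketch: the package name vga_timing_pkg and the constant names are hypothetical, while the values are copied from the listings.

```vhdl
-- Hypothetical package collecting the counter limits used in the listings.
package vga_timing_pkg is
  constant H_COUNT_MAX  : integer := 793; -- c_hs wraps to 0 after this value (794 clocks per line)
  constant H_SYNC_END   : integer := 95;  -- hsync is driven low while c_hs <= 95
  constant H_EN_START   : integer := 143; -- lower bound of the h_en comparison
  constant H_EN_END     : integer := 778; -- upper bound of the h_en comparison
  constant H_PIXEL_END  : integer := 655; -- pixelgen reads memory while c_hs <= 655
  constant V_COUNT_MAX  : integer := 515; -- line_count wraps to 0 after this value
  constant V_SYNC_LINES : integer := 2;   -- vsync is driven low while line_count < 2
  constant V_EN_START   : integer := 35;  -- lower bound of the v_en comparison
end vga_timing_pkg;
```

With `use work.vga_timing_pkg.all;` added to a design unit, a comparison such as `c_hs_s = 793` could then be written as `c_hs_s = H_COUNT_MAX`.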
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts itself
to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the field
to fix bugs, and lower non-recurring engineering costs. Vendors can also take a
middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after the design
has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8 hours
is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
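The logic block described above can be modelled in a few lines of VHDL. This is an illustrative sketch rather than a vendor primitive: the entity name logic_block, the generics INIT and REGISTERED and the port names are all hypothetical. Unlike the project listings, it uses IEEE.NUMERIC_STD for the index conversion.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Illustrative model of a classic logic block: 4-input LUT plus flip-flop.
entity logic_block is
  generic (INIT       : std_logic_vector(15 downto 0) := x"8000"; -- truth table; default is a 4-input AND
           REGISTERED : boolean := true);                          -- true selects the flip-flop output
  port (clk : in  std_logic;
        i   : in  std_logic_vector(3 downto 0);                    -- the four LUT inputs
        o   : out std_logic);
end logic_block;

architecture behv of logic_block is
  signal lut_out, ff_out : std_logic;
begin
  -- The LUT is a 16 x 1 memory addressed by the four inputs.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- Flip-flop that registers the LUT output.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single output: registered or unregistered LUT output.
  o <= ff_out when REGISTERED else lut_out;
end behv;
```

Changing INIT changes the logic function without changing the structure, which is the essence of configuring a LUT-based FPGA.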
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher-level
functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license) and
other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA has
become a good method of doing this.
Reasons why prototyping is important
1 Running an SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed
and modeling accuracy limitations hinder verification by simulation alone.
2 Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This helps uncover many unforeseen software bugs that
appear due to today's array of operating systems, applications and hardware.
3 Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, away from the simulation stage and in actual
form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
Fig: Gate level, VGA controller
Fig: Gate level, sync unit
Fig: Technology schematic
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, when implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to display the picture correctly on a connected monitor.
The first publicly available version was released in 1985.
In 1986, the IEEE (Institute of Electrical and Electronics Engineers, Inc.) was
presented with a proposal to standardize VHDL.
In 1987, the language was standardized as IEEE 1076-1987.
An improved version of the language was released in 1994 as IEEE
Standard 1076-1993.
10.2 Related Standards
IEEE 1076 does not itself support simulation conditions such as unknown and
high-impedance.
Soon after IEEE 1076-1987 was released, simulator companies began using
their own non-standard types, so VHDL was becoming non-standard.
The IEEE 1164 standard was developed by the IEEE. It contains
definitions for a nine-valued data type, std_logic.
IEEE 1076.3 (the Numeric or Synthesis Standard) defines data types as they
relate to actual hardware.
It defines, e.g., two numeric types: signed and unsigned.
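As a small illustration of the types mentioned above, the following sketch uses the high-impedance value 'Z' of std_logic (IEEE 1164) and the unsigned type (IEEE 1076.3, package numeric_std). The entity name types_demo and its ports are hypothetical and are not part of the VGA design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;  -- std_logic, std_logic_vector (IEEE 1164)
use ieee.numeric_std.all;     -- signed, unsigned (IEEE 1076.3)

entity types_demo is
  port (
    oe   : in  std_logic;                      -- output enable
    a, b : in  std_logic_vector(7 downto 0);
    sum  : out std_logic_vector(7 downto 0)    -- tri-stated when oe is not '1'
  );
end entity types_demo;

architecture rtl of types_demo is
  signal sum_u : unsigned(7 downto 0);
begin
  -- Arithmetic is done on the unsigned type, not on std_logic_vector.
  -- The 8-bit result wraps modulo 256.
  sum_u <= unsigned(a) + unsigned(b);

  -- 'Z' (high impedance) is one of the nine std_logic values.
  sum <= std_logic_vector(sum_u) when oe = '1' else (others => 'Z');
end architecture rtl;
```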
VHDL Environment
Design Units
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines the pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of the port can be read inside the component but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be
read. Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(buffer ports are not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
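The relationship between an entity, several architectures and a configuration can be sketched as follows. The Comparator entity and the Comparator1 architecture follow the examples above; the second architecture Comparator2, the test bench tb and the configuration tb_cfg are hypothetical names added for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end entity Comparator;

-- Dataflow architecture (as in the example above).
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end architecture Comparator1;

-- Behavioral architecture (hypothetical second variant).
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end architecture Comparator2;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench that instantiates the comparator as a component.
entity tb is
end entity tb;

architecture sim of tb is
  component Comparator
    port (A, B : in  std_logic_vector(7 downto 0);
          EQ   : out std_logic);
  end component;
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  dut : Comparator port map (A => A, B => B, EQ => EQ);

  stimulus : process
  begin
    wait for 10 ns;
    assert EQ = '1' report "expected EQ = '1' for equal inputs" severity error;
    B <= x"01";
    wait for 10 ns;
    assert EQ = '0' report "expected EQ = '0' for different inputs" severity error;
    wait;
  end process stimulus;
end architecture sim;

-- Configuration declaration: selects Comparator2 for the instance dut.
configuration tb_cfg of tb is
  for sim
    for dut : Comparator
      use entity work.Comparator(Comparator2);
    end for;
  end for;
end configuration tb_cfg;
```

Simulating tb_cfg rather than tb uses the behavioral architecture; editing only the configuration to name Comparator1 switches the implementation without touching the test bench source.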
Packages
Packages contain information common to many design units.
1 Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2 Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
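A minimal sketch of a package declaration and its body is shown below. The package name vga_pkg, the subtype pixel_t and the function is_active are hypothetical; the value 794 corresponds to the 0 to 793 horizontal count used in the sync_unit code of this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package vga_pkg is
  -- Deferred constant: its value is given in the package body.
  constant H_TOTAL : natural;

  -- Subtype declaration shared by several design units.
  subtype pixel_t is std_logic_vector(0 downto 0);

  -- Function declaration; the function body lives in the package body.
  function is_active (count, first, last : natural) return std_logic;
end package vga_pkg;

package body vga_pkg is
  constant H_TOTAL : natural := 794;  -- clocks per line (counter runs 0 to 793)

  function is_active (count, first, last : natural) return std_logic is
  begin
    if count >= first and count <= last then
      return '1';
    else
      return '0';
    end if;
  end function is_active;
end package body vga_pkg;
```

A design unit would make these declarations visible with "use work.vga_pkg.all;" after the package has been compiled into the work library.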
Libraries
A collection of VHDL design units (a database):
1 Packages
- package declaration
- package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually a directory in the UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here from
the lowest to the highest level of abstraction:
Structural
Dataflow
Behavioral
Structural VHDL description
The circuit is described in terms of its components.
This ranges from a low-level description (e.g. a transistor-level description)
to a high-level description (e.g. a block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the
behavioral description of the registers yourself.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
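Because VHDL has no built-in register, a register is usually described behaviorally with a clocked process. The sketch below shows one common way of doing so; the entity name reg8, the 8-bit width and the active-low asynchronous reset are assumptions chosen to match the style of the code in this report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg8 is
  port (clk  : in  std_logic;
        rstn : in  std_logic;                     -- active-low asynchronous reset
        d    : in  std_logic_vector(7 downto 0);
        q    : out std_logic_vector(7 downto 0));
end entity reg8;

architecture behv of reg8 is
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= (others => '0');      -- reset takes priority over the clock
    elsif rising_edge(clk) then
      q <= d;                    -- capture d on each rising clock edge
    end if;
  end process;
end architecture behv;
```

Synthesis tools generally recognize this process template and map it to flip-flops in the target technology.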
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
The process keyword in VHDL:
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but
NOT both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
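The two process forms in the general format above can be compared side by side. In this sketch (the entity and_demo and its ports are hypothetical), both processes describe the same AND gate: one uses a sensitivity list, the other an explicit wait statement at the end of the process body.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_demo is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end entity and_demo;

architecture behv of and_demo is
begin
  -- Form 1: sensitivity list, no wait statement.
  with_list : process (a, b)
  begin
    y1 <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list; an explicit wait at the end.
  with_wait : process
  begin
    y2 <= a and b;
    wait on a, b;   -- suspend until a or b changes
  end process with_wait;
end architecture behv;
```

Both processes run once at the start of simulation and then again whenever a or b changes, so y1 and y2 always carry the same value.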
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;  -- only needed by the commented-out conversion below
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port(
    clk50m : in  std_logic;   -- board clock (50 MHz, per the port name)
    rstn   : in  std_logic;   -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0)
  );
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic
    );
  END component;

  component sync_unit
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;    -- mode matches the sync_unit entity
      c_hs       : out integer     -- mode matches the sync_unit entity
    );
  end component;

  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic
    );
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the input clock by two to obtain the pixel clock.
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- Note: the required sinit polarity depends on how the memory core
  -- was generated; it is driven by rstn here, as in the original design.
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks 0 to 793 within each line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vsync counter
  -- vertical line counter: advances on each rising edge of h_en_s
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse: low while the horizontal count is 0 to 95
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if (line_count_s >= 35 and line_count_s <= 515) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical sync, front porch, back porch
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of v_en_s, used to detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart the memory address for a new frame.
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: read the next pixel from memory.
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the picture: fill with green.
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: fill with white.
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address at the end of the image memory.
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking interval: drive black.
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name field-programmable. FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called logic blocks, and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete system on a programmable chip. This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative approach to using hard-macro processors is to make use of soft
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has increased development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative to ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them) but also far more
complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as the ability of logic blocks to implement
decoders or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of
4 to 8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
Fig: FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Fig: Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
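The classic logic block described above (a 4-input LUT, a flip-flop, and a single output that is either registered or unregistered) can be modelled in a few lines of VHDL. This is a behavioral sketch for illustration only, not a vendor primitive: the entity name logic_block and the generics INIT and USE_REG are hypothetical stand-ins for the configuration bits of a real device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    -- 16-entry truth table of the LUT; x"8000" implements a 4-input AND.
    INIT    : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) LUT output.
    USE_REG : boolean := false
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block;

architecture behv of logic_block is
  signal lut_out, reg_out : std_logic;
begin
  -- The four inputs address one bit of the truth table.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- Flip-flop on the LUT output.
  process (clk)
  begin
    if rising_edge(clk) then
      reg_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer, fixed when the block is configured.
  o <= reg_out when USE_REG else lut_out;
end architecture behv;
```

Changing INIT changes the logic function without changing the structure, which is the essence of LUT-based programmable logic; for example, x"6996" would give a 4-input XOR (odd parity).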
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Fig: Logic block pin locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right of and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Fig: Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to the actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
bringing in interest early This speeds up the overall development cycle and
allows for more enhancement or improvement to the chip features as it would
otherwise have been
65
13 SIMULATION RESULTS
66
GATE LEVEL
FigVGA CONTROLLER
67
GATE LEVEL
Fig SYNC UNIT
68
TECHNOLOGY SCHEMATIC
69
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
141 CONCLUTION
Hence by implementing FPGA taking a processor-generated picture
(pixilated) from memory space the Controller provides digital RGB values for
each pixel as well as horizontal and vertical synchronization signals in order
to correctly display the picture on a connected monitor
70
REFERENCES
1 httpenwikipediaorgwikiVideo_Graphics_Array
2 A VGA display controller by Eduardo Sanchez
3 HinnerrdquoVGA Interface and video signal documentsrdquo
71
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 35
Segments of VHDL code that can be compiled separately and stored in a
library.
Entities
A black box with an interface definition.
Defines the inputs/outputs of a component (defines pins).
A way to represent modularity in VHDL.
Similar to a symbol in a schematic.
The entity declaration describes the entity.
E.g.:
entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;
Ports
Provide channels of communication between the component and its
environment.
Each port must have a name, a direction and a type.
An entity may have NO port declaration.
Port directions
In: The value of the port can be read inside the component but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to the port, but data from the port cannot be
read (VHDL-2008 relaxes this restriction). Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
– Behavior
– Structure
– Dataflow
Architectures can describe a design on many levels:
– Gate level
– RTL (Register Transfer Level)
– Behavioral level
A configuration declaration links an architecture to an entity.
E.g.:
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
– The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., both a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
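As an illustration of how a configuration picks one of several architectures, the sketch below extends the Comparator example. The second architecture (Comparator2) and the configuration name (Comparator_cfg) are hypothetical names introduced only for this sketch; they are not part of the project code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

-- Dataflow architecture, as in the example in the text.
architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

-- Hypothetical second architecture: the same function written as a process.
architecture Comparator2 of Comparator is
begin
  process (A, B)
  begin
    if A = B then
      EQ <= '1';
    else
      EQ <= '0';
    end if;
  end process;
end Comparator2;

-- The configuration selects Comparator2 explicitly instead of relying on
-- the "last architecture analyzed" default.
configuration Comparator_cfg of Comparator is
  for Comparator2
  end for;
end Comparator_cfg;
```

Simulating Comparator_cfg instead of Comparator should then use Comparator2 without any change to the entity or architecture sources.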
Packages
Packages contain information common to many design units.
1. Package declaration
– constant declarations
– type and subtype declarations
– function and procedure declarations
– global signal declarations
– file declarations
– component declarations
2. Package body
– is not necessarily needed
– function bodies
– procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
– type and subtype declarations
– subprograms
– constants
– alias declarations
– global signal declarations
– file declarations
– component declarations
A package body consists of:
– subprogram declarations and bodies
– type and subtype declarations
– deferred constants
– file declarations
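A minimal sketch of a package with a declaration part and a body is given below. The package name vga_timing_pkg, the constant names and the function in_window are hypothetical; the numeric values are simply read off the sync_unit listing in this report and are assumptions about that design, not standard VGA figures.

```vhdl
package vga_timing_pkg is
  -- Values taken from the counters in the sync_unit listing (assumed).
  constant H_LAST_COUNT : integer := 793;  -- last value of the pixel counter
  constant V_LAST_LINE  : integer := 515;  -- last value of the line counter

  -- Deferred constant: its value is given in the package body.
  constant H_SYNC_END : integer;

  -- Returns true when lo <= v <= hi.
  function in_window (v, lo, hi : integer) return boolean;
end vga_timing_pkg;

package body vga_timing_pkg is
  constant H_SYNC_END : integer := 95;

  function in_window (v, lo, hi : integer) return boolean is
  begin
    return (v >= lo) and (v <= hi);
  end in_window;
end vga_timing_pkg;
```

A design unit would make these names visible with "use work.vga_timing_pkg.all;", assuming the package is compiled into the work library.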
Libraries
A collection of VHDL design units (a database):
1. Packages
– package declaration
– package body
2. Entities (entity declaration)
3. Architectures (architecture body)
4. Configurations (configuration declarations)
Usually a directory in a UNIX file system.
Can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here from the
lowest to the highest level of abstraction:
– Structural
– Dataflow
– Behavioral
Structural VHDL description
The circuit is described in terms of its components.
This can range from a low-level description (e.g. a transistor-level
description) to a high-level description (e.g. a block diagram).
For large circuits, a low-level description quickly becomes impractical.
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level, while the
placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by the registers.
There are no built-in registers in the VHDL language:
– either a lower-level description
– or a behavioral description of the sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the
behavioral description of the registers yourself.
The behavioral description can be provided in the form of subprograms
(functions or procedures).
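Since VHDL has no built-in register, a behavioral description has to be written for one. The sketch below shows one common way to do this with a clocked process rather than a subprogram; the entity name dff and its port names are assumptions made for this example, chosen to match the active-low rstn convention used in the project listings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port (clk  : in  std_logic;
        rstn : in  std_logic;   -- active-low asynchronous reset
        d    : in  std_logic;
        q    : out std_logic);
end dff;

architecture behv of dff is
begin
  -- The process wakes up on clk or rstn; q only changes on reset
  -- or on a rising clock edge, which is what makes it a register.
  process (clk, rstn)
  begin
    if rstn = '0' then
      q <= '0';
    elsif rising_edge(clk) then
      q <= d;
    end if;
  end process;
end behv;
```

Synthesis tools generally map a process of this shape to a flip-flop with an asynchronous clear.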
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of the sequential operations is
defined.
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
Concurrent vs. Sequential
Processes
The basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
Quite similar to a Unix process.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contains the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
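The two forms of the general format can be compared side by side. In the sketch below, the entity and_gate and its port names are made up for illustration; both processes are intended to behave identically, the first using a sensitivity list and the second an explicit wait statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_gate is
  port (a, b   : in  std_logic;
        y1, y2 : out std_logic);
end and_gate;

architecture behv of and_gate is
begin
  -- Form 1: sensitivity list, no wait statement.
  p_list : process (a, b)
  begin
    y1 <= a and b;
  end process p_list;

  -- Form 2: no sensitivity list, explicit wait as the last statement.
  p_wait : process
  begin
    y2 <= a and b;
    wait on a, b;
  end process p_wait;
end behv;
```

Each process runs once at the start of simulation and then again whenever a or b changes, so y1 and y2 should always agree.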
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- IEEE.numeric_std is not used here: combining it with STD_LOGIC_ARITH
-- makes the unsigned/signed types ambiguous.
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic);
  end component;

  component sync_unit
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;    -- mode matches the sync_unit entity
      c_hs       : out integer);
  end component;

  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  -- Divide the board clock clk50m by two to obtain the pixel clock clk.
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks 0 to 793 within a line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vertical line counter: counts lines 0 to 515, advancing on each
  -- rising edge of the horizontal enable
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;  -- the later assignment wins, so the count wraps
      end if;
    end if;
  end process;

  -- hsync pulse: low for counts 0 to 95
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information will be sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35) and (line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical sync pulse (front porch, back porch): low for lines 0 and 1
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of the vertical enable
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
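The counter limits in the synchronization block fix the line and frame rates. As a rough check, assuming clk50m is a 50 MHz clock (so the divided pixel clock is 25 MHz; the listing itself does not state the frequency), the horizontal counter spans 794 clocks (0 to 793), the sync pulse 96 clocks (0 to 95) and the line counter 516 lines (0 to 515):

```latex
\begin{align*}
T_{\text{line}}  &= \frac{794}{25\,\text{MHz}} = 31.76\,\mu\text{s}
  &&\Rightarrow\quad f_{\text{line}} \approx 31.49\,\text{kHz} \\
T_{\text{hsync}} &= \frac{96}{25\,\text{MHz}} = 3.84\,\mu\text{s} \\
T_{\text{frame}} &= 516 \times 31.76\,\mu\text{s} \approx 16.39\,\text{ms}
  &&\Rightarrow\quad f_{\text{frame}} \approx 61.0\,\text{Hz}
\end{align*}
```

For comparison, the standard 640 x 480 mode uses 800 clocks per line and 525 lines at 25.175 MHz, giving about 31.47 kHz and 59.94 Hz, so under this assumption the design runs slightly faster than nominal.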
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en (start of a frame): restart the read address.
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: read the next 1-bit pixel from memory
            -- and drive it on all three color outputs.
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the picture: green border.
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: white.
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last pixel of the stored picture.
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: all colors off.
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;
end behv;
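The wrap-around constant in the pixel generator can be related to the size of the stored picture. Reading the listing, the memory appears to be read for roughly 512 pixel clocks per line (c_hs from 144 to 655, allowing for the one-clock delay of the registered h_en) on 480 lines (line_count from 35 to 514). This is an interpretation of the code, not a figure stated in the report:

```latex
\begin{align*}
\texttt{111011111111111111}_2 &= 2^{17} + 2^{16} + 2^{15} + (2^{14} - 1) = 245759 \\
512 \times 480 &= 245760 = 245759 + 1
\end{align*}
```

Under that reading, the last address 245759 corresponds to the final pixel of a 512 x 480 one-bit-per-pixel image, which fits within the 18-bit address bus (2^18 = 262144 locations).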
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable interconnects
between gates, the beginnings of a new technology and market. The XC2064
boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup
tables (LUTs). More than 20 years later, Freeman was inducted into the
National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By 1993,
Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication and in
volume of production. In the early 1990s FPGAs were primarily used in
telecommunications and networking. By the end of the decade, FPGAs had found
their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural approach a
step further by combining the logic blocks and interconnects of traditional
FPGAs with embedded microprocessors and related peripherals to form a
complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs can be reprogrammed at run time,
and this is leading to the idea of reconfigurable computing or
reconfigurable systems: CPUs that reconfigure themselves to suit the task
at hand. The Mitrion Virtual Processor from Mitrionics is an example of a
reconfigurable soft processor implemented on FPGAs. However, it does not
support dynamic reconfiguration at run time, but instead adapts itself to a
specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000 gates, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
– IC costs are rising aggressively
– ASIC complexity has increased development time and costs
– R&D resources and headcount are decreasing
– Revenue losses for slow time-to-market are increasing
– Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This makes
them far more flexible (in terms of the range of designs that are practical
for implementation within them) but also far more complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or
the width of one column, as shown below. All the routing channels have the
same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
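The classic logic block described above can be modeled behaviorally in VHDL. This is only a sketch for illustration: the entity name logic_block, the generics INIT and REGISTERED and the port names are assumptions made for this example, not the primitive of any particular vendor.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    -- 16-entry truth table; x"8000" sets only entry 15, i.e. a 4-input AND.
    INIT       : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) LUT output.
    REGISTERED : boolean := false);
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic);
end logic_block;

architecture behv of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The LUT is a truth table indexed by the four inputs.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- The flip-flop stores the LUT output on every rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single output: either the registered or the unregistered LUT value.
  o <= ff_out when REGISTERED else lut_out;
end behv;
```

Changing INIT changes the logic function without changing the structure, which is essentially what configuring a real LUT does.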
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at
which point propagation delays can be added and the simulation run again with
these values back-annotated onto the netlist.
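The first stage of this flow, RTL simulation with a test bench, can be sketched for the Comparator entity used earlier in this report. The test bench name Comparator_tb, the stimulus values and the 10 ns wait times are assumptions chosen for illustration; the Comparator is repeated here so that the example is self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity Comparator is
  port (A, B : in  std_logic_vector(7 downto 0);
        EQ   : out std_logic);
end Comparator;

architecture Comparator1 of Comparator is
begin
  EQ <= '1' when (A = B) else '0';
end Comparator1;

library ieee;
use ieee.std_logic_1164.all;

-- A test bench has no ports: it generates stimulus and checks results.
entity Comparator_tb is
end Comparator_tb;

architecture sim of Comparator_tb is
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal EQ   : std_logic;
begin
  dut : entity work.Comparator(Comparator1)
    port map (A => A, B => B, EQ => EQ);

  stimulus : process
  begin
    A <= x"3C"; B <= x"3C";
    wait for 10 ns;
    assert EQ = '1' report "equal inputs should give EQ = 1" severity error;

    A <= x"3C"; B <= x"3D";
    wait for 10 ns;
    assert EQ = '0' report "different inputs should give EQ = 0" severity error;

    report "Comparator test bench finished";
    wait;  -- suspend forever so the simulation can end
  end process stimulus;
end sim;
```

The same test bench can in principle be re-run against the gate-level and back-annotated netlists, which is what makes the repeated simulation stages comparable.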
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream practice, and prototyping SoC and ASIC designs on an
FPGA has become a good method to do this.
Reasons why prototyping is important
1 Running a SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2 Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be found
early.
3 Prototyping also allows the developer to ensure that all IP technologies in
the system work well together, off the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
Port directions
In: The value of a port can be read inside the component but cannot be
assigned. Multiple reads of the port are allowed.
Out: Assignments can be made to a port, but data from the port cannot be read.
Multiple assignments are allowed.
Inout: Bi-directional; assignments can be made and data can be read. Multiple
assignments are allowed.
Buffer: An out port with read capability. May have at most one assignment
(not recommended).
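The port modes above can be illustrated with a minimal sketch. The entity port_modes_demo and all of its signal names are hypothetical, chosen only for this example. It uses an internal signal to read back an output value, which is the usual alternative to a buffer port (VHDL-2008 also allows out ports to be read directly).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity showing the in, out and inout port modes.
entity port_modes_demo is
  port (
    clk    : in    std_logic;   -- in: may be read, never assigned
    d      : in    std_logic;
    oe     : in    std_logic;   -- output enable for the bidirectional pin
    q      : out   std_logic;   -- out: assigned here, not read back
    bus_io : inout std_logic    -- inout: may be both driven and read
  );
end entity port_modes_demo;

architecture rtl of port_modes_demo is
  signal q_int : std_logic := '0';  -- internal copy of q that can be read
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Reads the internal copy, then reads the bus when it is not being driven.
      if oe = '1' then
        q_int <= d xor q_int;
      else
        q_int <= bus_io;
      end if;
    end if;
  end process;

  q      <= q_int;                          -- drive the out port from the copy
  bus_io <= q_int when oe = '1' else 'Z';   -- release the bus when not enabled
end architecture rtl;
```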
Architectures
Every entity has at least one architecture.
One entity can have several architectures.
Architectures can describe a design using:
- Behavior
- Structure
- Dataflow
Architectures can describe a design on many levels:
- Gate level
- RTL (Register Transfer Level)
- Behavioral level
A configuration declaration links an architecture to an entity.
E.g.
Architecture Comparator1 of Comparator is
Begin
EQ <= '1' when (A = B) else '0';
End Comparator1;
Configurations
Link the entity declaration and the architecture body together.
The concept of a default configuration is a bit messy in VHDL '87:
- The last architecture analyzed links to the entity.
Can be used to change simulation behavior without re-analyzing the VHDL
source.
Complex configuration declarations are ignored in synthesis.
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture.
Configurations are always optional.
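As a rough sketch of how one entity can carry several architectures and how a configuration declaration selects one of them, consider the example below. The entity comparator, its 8-bit port widths and the name comparator_cfg are assumptions made for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity comparator is
  port (
    a, b : in  std_logic_vector(7 downto 0);
    eq   : out std_logic
  );
end entity comparator;

-- First architecture: dataflow style, a single concurrent assignment.
architecture dataflow of comparator is
begin
  eq <= '1' when a = b else '0';
end architecture dataflow;

-- Second architecture of the same entity: behavioral style, using a process.
architecture behavioral of comparator is
begin
  process (a, b)
  begin
    if a = b then
      eq <= '1';
    else
      eq <= '0';
    end if;
  end process;
end architecture behavioral;

-- Configuration declaration: binds the entity to the behavioral architecture.
configuration comparator_cfg of comparator is
  for behavioral
  end for;
end configuration comparator_cfg;
```

A parent design could then instantiate configuration work.comparator_cfg, or name an architecture directly with entity work.comparator(dataflow), without touching the comparator source.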
Packages
Packages contain information common to many design units.
1 Package declaration
- constant declarations
- type and subtype declarations
- function and procedure declarations
- global signal declarations
- file declarations
- component declarations
2 Package body
- is not necessarily needed
- function bodies
- procedure bodies
Packages are meant for encapsulating data which can be shared globally among
several design units.
A package consists of a declaration part and an optional body part.
A package declaration can contain:
- type and subtype declarations
- subprograms
- constants
- alias declarations
- global signal declarations
- file declarations
- component declarations
A package body consists of:
- subprogram declarations and bodies
- type and subtype declarations
- deferred constants
- file declarations
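A minimal sketch of a package declaration with its body follows. The package name vga_types_pkg, the constants and the parity function are hypothetical and serve only to show a constant, a deferred constant, a subtype and a subprogram together.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the part visible to other design units.
package vga_types_pkg is
  constant H_VISIBLE  : natural := 640;             -- ordinary constant
  constant ADDR_WIDTH : natural;                    -- deferred constant, value given in the body
  subtype  pixel_t is std_logic_vector(0 downto 0); -- subtype declaration
  function parity (v : std_logic_vector) return std_logic;
end package vga_types_pkg;

-- Package body: supplies the deferred constant and the function body.
package body vga_types_pkg is
  constant ADDR_WIDTH : natural := 18;

  function parity (v : std_logic_vector) return std_logic is
    variable p : std_logic := '0';
  begin
    for i in v'range loop
      p := p xor v(i);    -- XOR all bits together
    end loop;
    return p;
  end function parity;
end package body vga_types_pkg;
```

A design unit compiled into the same library would make these names visible with "use work.vga_types_pkg.all;".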
Libraries
A library is a collection of VHDL design units (a database):
1 Packages
- package declaration
- package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
A library is usually a directory in the UNIX file system.
It can also be any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here in order
of increasing level of abstraction:
- Structural
- Dataflow
- Behavioral
Structural VHDL description
The circuit is described in terms of its components.
This ranges from a low-level description (e.g. a transistor-level description)
to a high-level description (e.g. a block diagram).
For large circuits, a low-level description quickly becomes impractical.
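A structural description can be sketched with a full adder built from two half adders. The entities half_adder and full_adder are illustrative assumptions, not part of the VGA controller design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Leaf component: described directly with logic equations.
entity half_adder is
  port (a, b : in std_logic; sum, carry : out std_logic);
end entity half_adder;

architecture dataflow of half_adder is
begin
  sum   <= a xor b;
  carry <= a and b;
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- Structural description: the full adder is a netlist of half adders.
entity full_adder is
  port (a, b, cin : in std_logic; sum, cout : out std_logic);
end entity full_adder;

architecture structural of full_adder is
  component half_adder
    port (a, b : in std_logic; sum, carry : out std_logic);
  end component;
  signal s1, c1, c2 : std_logic;   -- internal wires between the components
begin
  ha1 : half_adder port map (a => a,  b => b,   sum => s1,  carry => c1);
  ha2 : half_adder port map (a => s1, b => cin, sum => sum, carry => c2);
  -- A strictly structural netlist would instantiate an OR gate component here.
  cout <= c1 or c2;
end architecture structural;
```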
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level, while the
placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the behavioral
description of the registers yourself.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
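The dataflow style might look like the following sketch, in which the combinational logic is a concurrent assignment and the register is described behaviorally because VHDL has no built-in register. The entity add_reg and its bit widths are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity add_reg is
  port (
    clk  : in  std_logic;
    a, b : in  unsigned(7 downto 0);
    sum  : out unsigned(8 downto 0)
  );
end entity add_reg;

architecture dataflow of add_reg is
  signal sum_next : unsigned(8 downto 0);
  signal sum_reg  : unsigned(8 downto 0) := (others => '0');
begin
  -- Combinational logic: how data flows towards the register.
  -- Both operands are widened to 9 bits so that the carry is kept.
  sum_next <= resize(a, 9) + resize(b, 9);

  -- The register itself needs a behavioral description of a sequential element.
  reg : process (clk)
  begin
    if rising_edge(clk) then
      sum_reg <= sum_next;
    end if;
  end process reg;

  sum <= sum_reg;
end architecture dataflow;
```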
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
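Both ways of expressing time can be shown in a short sketch. The entity delay_demo and its ports are hypothetical. The explicit 10 ns delay is meaningful in simulation only, and a synthesis tool would normally ignore it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_demo is
  port (
    b : in  std_logic;
    a : out std_logic;
    y : out std_logic
  );
end entity delay_demo;

architecture behavioral of delay_demo is
begin
  -- Time expressed precisely: a follows b 10 ns later (simulation only).
  a <= b after 10 ns;

  -- No explicit delay: only the order of the sequential statements is defined.
  process (b)
    variable tmp : std_logic;
  begin
    tmp := not b;      -- executed first
    tmp := not tmp;    -- executed second, using the value just computed
    y   <= tmp;        -- y therefore receives the value of b
  end process;
end architecture behavioral;
```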
Concurrent vs. Sequential
Processes
The process is the basic simulation concept in VHDL.
A VHDL description can always be broken up into interconnected processes.
This is quite similar to a Unix process.
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside process statements are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT
both.
The sensitivity list or wait statement(s) contain the signals which wake the
process up.
General format:
process [(sensitivity_list)]
  process_declarative_part
begin
  process_statements
  [wait_statement]
end process;
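The two permitted forms of a process can be compared side by side in a small sketch. The entity process_forms and its port names are illustrative assumptions. Both processes compute the same function and wake up on the same signals.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity process_forms is
  port (
    a, b   : in  std_logic;
    y_list : out std_logic;
    y_wait : out std_logic
  );
end entity process_forms;

architecture behavioral of process_forms is
begin
  -- Form 1: sensitivity list, no wait statement.
  with_list : process (a, b)
  begin
    y_list <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list, explicit wait statement at the end.
  with_wait : process
  begin
    y_wait <= a and b;
    wait on a, b;       -- suspend until a or b changes
  end process with_wait;
end architecture behavioral;
```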
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
  clk50m : in std_logic;
  rstn : in std_logic;
  hsync : out std_logic;
  vsync : out std_logic;
  red : out std_logic_vector(0 downto 0);
  green : out std_logic_vector(0 downto 0);
  blue : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
  component memory
    port (
      addr : IN std_logic_VECTOR(17 downto 0);
      clk : IN std_logic;
      dout : OUT std_logic_VECTOR(0 downto 0);
      en : IN std_logic;
      sinit : IN std_logic);
  END component;
  component sync_unit port(
    clk : in std_logic;
    rstn : in std_logic;
    hsync : out std_logic; ---horizontal synch pulse(row)
    vsync : out std_logic; --vertical sync pulse(frame) 111110100000000000
    v_en_reg : out std_logic;
    h_en : out std_logic;
    v_en : out std_logic;
    line_count : inout integer;
    c_hs : inout integer);
  end component;
  component pixelgen port(
    clk : in std_logic;
    rstn : in std_logic;
    hsync : in std_logic;
    vsync : in std_logic;
    h_en : in std_logic;
    v_en : in std_logic;
    v_en_reg : in std_logic;
    data_in : in std_logic_vector(0 downto 0);
    line_count : in integer;
    c_hs : in integer;
    red : out std_logic_vector(0 downto 0);
    green : out std_logic_vector(0 downto 0);
    blue : out std_logic_vector(0 downto 0);
    addr : out std_logic_vector(17 downto 0);
    rd : out std_logic);
  end component;
  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);
begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));
  -- Divide clk50m by two to produce the pixel clock clk
  process(clk50m, rstn)
  begin
    if(rstn = '0')then
      clk <= '0';
    elsif(clk50m = '1' and clk50m'event)then
      clk <= not clk;
    end if;
    --end if;
  end process;
  hsync <= hsync_s;
  vsync <= vsync_s;
  sync : sync_unit port map(
    clk => clk,
    rstn => rstn,
    hsync => hsync_s,
    vsync => vsync_s,
    v_en_reg => v_en_reg_s,
    line_count => line_count_s,
    h_en => h_en_s,
    v_en => v_en_s,
    c_hs => c_hs_s
  );
  pixel : pixelgen port map(
    clk => clk,
    rstn => rstn,
    hsync => hsync_s,
    vsync => vsync_s,
    h_en => h_en_s,
    v_en => v_en_s,
    v_en_reg => v_en_reg_s,
    data_in => data_s,
    line_count => line_count_s,
    c_hs => c_hs_s,
    red => red,
    green => green,
    blue => blue,
    addr => addr_s,
    rd => rd_s
  );
  mem12 : memory port map(
    addr => addr_s,
    clk => clk,
    dout => data_s,
    en => rd_s,
    sinit => rstn
  );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
  clk : in std_logic;
  rstn : in std_logic;
  hsync : out std_logic; ---horizontal synch pulse(row)
  vsync : out std_logic; --vertical sync pulse(frame) 111110100000000000
  v_en_reg : out std_logic;
  h_en : out std_logic;
  v_en : out std_logic;
  line_count : out integer;
  c_hs : out integer );
end sync_unit;
architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs <= c_hs_s;
  line_count <= line_count_s;
  h_en <= h_en_s;
  v_en <= v_en_s;
  -- hsync counter
  process(clk, rstn)
  begin
    if(rstn = '0')then
      c_hs_s <= 0;
    else
      if(clk = '1' and clk'event)then
        if(c_hs_s = 793)then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;
  --vsync counter
  --vertical line counter
  process(h_en_s, rstn)
  begin
    if(rstn = '0')then
      line_count_s <= 0;
    elsif(h_en_s = '1' and h_en_s'event)then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515)then
        line_count_s <= 0;
      end if;
    end if;
    --end if;
  end process;
  --hsync pulse
  process(clk, rstn)
  begin
    if(rstn = '0')then
      hsync <= '1';
    elsif(clk = '1' and clk'event)then
      if(c_hs_s <= 95)then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
    --end if;
  end process;
  process(clk, rstn) ----Horizontal Enable (Row Information Will be Sent in this period)
  begin
    if(rstn = '0')then
      h_en_s <= '0';
    elsif(clk = '1' and clk'event)then
      if((c_hs_s >= 143) and (c_hs_s <= 778))then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;
  ---vsync pulse
  process(clk, rstn) -- Vertical Enable (Data is Valid in this Region)
  begin
    if(rstn = '0')then
      v_en_s <= '0';
    elsif(clk = '1' and clk'event)then
      if((line_count_s >= 35 and line_count_s <= 515))then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;
  process(clk, rstn) --Vertical Sync, Front Porch, Back Porch
  begin
    if(rstn = '0')then
      vsync <= '1';
    elsif(clk = '1' and clk'event)then
      if(line_count_s < 2)then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
    --end if;
  end process;
  process(clk)
  begin
    if(clk = '1' and clk'event)then
      v_en_reg <= v_en_s;
    end if;
  end process;
end arch_sync_unit;
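The numeric limits used in the synchronization block can be gathered into named constants, which also makes the resulting line and frame timing easy to work out. The package below is a sketch, not part of the original design. The package and constant names are hypothetical, and the 40 ns pixel clock period assumes that clk50m is a 50 MHz clock divided by two.

```vhdl
package vga_timing_pkg is
  -- Horizontal timing, in pixel clocks (limits taken from the sync_unit comparisons)
  constant H_TOTAL        : natural := 794;  -- c_hs counts 0 to 793
  constant H_SYNC_CLOCKS  : natural := 96;   -- hsync is low while c_hs <= 95
  constant H_ACTIVE_FIRST : natural := 143;  -- h_en is high from c_hs = 143 ...
  constant H_ACTIVE_LAST  : natural := 778;  -- ... to c_hs = 778

  -- Vertical timing, in lines
  constant V_TOTAL        : natural := 516;  -- line_count counts 0 to 515
  constant V_SYNC_LINES   : natural := 2;    -- vsync is low while line_count < 2
  constant V_ACTIVE_FIRST : natural := 35;   -- v_en is high from line 35 ...
  constant V_ACTIVE_LAST  : natural := 515;  -- ... to line 515

  -- ASSUMPTION: 25 MHz pixel clock, i.e. a 40 ns period
  constant PIXEL_CLK_PERIOD_NS : natural := 40;

  -- Derived values: 794 * 40 ns = 31760 ns per line (about 31.5 kHz line rate)
  constant LINE_PERIOD_NS  : natural := H_TOTAL * PIXEL_CLK_PERIOD_NS;
  -- 516 * 31760 ns = 16388160 ns per frame (about 61 frames per second)
  constant FRAME_PERIOD_NS : natural := V_TOTAL * LINE_PERIOD_NS;
end package vga_timing_pkg;
```

For comparison, the commonly quoted 640x480 at 60 Hz timing uses 800 pixel clocks per line and 525 lines per frame at a 25.175 MHz pixel clock, so the values in this design are close to that timing rather than identical to it.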
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
  clk : in std_logic;
  rstn : in std_logic;
  hsync : in std_logic;
  vsync : in std_logic;
  h_en : in std_logic;
  v_en : in std_logic;
  v_en_reg : in std_logic;
  data_in : in std_logic_vector(0 downto 0);
  line_count : in integer;
  c_hs : in integer;
  red : out std_logic_vector(0 downto 0);
  green : out std_logic_vector(0 downto 0);
  blue : out std_logic_vector(0 downto 0);
  addr : out std_logic_vector(17 downto 0);
  rd : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;
  process(clk, rstn)
  begin
    if(rstn = '0')then
      rd <= '0';
      red <= "0";
      green <= "0";
      blue <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event)then
      -- Rising edge of v_en: restart the memory address at the top of the frame
      if((v_en = '1') and (v_en_reg = '0'))then
        addr_s <= (others => '0');
      end if;
      if(v_en = '1' and h_en = '1')then
        if(line_count <= 514)then
          if(c_hs <= 655)then
            rd <= '1';
            red <= data_in;
            green <= data_in;
            blue <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            rd <= '0';
            red <= "0";
            green <= "1";
            blue <= "0";
          end if;
        else
          rd <= '0';
          red <= "1";
          green <= "1";
          blue <= "1";
        end if;
        if(addr_s = "111011111111111111")then
          addr_s <= (others => '0');
        end if;
      else
        red <= "0";
        green <= "0";
        blue <= "0";
      end if;
    end if;
    -- end if;
    --end if;
  end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic (see "Soft
processors" below).
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has bolstered development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage
of more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the state
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs began
to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attack of cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
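The logic block described above (a 4-input LUT, a flip-flop, and a single output that is either registered or unregistered) can be modelled behaviorally as follows. This is a sketch for illustration only and does not correspond to any vendor primitive. The entity name and the generics LUT_INIT and REGISTERED are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    -- Truth table of the LUT: bit n is the output for input value n.
    -- x"8000" sets only bit 15, which gives a 4-input AND.
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) LUT output.
    REGISTERED : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);  -- four LUT inputs
    o   : out std_logic                      -- the single block output
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_q    : std_logic := '0';
begin
  -- The LUT is a 16 x 1 memory addressed by the four inputs.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- Flip-flop that captures the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_q <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output.
  o <= ff_q when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the structure, which is the basic idea behind configuring an FPGA.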
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license)
and other sources.
In a typical design flow an FPGA application developer will simulate the
design at multiple stages throughout the design process Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results Then after the synthesis engine has
mapped the design to a netlist the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors Finally the design is laid out in the FPGA at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist
Basic Process Technology Types
SRAM - based on static memory technology In-system programmable and re-
programmable Requires external boot devices CMOS
Antifuse - One-time programmable CMOS
EPROM - Erasable Programmable Read-Only Memory technology Usually
one-time programmable in production because of plastic packaging
Windowed devices can be erased with ultraviolet (UV) light CMOS
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology Can be erased even in plastic packages Some but not all
EEPROM devices can be in-system programmed CMOS
Flash - Flash-erase EPROM technology Can be erased even in plastic
packages Some but not all flash devices can be in-system programmed
63
Usually a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture CMOS
Fuse - One-time programmable Bipolar
Some entities can have, e.g., a gate-level architecture and a behavioral
architecture
Are always optional
Packages
Packages contain information common to many design units. They are meant
for encapsulating data that can be shared globally among several design
units. A package consists of a declaration part and an optional body part.
1 Package declaration can contain
- constant declarations
- type and subtype declarations
- function and procedure (subprogram) declarations
- alias declarations
- global signal declarations
- file declarations
- component declarations
2 Package body (not necessarily needed) consists of
- subprogram declarations and bodies (function and procedure bodies)
- type and subtype declarations
- deferred constants
- file declarations
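The split between package declaration and package body can be sketched as follows. This is a minimal illustration, not code from the project: the package name vga_types_pkg and everything declared in it are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the public part, visible to every design unit
-- that writes "use work.vga_types_pkg.all;" (hypothetical package name).
package vga_types_pkg is
  -- constant declaration
  constant ADDR_WIDTH : natural := 18;
  -- subtype declaration
  subtype addr_t is std_logic_vector(ADDR_WIDTH - 1 downto 0);
  -- deferred constant: its value is given in the package body
  constant LAST_ADDR : natural;
  -- function declaration: its body is in the package body
  function bool_to_sl(b : boolean) return std_logic;
end package vga_types_pkg;

-- Package body: holds the deferred constant and the subprogram bodies.
package body vga_types_pkg is
  constant LAST_ADDR : natural := 2**ADDR_WIDTH - 1;

  function bool_to_sl(b : boolean) return std_logic is
  begin
    if b then
      return '1';
    else
      return '0';
    end if;
  end function bool_to_sl;
end package body vga_types_pkg;
```

A package that declares only types and fully specified constants needs no body; the body becomes necessary here because of the deferred constant and the function.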
Libraries
A library is a collection of VHDL design units (a database):
1 Packages
- package declaration
- package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
A library is usually a directory in the UNIX file system, but it can also be
any other kind of database.
Levels of Abstraction
VHDL supports many possible styles of design description, which differ
primarily in how closely they relate to the hardware.
It is possible to describe a circuit in a number of ways, listed here in
order of increasing level of abstraction:
- Structural
- Dataflow
- Behavioral
Structural VHDL description
Circuit is described in terms of its components.
From a low-level description (e.g. transistor-level description) to a high-level
description (e.g. block diagram).
For large circuits, a low-level description quickly becomes impractical.
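As an illustration of the structural style, the sketch below builds a half adder purely from component instances. The entities and2, xor2 and half_adder are hypothetical examples and are not part of the VGA controller.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and2 is
  port (a, b : in std_logic; y : out std_logic);
end entity and2;

architecture rtl of and2 is
begin
  y <= a and b;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity xor2 is
  port (a, b : in std_logic; y : out std_logic);
end entity xor2;

architecture rtl of xor2 is
begin
  y <= a xor b;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

entity half_adder is
  port (a, b : in std_logic; sum, carry : out std_logic);
end entity half_adder;

-- Structural architecture: nothing but component instances and wiring.
architecture structural of half_adder is
  component and2
    port (a, b : in std_logic; y : out std_logic);
  end component;
  component xor2
    port (a, b : in std_logic; y : out std_logic);
  end component;
begin
  u_sum   : xor2 port map (a => a, b => b, y => sum);
  u_carry : and2 port map (a => a, b => b, y => carry);
end architecture structural;
```

The top-level vga_controller architecture follows the same pattern: it declares the memory, sync_unit and pixelgen components and only wires them together.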
10.3 Dataflow VHDL Description
Circuit is described in terms of how data moves through the system.
In the dataflow style you describe how information flows between registers in
the system.
The combinational logic is described at a relatively high level; the placement
and operation of registers is specified quite precisely.
The behavior of the system over time is defined by registers.
There are no built-in registers in the VHDL language:
- either a lower-level description
- or a behavioral description of sequential elements is needed.
The lower-level register descriptions must be created or obtained.
If there are no third-party models for registers, you must write the behavioral
description of the registers.
The behavioral description can be provided in the form of
subprograms (functions or procedures).
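The following sketch shows one way to combine the two: the combinational logic is a concurrent (dataflow) assignment, and the register is described behaviorally in a clocked process. The entity accum and its ports are hypothetical and serve only as an example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity accum is
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;                     -- active-low asynchronous reset
    din  : in  std_logic_vector(7 downto 0);
    q    : out std_logic_vector(7 downto 0)
  );
end entity accum;

architecture dataflow of accum is
  signal q_reg  : unsigned(7 downto 0);
  signal q_next : unsigned(7 downto 0);
begin
  -- Combinational logic in dataflow style: a concurrent assignment.
  q_next <= q_reg + unsigned(din);

  -- The register itself has to be described behaviorally.
  reg : process (clk, rstn)
  begin
    if rstn = '0' then
      q_reg <= (others => '0');
    elsif rising_edge(clk) then
      q_reg <= q_next;
    end if;
  end process reg;

  q <= std_logic_vector(q_reg);
end architecture dataflow;
```

Keeping q_next and q_reg as separate signals makes the placement of the register explicit, which is the point of the dataflow style.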
10.4 Behavioral VHDL Description
Circuit is described in terms of its operation over time.
Representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of sequential operations is defined.
In the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
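The delay notation mentioned above can be tried out in a simulator with a sketch like the one below. The entity name delay_buf is hypothetical; the signal names A and B follow the example in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_buf is
  port (b : in std_logic; a : out std_logic);
end entity delay_buf;

architecture behavioral of delay_buf is
begin
  -- In simulation, a follows b 10 ns later.
  -- A synthesis tool ignores the "after" clause and infers a plain wire.
  a <= b after 10 ns;
end architecture behavioral;
```

Because the delay exists only in simulation, the real propagation time of the synthesized circuit is set by the target technology, as noted above.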
Concurrent vs. Sequential
Processes
- The process is the basic simulation concept in VHDL.
- A VHDL description can always be broken up into interconnected processes.
- Quite similar to a Unix process.
- Declared with the process keyword in VHDL.
- A process statement is a concurrent statement.
- Statements inside a process statement are sequential statements.
- A process must contain either a sensitivity list or wait statement(s), but NOT both.
- The sensitivity list or wait statement(s) name the signals that wake the process up.
General format:
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
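The two permitted forms of a process, with a sensitivity list or with a wait statement, can be compared side by side. This is an illustrative sketch; the entity and_two_ways is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity and_two_ways is
  port (a, b : in std_logic; y_list, y_wait : out std_logic);
end entity and_two_ways;

architecture behavioral of and_two_ways is
begin
  -- Form 1: sensitivity list. The process resumes on any event on a or b.
  with_list : process (a, b)
  begin
    y_list <= a and b;
  end process with_list;

  -- Form 2: no sensitivity list. An explicit wait suspends the process
  -- until there is an event on a or b.
  with_wait : process
  begin
    y_wait <= a and b;
    wait on a, b;
  end process with_wait;
end architecture behavioral;
```

Both processes run once at the start of simulation and then again after every change of a or b, so the two outputs always agree.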
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

component memory
    port (
        addr  : IN  std_logic_VECTOR(17 downto 0);
        clk   : IN  std_logic;
        dout  : OUT std_logic_VECTOR(0 downto 0);
        en    : IN  std_logic;
        sinit : IN  std_logic);
END component;

-- line_count and c_hs are declared "out" here to match the sync_unit entity.
component sync_unit port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;   -- horizontal sync pulse (row)
    vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end component;

component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);

begin

--addr_s <= std_logic_vector(to_unsigned(addr, 18));

-- Divide clk50m by two to produce the pixel clock clk.
process(clk50m, rstn)
begin
    if (rstn = '0') then
        clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
        clk <= not clk;
    end if;
end process;

hsync <= hsync_s;
vsync <= vsync_s;

sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
);

pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
);

mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
);

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;   -- horizontal sync pulse (row)
    vsync      : out std_logic;   -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin

c_hs       <= c_hs_s;
line_count <= line_count_s;
h_en       <= h_en_s;
v_en       <= v_en_s;
-- hsync counter: counts pixel clocks 0 .. 793
process(clk, rstn)
begin
    if (rstn = '0') then
        c_hs_s <= 0;
    else
        if (clk = '1' and clk'event) then
            if (c_hs_s = 793) then
                c_hs_s <= 0;
            else
                c_hs_s <= c_hs_s + 1;
            end if;
        end if;
    end if;
end process;

-- vsync counter
-- vertical line counter: advances on each rising edge of h_en_s, 0 .. 515
process(h_en_s, rstn)
begin
    if (rstn = '0') then
        line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
        line_count_s <= line_count_s + 1;
        if (line_count_s = 515) then
            line_count_s <= 0;
        end if;
    end if;
end process;
-- hsync pulse: low while c_hs_s is 0 .. 95
process(clk, rstn)
begin
    if (rstn = '0') then
        hsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (c_hs_s <= 95) then
            hsync <= '0';
        else
            hsync <= '1';
        end if;
    end if;
end process;

-- Horizontal enable (row information will be sent in this period)
process(clk, rstn)
begin
    if (rstn = '0') then
        h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
            h_en_s <= '1';
        else
            h_en_s <= '0';
        end if;
    end if;
end process;
-- vsync pulse
-- Vertical enable (data is valid in this region)
process(clk, rstn)
begin
    if (rstn = '0') then
        v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((line_count_s >= 35 and line_count_s <= 515)) then
            v_en_s <= '1';
        else
            v_en_s <= '0';
        end if;
    end if;
end process;

-- Vertical sync (front porch, back porch)
process(clk, rstn)
begin
    if (rstn = '0') then
        vsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (line_count_s < 2) then
            vsync <= '0';
        else
            vsync <= '1';
        end if;
    end if;
end process;

-- Registered copy of v_en_s, delayed by one clock
process(clk)
begin
    if (clk = '1' and clk'event) then
        v_en_reg <= v_en_s;
    end if;
end process;

end arch_sync_unit;
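The numeric limits used in the synchronization block can be collected as named constants, which also makes the resulting line and frame rates easy to work out. The sketch below is an illustration only: the package vga_timing_pkg and its constant names are hypothetical, and the 25 MHz pixel clock is an assumption based on the name clk50m and the divide-by-two process in the top-level code.

```vhdl
package vga_timing_pkg is
  -- Assumption: clk50m is 50 MHz, so the divided pixel clock is 25 MHz.
  constant PIXEL_CLK_HZ : natural := 25_000_000;

  -- Horizontal timing, in pixel clocks
  constant H_TOTAL    : natural := 794;  -- c_hs_s runs 0 .. 793
  constant H_SYNC_END : natural := 95;   -- hsync is low for counts 0 .. 95
  constant H_EN_START : natural := 143;
  constant H_EN_END   : natural := 778;
  constant H_EN_WIDTH : natural := H_EN_END - H_EN_START + 1;  -- 636 clocks

  -- Vertical timing, in lines
  constant V_TOTAL      : natural := 516;  -- line_count_s runs 0 .. 515
  constant V_SYNC_LINES : natural := 2;    -- vsync is low for lines 0 and 1
  constant V_EN_START   : natural := 35;
  constant V_EN_END     : natural := 515;
  constant V_EN_LINES   : natural := V_EN_END - V_EN_START + 1;  -- 481 lines

  -- Derived rates (integer division truncates)
  constant LINE_RATE_HZ  : natural := PIXEL_CLK_HZ / H_TOTAL;              -- 31486
  constant FRAME_RATE_HZ : natural := PIXEL_CLK_HZ / (H_TOTAL * V_TOTAL);  -- 61
end package vga_timing_pkg;
```

Under that clock assumption the design produces a line rate of about 31.5 kHz and a frame rate of about 61 Hz. The commonly quoted 640x480 timing at 60 Hz uses 800 clocks per line and 525 lines per frame, so the counts in this code are close to, but not identical with, the nominal figures.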
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin

addr <= addr_s;

process(clk, rstn)
begin
    if (rstn = '0') then
        rd     <= '0';
        red    <= "0";
        green  <= "0";
        blue   <= "0";
        addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
        -- rising edge of v_en (start of a frame): restart the read address
        if ((v_en = '1') and (v_en_reg = '0')) then
            addr_s <= (others => '0');
        end if;
        if (v_en = '1' and h_en = '1') then
            if (line_count <= 514) then
                if (c_hs <= 655) then
                    rd     <= '1';
                    red    <= data_in;
                    green  <= data_in;
                    blue   <= data_in;
                    addr_s <= addr_s + "000000000000000001";
                else
                    rd    <= '0';
                    red   <= "0";
                    green <= "1";
                    blue  <= "0";
                end if;
            else
                rd    <= '0';
                red   <= "1";
                green <= "1";
                blue  <= "1";
            end if;
            if (addr_s = "111011111111111111") then
                addr_s <= (others => '0');
            end if;
        else
            red   <= "0";
            green <= "0";
            blue  <= "0";
        end if;
    end if;
end process;

end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in the volume of production. In the early 1990s FPGAs were primarily used
in telecommunications and networking. By the end of the decade FPGAs had
found their way into consumer, automotive and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs have the ability to be reprogrammed at run time, and
this is leading to the idea of reconfigurable computing or reconfigurable
systems: CPUs that reconfigure themselves to suit the task at hand. The
Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable
soft processor implemented on FPGAs. However, it does not support dynamic
reconfiguration at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: first commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has increased development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4 to 8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array, or island-style, FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Fig. Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
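The classic logic block described above, a 4-input LUT followed by a flip-flop with a single selectable output, can be modelled in a few lines of VHDL. This is a behavioral sketch, not a vendor primitive: the entity logic_block and the generics INIT and REGISTERED are hypothetical names, and the generics stand in for the configuration bits that a real device loads from its bitstream.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    -- Truth table of the LUT; x"8000" makes it a 4-input AND.
    INIT       : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) output.
    REGISTERED : boolean := false
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The LUT is a 16 x 1 memory addressed by the four inputs.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- The flip-flop captures the LUT output on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer: one output, registered or unregistered.
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing INIT changes the logic function without changing the structure, which is how one physical block can implement any function of four inputs.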
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Fig. Logic block pin locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Fig. Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to the actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
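The first simulation stage, an RTL test bench, might look like the sketch below. It is a self-contained illustration and not the test bench used in this project: wrap_counter is a hypothetical stand-in for a wrapping counter of the kind used in the synchronization block, and the test bench uses a small LAST value so that the wrap-around is reached quickly.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Design under test: a counter that wraps to 0 after reaching LAST.
entity wrap_counter is
  generic (LAST : natural := 793);
  port (
    clk   : in  std_logic;
    rstn  : in  std_logic;
    count : out natural range 0 to LAST
  );
end entity wrap_counter;

architecture rtl of wrap_counter is
  signal count_s : natural range 0 to LAST := 0;
begin
  count <= count_s;

  process (clk, rstn)
  begin
    if rstn = '0' then
      count_s <= 0;
    elsif rising_edge(clk) then
      if count_s = LAST then
        count_s <= 0;
      else
        count_s <= count_s + 1;
      end if;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Self-checking test bench: no ports, drives the DUT and asserts on results.
entity tb_wrap_counter is
end entity tb_wrap_counter;

architecture sim of tb_wrap_counter is
  constant LAST : natural := 3;
  signal clk   : std_logic := '0';
  signal rstn  : std_logic := '0';
  signal count : natural range 0 to LAST;
  signal done  : boolean := false;
begin
  dut : entity work.wrap_counter
    generic map (LAST => LAST)
    port map (clk => clk, rstn => rstn, count => count);

  -- Free-running clock, held low once the stimulus process has finished.
  clk <= not clk after 10 ns when not done else '0';

  stimulus : process
  begin
    wait for 25 ns;
    rstn <= '1';                       -- release the active-low reset
    for expected in 1 to LAST loop
      wait until rising_edge(clk);
      wait for 1 ns;                   -- let the new count settle
      assert count = expected report "unexpected count" severity failure;
    end loop;
    wait until rising_edge(clk);
    wait for 1 ns;
    assert count = 0 report "counter did not wrap" severity failure;
    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

The same test bench can be reused against the gate-level and back-annotated netlists in the later stages, since it only touches the ports of the design under test.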
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and
firmware co-design has become mainstream. Prototyping SoC and ASIC
designs on FPGAs has become a good method to do this.
Reasons why prototyping is important
1 Running a SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder development.
2 Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be
uncovered early.
3 Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4 Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
Fig. Gate level: VGA controller
Fig. Gate level: sync unit
Fig. Technology schematic
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, with the VGA Controller implemented on an FPGA, a processor-generated
picture (pixelated) is taken from memory space, and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
REFERENCES
1 http://en.wikipedia.org/wiki/Video_Graphics_Array
2 Eduardo Sanchez, "A VGA display controller"
3 Hinner, "VGA Interface and video signal documents"
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 38
Package body consists of
ndashsubprogram declarations and bodies
ndashtype and subtype declarations
ndash deferred constants
ndash file declarations
LibrariesCollection of VHDL design units (database)
1 Packages
package declaration
package body
2 Entities (entity declaration)
3 Architectures (architecture body)
4 Configurations (configuration declarations)
Usually directory in UNIX file system
Can be also any other kind of database
Levels of Abstraction
VHDL supports many possible styles of design description which differ
primarily in how closely they relate to the HW
It is possible to describe a circuit in a number of ways
Structural-------
Dataflow ------- Higher level of abstraction
Behavioral -------
Structural VHDL description
Circuit is described in terms of its components
From a low-level description (eg transistor-level description) to a high level
description (eg block diagram)
For large circuits a low-level description quickly becomes impractical
10.3 Dataflow VHDL Description
The circuit is described in terms of how data moves through the system.
In the dataflow style, you describe how information flows between the
registers in the system.
The combinational logic is described at a relatively high level, while the
placement and operation of registers is specified quite precisely.
The behavior of the system over time is defined by the registers.
There are no built-in registers in the VHDL language, so one of the following
is needed:
- a lower-level description of the sequential elements, or
- a behavioral description of the sequential elements.
Lower-level register descriptions must be created or obtained.
If no third-party register models are available, you must write the behavioral
description of the registers yourself.
The behavioral description can be provided in the form of subprograms
(functions or procedures).
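A small sketch of the dataflow style, assuming a hypothetical 8-bit accumulator (not part of the project): the combinational path is a single concurrent assignment, while the register has to be written out behaviorally because VHDL has no built-in register primitive.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity accumulator is
  port (
    clk  : in  std_logic;
    rstn : in  std_logic;               -- active-low asynchronous reset
    din  : in  unsigned(7 downto 0);
    acc  : out unsigned(7 downto 0));
end accumulator;

architecture dataflow of accumulator is
  signal acc_reg  : unsigned(7 downto 0);  -- register contents
  signal acc_next : unsigned(7 downto 0);  -- value to be loaded on the next clock edge
begin
  -- Combinational logic: how data flows from the register back to its input.
  acc_next <= acc_reg + din;

  -- Register: described behaviorally with a clocked process.
  process(clk, rstn)
  begin
    if rstn = '0' then
      acc_reg <= (others => '0');
    elsif rising_edge(clk) then
      acc_reg <= acc_next;
    end if;
  end process;

  acc <= acc_reg;
end dataflow;
```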
10.4 Behavioral VHDL Description
The circuit is described in terms of its operation over time.
The representation might include, e.g., state diagrams, timing diagrams and
algorithmic descriptions.
The concept of time may be expressed precisely using delays (e.g. A <= B
after 10 ns).
If no actual delay is used, only the order of the sequential operations is
defined.
At the lower levels of abstraction (e.g. RTL), synthesis tools ignore detailed
timing specifications.
The actual timing results depend on the implementation technology and the
efficiency of the synthesis tool.
There are a few tools for behavioral synthesis.
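The following sketch shows both ideas in one hypothetical simulation model (parity_model is an illustrative name): an algorithmic description written as a loop, and an explicit delay on the output. A synthesis tool would normally ignore the "after 10 ns" clause.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity parity_model is
  port (
    d : in  std_logic_vector(7 downto 0);
    p : out std_logic);
end parity_model;

architecture behavioral of parity_model is
begin
  process(d)
    variable acc : std_logic;
  begin
    -- Algorithmic description: XOR all bits of d together, in order.
    acc := '0';
    for k in d'range loop
      acc := acc xor d(k);
    end loop;
    -- Explicit timing: the result appears 10 ns after d changes.
    p <= acc after 10 ns;
  end process;
end behavioral;
```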
Concurrent vs. Sequential
Processes
- The process is the basic simulation concept in VHDL.
- A VHDL description can always be broken up into interconnected processes,
  quite similar to Unix processes.
Process keyword in VHDL
- A process statement is a concurrent statement.
- Statements inside a process statement are sequential statements.
- A process must contain either a sensitivity list or wait statement(s), but
  NOT both.
- The sensitivity list or the wait statement(s) contain the signals that wake
  the process up.
General format:
    process [(sensitivity_list)]
        process_declarative_part
    begin
        process_statements
        [wait_statement]
    end process;
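The two permitted forms can be compared side by side. In this sketch (dff_pair is a hypothetical entity) both processes describe the same D flip-flop: one is woken by its sensitivity list, the other by a wait statement.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity dff_pair is
  port (clk, d : in std_logic; q1, q2 : out std_logic);
end dff_pair;

architecture rtl of dff_pair is
begin
  -- Form 1: sensitivity list, no wait statement.
  with_list : process(clk)
  begin
    if rising_edge(clk) then
      q1 <= d;
    end if;
  end process with_list;

  -- Form 2: wait statement, no sensitivity list.
  with_wait : process
  begin
    wait until rising_edge(clk);
    q2 <= d;
  end process with_wait;
end rtl;
```

The two processes are concurrent with each other, while the statements inside each one execute sequentially.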
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port(
    clk50m : in  std_logic;
    rstn   : in  std_logic;                       -- active-low reset
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0));
end vga_controller;

architecture Behavioral of vga_controller is

  -- 1-bit-wide picture memory with an 18-bit address
  component memory
    port (
      addr  : IN  std_logic_VECTOR(17 downto 0);
      clk   : IN  std_logic;
      dout  : OUT std_logic_VECTOR(0 downto 0);
      en    : IN  std_logic;
      sinit : IN  std_logic);
  END component;

  -- port modes match the sync_unit entity (line_count and c_hs are outputs)
  component sync_unit
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer);
  end component;

  component pixelgen
    port(
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic);
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- divide the board clock by two to obtain the pixel clock
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
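The memory component is only declared in the listing above; its implementation is not given in the report (it is presumably a vendor-generated block memory). For simulation only, a stand-in with the same port list could look like the sketch below. The test pattern and the decision to ignore sinit are assumptions made for illustration, not the behavior of the real core.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity memory is
  port (
    addr  : in  std_logic_vector(17 downto 0);
    clk   : in  std_logic;
    dout  : out std_logic_vector(0 downto 0);
    en    : in  std_logic;
    sinit : in  std_logic);   -- unused in this stand-in (assumption)
end memory;

architecture sim of memory is
begin
  -- Synchronous read: when enabled, return one bit derived from the address.
  process(clk)
    variable a : unsigned(17 downto 0);
  begin
    if rising_edge(clk) then
      if en = '1' then
        a := unsigned(addr);
        -- Hypothetical test pattern: XOR of two address bits, so that
        -- consecutive addresses produce a visibly changing output.
        dout(0) <= a(3) xor a(13);
      end if;
    end if;
  end process;
end sim;
```

With such a model compiled alongside sync_unit and pixelgen, the top-level entity can be simulated without the vendor memory core.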
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks within a line (0 to 793)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vsync counter
  -- vertical line counter: advances once per line on the rising edge of
  -- h_en_s (0 to 515)
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse (vertical sync, front porch, back porch)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- one-clock delayed copy of v_en_s, used to detect the start of a frame
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
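The counter limits in the synchronization block fix the scan rates. As a rough worked calculation, assuming the clk50m input really is a 50 MHz clock (suggested by the port name but not stated in the code), the divide-by-two in the top-level entity, the 794 clocks per line (c_hs_s counts 0 to 793) and the 516 lines per frame (line_count_s counts 0 to 515) give:

```latex
\begin{align*}
f_{\text{pixel}} &= \frac{50\ \text{MHz}}{2} = 25\ \text{MHz} \\
f_{\text{line}}  &= \frac{f_{\text{pixel}}}{794} = \frac{25\,000\,000\ \text{Hz}}{794} \approx 31.49\ \text{kHz} \\
f_{\text{frame}} &= \frac{f_{\text{line}}}{516} \approx 61.0\ \text{Hz}
\end{align*}
```

For comparison, the nominal 640 x 480 at 60 Hz VGA timing uses a 25.175 MHz pixel clock with 800 clocks per line and 525 lines per frame, which gives about 31.47 kHz and 59.94 Hz. The values above are close to these, and monitors generally tolerate small deviations of this kind.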
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- rising edge of v_en: start of a new frame, restart at address 0
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- picture area: read the next pixel from memory
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- right of the picture area: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- below the picture area: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- wrap around at the last used memory address
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- blanking interval: black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts
itself to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
- 1987: 9,000 gates (Xilinx)
- 1992: 600,000 gates (Naval Surface Warfare Department)
- Early 2000s: millions of gates
Market size
- 1985: first commercial FPGA technology invented by Xilinx
- 1987: $14 million
- ~1993: >$385 million
- 2005: $1.9 billion
- 2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
- IC costs are rising aggressively
- ASIC complexity has increased development time and costs
- R&D resources and headcount are decreasing
- Revenue losses for slow time-to-market are increasing
- Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. FPGA architectures, on the other hand, are dominated by interconnect.
This makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as the ability of logic blocks to implement
decoders or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4 to
8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or
the width of one column, as shown below. All the routing channels have the
same width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Fig. Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
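The logic block described above can be modelled in a few lines of VHDL. This is a behavioral sketch only: the entity name logic_block and the generics LUT_INIT and REGISTERED are hypothetical, and a real FPGA fixes these choices through its configuration bits rather than through generics.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity logic_block is
  generic (
    -- Truth table of the 4-input LUT: bit k is the output for input value k.
    -- x"8000" sets only bit 15, i.e. a 4-input AND.
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered or the unregistered LUT output.
    REGISTERED : boolean := false);
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic);
end logic_block;

architecture sketch of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- Lookup table: the four inputs address one bit of the truth table.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- Flip-flop on the LUT output.
  process(clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single output: either the registered or the unregistered LUT value.
  o <= ff_out when REGISTERED else lut_out;
end sketch;
```

Changing LUT_INIT changes the logic function without changing the structure, which is the essence of LUT-based programmable logic.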
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Fig. Logic block pin locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Fig. Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, produces a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or a similar
license), and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
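As a minimal sketch of the first step, RTL simulation with a test bench, the example below checks a divide-by-two clock divider of the kind used in the top-level VGA controller. The names clk_div2 and clk_div2_tb are hypothetical, and the 20 ns clock period assumes a 50 MHz input.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Design under test: toggles its output on every rising input clock edge.
entity clk_div2 is
  port (clk_in : in std_logic; rstn : in std_logic; clk_out : out std_logic);
end clk_div2;

architecture rtl of clk_div2 is
  signal q : std_logic;
begin
  process(clk_in, rstn)
  begin
    if rstn = '0' then
      q <= '0';
    elsif rising_edge(clk_in) then
      q <= not q;
    end if;
  end process;
  clk_out <= q;
end rtl;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Test bench: has no ports, generates stimulus and checks the response.
entity clk_div2_tb is
end clk_div2_tb;

architecture sim of clk_div2_tb is
  signal clk_in  : std_logic := '0';
  signal rstn    : std_logic := '0';
  signal clk_out : std_logic;
  signal done    : boolean := false;
begin
  dut : entity work.clk_div2
    port map (clk_in => clk_in, rstn => rstn, clk_out => clk_out);

  -- 20 ns clock period, stopped once the checks are finished.
  clock : process
  begin
    while not done loop
      clk_in <= '0'; wait for 10 ns;
      clk_in <= '1'; wait for 10 ns;
    end loop;
    wait;
  end process clock;

  stimulus : process
  begin
    wait for 25 ns;                                  -- hold reset
    assert clk_out = '0' report "output not low during reset" severity error;
    rstn <= '1';
    wait until rising_edge(clk_in);
    wait for 1 ns;
    assert clk_out = '1' report "output did not toggle high" severity error;
    wait until rising_edge(clk_in);
    wait for 1 ns;
    assert clk_out = '0' report "output did not toggle low" severity error;
    done <= true;
    wait;
  end process stimulus;
end sim;
```

The same test bench can in principle be reused for the gate-level and back-annotated simulations, since only the model bound to the design under test changes.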
Basic Process Technology Types
- SRAM: based on static memory technology. In-system programmable and
  re-programmable. Requires external boot devices. CMOS.
- Antifuse: one-time programmable. CMOS.
- EPROM: Erasable Programmable Read-Only Memory technology. Usually
  one-time programmable in production because of plastic packaging.
  Windowed devices can be erased with ultraviolet (UV) light. CMOS.
- EEPROM: Electrically Erasable Programmable Read-Only Memory
  technology. Can be erased, even in plastic packages. Some, but not all,
  EEPROM devices can be in-system programmed. CMOS.
- Flash: flash-erase EPROM technology. Can be erased, even in plastic
  packages. Some, but not all, flash devices can be in-system programmed.
  Usually, a flash cell is smaller than an equivalent EEPROM cell and is
  therefore less expensive to manufacture. CMOS.
- Fuse: one-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; its Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on an
FPGA has become a good method for doing this.
Reasons why prototyping is important:
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that
   it is functionally correct. This compares favorably with designers relying
   only on software simulations to verify that their hardware design is sound;
   simulation speed and modeling accuracy limitations hinder that approach.
2. Due to time constraints, many projects cannot wait until the silicon is
   back from the foundry to start on software tests. FPGA prototyping allows
   much more time for software development and for testing at the
   software-hardware integration stage. This helps expose many unforeseen
   software bugs that appear due to today's array of operating systems,
   applications and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies on
   the system work well together, outside the simulation stage and in actual
   form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
   clients, bringing in interest early. This speeds up the overall development
   cycle and allows for more enhancement or improvement of the chip features
   than would otherwise have been possible.
13 SIMULATION RESULTS
Fig. Gate level: VGA controller
Fig. Gate level: sync unit
Fig. Technology schematic
14 APPLICATIONS
- Motion tracking systems
- Image processing systems
- Display systems
14.1 CONCLUSION
Hence, by implementing the VGA controller on an FPGA, a processor-generated
(pixelated) picture is taken from memory space, and the controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 39
Circuit is described in terms of how data moves through the system
In the dataflow style you describe how information flows between registers in
the system
The combinational logic is described at a relatively high level the placement
and operation of registers is specified quite precisely
The behavior of the system over the time is defined by registers
There are no build-in registers in VHDL-language
ndashEither lower level description
ndashor behavioral description of sequential elements is needed
The lower level register descriptions must be created or obtained
If there is no 3rd party models for registers =gt you must write the behavioral
description of registers
The behavioral description can be provided in the form of
subprograms(functions or procedures)
104 Behavioral VHDL Description
Circuit is described in terms of its operation over time
39
Representation might include eg state diagrams timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (eg A lt= B
after 10 ns)
If no actual delay is used order of sequential operations is defined
In the lower levels of abstraction (eg RTL) synthesis tools ignore detailed
timing specifications
The actual timing results depend on implementation technology and efficiency
of synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
VHDL description can always be broken up to interconnected processes
Quite similar to Unix process
40
Process keyword in VHDL
Process statement is concurrent statement
Statements inside process statements are sequential statements
Process must contain either sensitivity list or wait statement(s) but NOT both
Sensitivity list or wait statement(s) contains signals which wakes process up
General format
Process [(sensitivity_list)]
process_declarative_part
Begin
process_statements
[wait_statement]
End process
41
11 VGA CONTROLLER CODE
library IEEE
use IEEESTD_LOGIC_1164ALL
use IEEESTD_LOGIC_ARITHALL
use IEEESTD_LOGIC_UNSIGNEDALL
use IEEEnumeric_stdALL
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code
--library UNISIM
--use UNISIMVComponentsall
entity vga_controller is port(clk50min std_logic
rstnin std_logic
        hsync : out std_logic;
        vsync : out std_logic;
        red   : out std_logic_vector(0 downto 0);
        green : out std_logic_vector(0 downto 0);
        blue  : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
    -- Image memory: 18-bit address, one bit per pixel.
    component memory
        port (
            addr  : in  std_logic_vector(17 downto 0);
            clk   : in  std_logic;
            dout  : out std_logic_vector(0 downto 0);
            en    : in  std_logic;
            sinit : in  std_logic);
    end component;
    component sync_unit port(
        clk        : in  std_logic;
        rstn       : in  std_logic;
        hsync      : out std_logic;  -- horizontal sync pulse (row)
        vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
        v_en_reg   : out std_logic;
        h_en       : out std_logic;
        v_en       : out std_logic;
        line_count : out integer;    -- mode "out", matching the sync_unit entity
        c_hs       : out integer);
    end component;
    component pixelgen port(
        clk        : in  std_logic;
        rstn       : in  std_logic;
        hsync      : in  std_logic;
        vsync      : in  std_logic;
        h_en       : in  std_logic;
        v_en       : in  std_logic;
        v_en_reg   : in  std_logic;
        data_in    : in  std_logic_vector(0 downto 0);
        line_count : in  integer;
        c_hs       : in  integer;
        red        : out std_logic_vector(0 downto 0);
        green      : out std_logic_vector(0 downto 0);
        blue       : out std_logic_vector(0 downto 0);
        addr       : out std_logic_vector(17 downto 0);
        rd         : out std_logic);
    end component;
    signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
    signal line_count_s, c_hs_s, addr : integer;
    signal data_s : std_logic_vector(0 downto 0);
    signal addr_s : std_logic_vector(17 downto 0);
begin
    --addr_s <= std_logic_vector(to_unsigned(addr, 18));
    -- Divide the board clock clk50m by two to obtain the pixel clock clk.
    process(clk50m, rstn)
    begin
        if (rstn = '0') then
            clk <= '0';
        elsif (clk50m = '1' and clk50m'event) then
            clk <= not clk;
        end if;
    end process;
    hsync <= hsync_s;
    vsync <= vsync_s;
    sync : sync_unit port map(
        clk        => clk,
        rstn       => rstn,
        hsync      => hsync_s,
        vsync      => vsync_s,
        v_en_reg   => v_en_reg_s,
        line_count => line_count_s,
        h_en       => h_en_s,
        v_en       => v_en_s,
        c_hs       => c_hs_s
    );
    pixel : pixelgen port map(
        clk        => clk,
        rstn       => rstn,
        hsync      => hsync_s,
        vsync      => vsync_s,
        h_en       => h_en_s,
        v_en       => v_en_s,
        v_en_reg   => v_en_reg_s,
        data_in    => data_s,
        line_count => line_count_s,
        c_hs       => c_hs_s,
        red        => red,
        green      => green,
        blue       => blue,
        addr       => addr_s,
        rd         => rd_s
    );
    mem12 : memory port map(
        addr  => addr_s,
        clk   => clk,
        dout  => data_s,
        en    => rd_s,
        sinit => rstn  -- rstn is active low; check the polarity that sinit expects
    );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
    signal h_en_s, v_en_s : std_logic;
    signal line_count_s, c_hs_s : integer;
begin
    c_hs       <= c_hs_s;
    line_count <= line_count_s;
    h_en       <= h_en_s;
    v_en       <= v_en_s;
    -- Horizontal counter: counts pixel clocks 0 to 793 within each line.
    process(clk, rstn)
    begin
        if (rstn = '0') then
            c_hs_s <= 0;
        else
            if (clk = '1' and clk'event) then
                if (c_hs_s = 793) then
                    c_hs_s <= 0;
                else
                    c_hs_s <= c_hs_s + 1;
                end if;
            end if;
        end if;
    end process;
    -- Vertical line counter: advances on each rising edge of h_en_s and
    -- wraps to 0 after line 515.
    process(h_en_s, rstn)
    begin
        if (rstn = '0') then
            line_count_s <= 0;
        elsif (h_en_s = '1' and h_en_s'event) then
            line_count_s <= line_count_s + 1;
            if (line_count_s = 515) then
                line_count_s <= 0;
            end if;
        end if;
    end process;
    -- hsync pulse: low while the horizontal count is 0 to 95.
    process(clk, rstn)
    begin
        if (rstn = '0') then
            hsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (c_hs_s <= 95) then
                hsync <= '0';
            else
                hsync <= '1';
            end if;
        end if;
    end process;
    -- Horizontal enable (row information is sent in this period).
    process(clk, rstn)
    begin
        if (rstn = '0') then
            h_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
                h_en_s <= '1';
            else
                h_en_s <= '0';
            end if;
        end if;
    end process;
    -- Vertical enable (data is valid in this region).
    process(clk, rstn)
    begin
        if (rstn = '0') then
            v_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if (line_count_s >= 35 and line_count_s <= 515) then
                v_en_s <= '1';
            else
                v_en_s <= '0';
            end if;
        end if;
    end process;
    -- vsync pulse: low while the line count is below 2.
    process(clk, rstn)
    begin
        if (rstn = '0') then
            vsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (line_count_s < 2) then
                vsync <= '0';
            else
                vsync <= '1';
            end if;
        end if;
    end process;
    -- v_en delayed by one clock; pixelgen compares it with v_en to detect
    -- the first clock of the vertical enable period.
    process(clk)
    begin
        if (clk = '1' and clk'event) then
            v_en_reg <= v_en_s;
        end if;
    end process;
end arch_sync_unit;
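The literal limits used in sync_unit can be easier to review when they are
named. The following is a minimal sketch of a package that collects them. The
package name vga_timing_pkg and all constant names are illustrative and do not
appear in the original design; only the values are taken from the listing.
Assuming the pixel clock is 25 MHz (clk50m divided by two, taking the port
name at face value), 794 clocks per line and 516 lines per frame give a line
rate of about 31.5 kHz and a frame rate of about 61 Hz.

```vhdl
-- Hypothetical package: names are illustrative, values are the literals
-- that appear in the sync_unit listing.
package vga_timing_pkg is
  -- Horizontal timing, in pixel-clock cycles (counter runs 0 .. H_MAX).
  constant H_MAX        : natural := 793;  -- last count before wrap
  constant H_SYNC_END   : natural := 95;   -- hsync low while count <= 95
  constant H_EN_START   : natural := 143;  -- first count with h_en high
  constant H_EN_END     : natural := 778;  -- last count with h_en high

  -- Vertical timing, in lines (counter runs 0 .. V_MAX).
  constant V_MAX        : natural := 515;  -- last line before wrap
  constant V_SYNC_LINES : natural := 2;    -- vsync low while line < 2
  constant V_EN_START   : natural := 35;   -- first line with v_en high
  constant V_EN_END     : natural := 515;  -- last line with v_en high

  -- Derived totals.
  constant H_TOTAL      : natural := H_MAX + 1;                  -- 794 clocks
  constant H_SYNC_WIDTH : natural := H_SYNC_END + 1;             -- 96 clocks
  constant H_EN_WIDTH   : natural := H_EN_END - H_EN_START + 1;  -- 636 clocks
  constant V_TOTAL      : natural := V_MAX + 1;                  -- 516 lines
  constant V_EN_LINES   : natural := V_EN_END - V_EN_START + 1;  -- 481 lines
end package vga_timing_pkg;
```

With such a package compiled into the work library, a comparison such as
c_hs_s = 793 could be written as c_hs_s = H_MAX after adding
"use work.vga_timing_pkg.all;" to the design unit.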
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
    signal addr_s : std_logic_vector(17 downto 0);
begin
    addr <= addr_s;
    process(clk, rstn)
    begin
        if (rstn = '0') then
            rd     <= '0';
            red    <= "0";
            green  <= "0";
            blue   <= "0";
            addr_s <= x"0000" & "00";
        elsif (clk = '1' and clk'event) then
            -- First clock of the vertical enable period: restart the address.
            if ((v_en = '1') and (v_en_reg = '0')) then
                addr_s <= (others => '0');
            end if;
            if (v_en = '1' and h_en = '1') then
                if (line_count <= 514) then
                    if (c_hs <= 655) then
                        -- Image area: read the next pixel from memory.
                        rd     <= '1';
                        red    <= data_in;
                        green  <= data_in;
                        blue   <= data_in;
                        addr_s <= addr_s + "000000000000000001";
                    else
                        -- Right of the image: green.
                        rd    <= '0';
                        red   <= "0";
                        green <= "1";
                        blue  <= "0";
                    end if;
                else
                    -- Below the image: white.
                    rd    <= '0';
                    red   <= "1";
                    green <= "1";
                    blue  <= "1";
                end if;
                -- Wrap at the last address (245759 = 512 * 480 - 1).
                if (addr_s = "111011111111111111") then
                    addr_s <= (others => '0');
                end if;
            else
                -- Blanking: drive black.
                red   <= "0";
                green <= "0";
                blue  <= "0";
            end if;
        end if;
    end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field-programmable);
however, programmable logic was hard-wired between logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array, the XC2064, in 1985.
The XC2064 had programmable gates and programmable interconnects between
gates, the beginnings of a new technology and market. The XC2064 boasted a
mere 64 configurable logic blocks (CLBs), with two 3-input lookup tables
(LUTs). More than 20 years later, Freeman was entered into the National
Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By 1993,
Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication and in
volume of production. In the early 1990s, FPGAs were primarily used in
telecommunications and networking. By the end of the decade, FPGAs had found
their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural approach a
step further by combining the logic blocks and interconnects of traditional
FPGAs with embedded microprocessors and related peripherals to form a complete
"system on a programmable chip". This work mirrors the architecture by Ron
Perlof and Hana Potash of Burroughs Advanced Systems Group, which combined a
reconfigurable CPU architecture on a single chip called the SB24. That work
was done in 1982. Examples of such hybrid technologies can be found in the
Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more PowerPC
processors embedded within the FPGA's logic fabric. The Atmel FPSLIC is
another such device, which uses an AVR processor in combination with Atmel's
programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs can be reprogrammed at run time,
which is leading to the idea of reconfigurable computing or reconfigurable
systems: CPUs that reconfigure themselves to suit the task at hand. The
Mitrion Virtual Processor from Mitrionics is an example of a reconfigurable
soft processor implemented on FPGAs. However, it does not support dynamic
reconfiguration at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000 gates, Naval Surface Warfare Department
Early 2000s: millions
Market size
1985: first commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative to ASICs for a growing number of
higher-volume applications than they have historically been used for, which
the company credits for the growing number of FPGA design starts (see
History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio.
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
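The trade between operand width and the number of parallel arithmetic units
can be illustrated with a generate loop. The following is a minimal sketch,
not part of the VGA design: the entity name parallel_add and the generics
LANES and WIDTH are hypothetical. Each lane is an independent registered
adder, so all LANES additions complete in the same clock cycle, and a smaller
WIDTH makes each lane cheaper so that more lanes fit in the same logic.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: LANES independent adders working in parallel.
entity parallel_add is
  generic (
    LANES : positive := 8;  -- number of adders operating at once
    WIDTH : positive := 8   -- bits per operand in each lane
  );
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(LANES*WIDTH - 1 downto 0);
    b   : in  std_logic_vector(LANES*WIDTH - 1 downto 0);
    sum : out std_logic_vector(LANES*WIDTH - 1 downto 0)
  );
end entity parallel_add;

architecture rtl of parallel_add is
begin
  lanes_g : for i in 0 to LANES - 1 generate
    -- One registered adder per lane; each result wraps modulo 2**WIDTH.
    process (clk)
    begin
      if rising_edge(clk) then
        sum((i+1)*WIDTH - 1 downto i*WIDTH) <=
          std_logic_vector(
            unsigned(a((i+1)*WIDTH - 1 downto i*WIDTH)) +
            unsigned(b((i+1)*WIDTH - 1 downto i*WIDTH)));
      end if;
    end process;
  end generate lanes_g;
end architecture rtl;
```

With the default generics the block performs eight 8-bit additions per clock.
Halving WIDTH while doubling LANES keeps the port width the same but doubles
the number of results produced each cycle, at the cost of range.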
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived
from Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or
the width of one column. All the routing channels have the same width (number
of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
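A behavioural model can make the logic block description concrete. The
following is a minimal sketch under stated assumptions: the entity name
logic_block, the generic INIT, and the use_reg select input are hypothetical.
In a real device the truth table and the output select are configuration bits
loaded from the bitstream rather than a generic and a port.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic logic block: 4-input LUT plus flip-flop.
entity logic_block is
  generic (
    -- 16-entry truth table; x"8000" is 1 only for input "1111" (4-input AND).
    INIT : std_logic_vector(15 downto 0) := x"8000"
  );
  port (
    clk     : in  std_logic;
    lut_in  : in  std_logic_vector(3 downto 0);
    use_reg : in  std_logic;  -- '1' selects the registered LUT output
    q       : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal lut_reg : std_logic := '0';
begin
  -- The LUT is a table lookup: the four inputs form the index.
  lut_out <= INIT(to_integer(unsigned(lut_in)));

  -- Flip-flop capturing the LUT output on each rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      lut_reg <= lut_out;
    end if;
  end process;

  -- Single output: registered or unregistered LUT result.
  q <= lut_reg when use_reg = '1' else lut_out;
end architecture rtl;
```

Changing INIT changes the function without changing the structure, which is
the sense in which the block is programmable: for example, x"FFFE" gives a
4-input OR and x"6996" gives a 4-input XOR.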
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch
box. In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic, and embedded memories.
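From the designer's side, an embedded multiplier is usually reached by
writing an ordinary multiplication and letting the synthesis tool choose the
implementation. The following is a minimal sketch; the entity name mult16 is
hypothetical, and whether the product is mapped to a hard multiplier or built
from LUTs depends on the tool, its settings, and the target device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: registered 16 x 16 unsigned multiplier.
entity mult16 is
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(15 downto 0);
    b   : in  std_logic_vector(15 downto 0);
    p   : out std_logic_vector(31 downto 0)
  );
end entity mult16;

architecture rtl of mult16 is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- numeric_std "*" on two 16-bit unsigned operands yields 32 bits.
      p <= std_logic_vector(unsigned(a) * unsigned(b));
    end if;
  end process;
end architecture rtl;
```

The synthesis report for the chosen device is the place to check which
resource was actually used.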
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at
which point propagation delays can be added and the simulation run again with
these values back-annotated onto the netlist.
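The RTL simulation step can be sketched with a small self-checking test
bench. The example below is illustrative only: it wraps the divide-by-two
clock process from the VGA controller in a hypothetical entity clk_div2 and
drives it from a hypothetical test bench clk_div2_tb. The entity names, the
20 ns stimulus period, and the check times are assumptions made for the
example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Device under test: divide-by-two clock with active-low asynchronous reset.
entity clk_div2 is
  port (
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    clk    : out std_logic
  );
end entity clk_div2;

architecture rtl of clk_div2 is
  signal clk_i : std_logic := '0';
begin
  process (clk50m, rstn)
  begin
    if rstn = '0' then
      clk_i <= '0';
    elsif rising_edge(clk50m) then
      clk_i <= not clk_i;
    end if;
  end process;
  clk <= clk_i;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench: no ports; it generates stimulus and checks the response.
entity clk_div2_tb is
end entity clk_div2_tb;

architecture sim of clk_div2_tb is
  signal clk50m : std_logic := '0';
  signal rstn   : std_logic := '0';
  signal clk    : std_logic;
  signal done   : boolean := false;
begin
  dut : entity work.clk_div2
    port map (clk50m => clk50m, rstn => rstn, clk => clk);

  -- 20 ns stimulus clock, stopped once the checks have finished.
  clk50m <= not clk50m after 10 ns when not done else '0';

  stimulus : process
  begin
    wait for 45 ns;  -- hold reset across a few input clock edges
    assert clk = '0' report "clk should be low during reset" severity error;
    rstn <= '1';
    wait until rising_edge(clk50m);
    wait for 1 ns;
    assert clk = '1' report "clk should go high on first edge" severity error;
    wait until rising_edge(clk50m);
    wait for 1 ns;
    assert clk = '0' report "clk should go low on second edge" severity error;
    done <= true;
    wait;
  end process;
end architecture sim;
```

The same test bench can in principle be rerun against the post-synthesis and
post-layout netlists, which is how the later simulation stages described
above reuse the checks written for RTL.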
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design has
become mainstream, and prototyping SoC and ASIC designs on an FPGA has become
a good method to do this.
Reasons why Prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed
and modeling accuracy limitations hinder that approach.
2. Due to time constraints, many projects cannot wait until the silicon is
back from the foundry to start on software tests. FPGA prototyping allows
much more time for software development and testing at the software-hardware
integration stage. This exposes many unforeseen software bugs that appear due
to today's array of operating systems, applications, and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies
in the system work well together, off the simulation stage and in actual
form.
4. Prototypes have the added advantage of serving as demo platforms for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig. VGA CONTROLLER
GATE LEVEL
Fig. SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to correctly display the picture on a connected monitor.
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 40
Representation might include eg state diagrams timing diagrams and
algorithmic descriptions
The concept of time may be expressed precisely using delays (eg A lt= B
after 10 ns)
If no actual delay is used order of sequential operations is defined
In the lower levels of abstraction (eg RTL) synthesis tools ignore detailed
timing specifications
The actual timing results depend on implementation technology and efficiency
of synthesis tool
There are a few tools for behavioral synthesis
Concurrent Vs Sequential
Processes
Basic simulation concept in VHDL
VHDL description can always be broken up to interconnected processes
Quite similar to Unix process
40
Process keyword in VHDL
Process statement is concurrent statement
Statements inside process statements are sequential statements
Process must contain either sensitivity list or wait statement(s) but NOT both
Sensitivity list or wait statement(s) contains signals which wakes process up
General format
Process [(sensitivity_list)]
process_declarative_part
Begin
process_statements
[wait_statement]
End process
41
11 VGA CONTROLLER CODE
library IEEE
use IEEESTD_LOGIC_1164ALL
use IEEESTD_LOGIC_ARITHALL
use IEEESTD_LOGIC_UNSIGNEDALL
use IEEEnumeric_stdALL
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code
--library UNISIM
--use UNISIMVComponentsall
entity vga_controller is port(clk50min std_logic
rstnin std_logic
hsyncout std_logic
vsyncout std_logic
redout std_logic_vector(0 downto 0)
greenout std_logic_vector(0 downto 0)
blueout std_logic_vector(0 downto 0))
end vga_controller
architecture Behavioral of vga_controller is
component memory
port (
addr IN std_logic_VECTOR(17 downto 0)
clk IN std_logic
dout OUT std_logic_VECTOR(0 downto 0)
en IN std_logic
sinit IN std_logic)
END component
42
component sync_unit port(
clkin std_logic
rstnin std_logic
hsyncout std_logic
---horizontal synch pulse(row)
vsyncout std_logic
--vertical sync pulse(frame) 111110100000000000
v_en_regout std_logic
h_enout std_logic
v_enout std_logic
line_countinout integer
c_hsinout integer)
end component
component pixelgen port(
clkin std_logic
rstnin std_logic
hsyncin std_logic
vsyncin std_logic
h_enin std_logic
v_enin std_logic
v_en_reg in std_logic
data_inin std_logic_vector(0 downto 0)
line_countin integer
c_hsin integer
red out std_logic_vector(0 downto 0)
greenout std_logic_vector(0 downto 0)
blueout std_logic_vector(0 downto 0)
addrout std_logic_vector(17 downto 0)
rdout std_logic)
end component
43
signal clkhsync_svsync_sh_en_sv_en_sv_en_reg_srd_sstd_logic
signal line_count_sc_hs_saddrinteger
signal data_sstd_logic_vector(0 downto 0)
signal addr_sstd_logic_vector(17 downto 0)
begin
--addr_s lt= std_logic_vector(to_unsigned(addr18))
process(clk50mrstn)
begin
if(rstn = 0)then
clk lt= 0
elsif(clk50m = 1 and clk50mevent)then
clk lt= not clk
end if
--end if
end process
hsync lt= hsync_s
vsync lt= vsync_s
sync sync_unit port map(
clk =gt clk
rstn =gt rstn
hsync =gt hsync_s
vsync =gt vsync_s
v_en_reg =gt v_en_reg_s
line_count =gt line_count_s
h_en =gt h_en_s
v_en =gt v_en_s
c_hs =gt c_hs_s
)
44
pixel pixelgen port map(
clk =gt clk
rstn =gt rstn
hsync =gt hsync_s
vsync =gt vsync_s
h_en =gt h_en_s
v_en =gt v_en_s
v_en_reg =gt v_en_reg_s
data_in =gt data_s
line_count=gt line_count_s
c_hs =gt c_hs_s
red =gt red
green =gt green
blue =gt blue
addr =gt addr_s
rd =gt rd_s
)
mem12 memory port map(
addr =gt addr_s
clk =gt clk
dout =gt data_s
en =gt rd_s
sinit=gt rstn
)
end Behavioral
45
SYNCHRONIZATION BLOCK CODE
library IEEE
use IEEESTD_LOGIC_1164ALL
use IEEESTD_LOGIC_ARITHALL
use IEEESTD_LOGIC_UNSIGNEDALL
entity sync_unit is port(
clkin std_logic
rstnin std_logic
hsyncout std_logic ---horizontal synch pulse(row)
vsyncout std_logic --vertical sync
pulse(frame) 111110100000000000
v_en_regout std_logic
h_enout std_logic
v_enout std_logic
line_countout integer
c_hsout integer )
end sync_unit
architecture arch_sync_unit of sync_unit is
signal h_en_sv_en_sstd_logic
signal line_count_sc_hs_sinteger
begin
c_hs lt= c_hs_s
line_count lt= line_count_s
h_en lt= h_en_s
v_en lt= v_en_s
-- hsync counter
process(clkrstn)
begin
if(rstn = 0)then
46
c_hs_s lt= 0
else
if(clk = 1 and clkevent)then
if(c_hs_s=793)then
c_hs_s lt= 0
else
c_hs_s lt= c_hs_s +1
end if
end if
end if
end process
--vsync counter
--vertical line counter
process(h_en_srstn)
begin
if(rstn = 0)then
line_count_s lt= 0
elsif(h_en_s = 1 and h_en_sevent)then
line_count_s lt= line_count_s + 1
if (line_count_s=515)then
line_count_s lt= 0
end if
end if
--end if
end process
--hysnc pulse
process(clkrstn)
begin
if(rstn = 0)then
47
hsync lt= 1
elsif(clk = 1 and clkevent)then
if(c_hs_s lt=95)then
hsync lt= 0
else
hsync lt= 1
end if
end if
--end if
end process
process(clkrstn) ----Horizontal Enable ( Row Information Will be Sent in
this period)
begin
if(rstn = 0)then
h_en_s lt= 0
elsif(clk = 1 and clkevent)then
if((c_hs_s gt=143) and (c_hs_slt=778))then
h_en_s lt= 1
else
h_en_s lt= 0
end if
end if
--end if
end process
---vysnc pulse
process(clkrstn) -- Vertical Enable (Data is Valid in this Region)
begin
if(rstn = 0)then
48
v_en_s lt= 0
elsif(clk = 1 and clkevent)then
if((line_count_sgt=35 and line_count_slt=515))then
v_en_s lt= 1
else
v_en_s lt= 0
end if
end if
--end if
end process
process(clkrstn) --Vertical Sync Front porch Back Porch
begin
if(rstn = 0)then
vsync lt= 1
elsif(clk = 1 and clkevent)then
if(line_count_s lt2)then
vsync lt= 0
else
vsync lt= 1
end if
end if
--end if
end process
process(clk)
begin
if(clk = 1 and clkevent)then
v_en_reglt=v_en_s
end if
end process
end arch_sync_unit
49
PIXEL GENERATOR
library IEEE
use IEEESTD_LOGIC_1164ALL
use IEEESTD_LOGIC_ARITHALL
use IEEESTD_LOGIC_UNSIGNEDALL
entity pixelgen is port(
clkin std_logic
rstnin std_logic
hsyncin std_logic
vsyncin std_logic
h_enin std_logic
v_enin std_logic
v_en_reg in std_logic
data_inin std_logic_vector(0 downto 0)
line_countin integer
c_hsin integer
red out std_logic_vector(0 downto 0)
greenout std_logic_vector(0 downto 0)
blueout std_logic_vector(0 downto 0)
addrout std_logic_vector(17 downto 0)
rdout std_logic
)
end pixelgen
architecture behv of pixelgen is
signal addr_sstd_logic_vector(17 downto 0)
begin
addr lt= addr_s
process(clkrstn)
begin
50
if(rstn = 0)then
rd lt= 0
red lt= 0
green lt= 0
blue lt= 0
addr_s lt= x0000 amp 00
elsif (clk = 1 and clkevent)then
if((v_en=1) and (v_en_reg=0))then
addr_slt=(others =gt0)
end if
if(v_en=1 and h_en=1)then
if(line_countlt=514)then
if(c_hs lt= 655)then
rd lt= 1
red lt= data_in
green lt= data_in
blue lt= data_in
addr_s lt=addr_s +000000000000000001
else
rd lt= 0
red lt= 0
green lt= 1
blue lt= 0
end if
else
rd lt= 0
red lt= 1
green lt= 1
blue lt= 1
51
end if
if(addr_s=111011111111111111)then
addr_slt= (others =gt0)
end if
else
red lt= 0
green lt= 0
blue lt= 0
end if
end if
-- end if
--end if
end process
end behv
52
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturingmdashhence the
name field-programmable FPGAs are programmed using a logic circuit
diagram or a source code in a hardware description language (HDL) to specify
how the chip will work They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform but the
ability to update the functionality after shipping offers advantages for many
applications
FPGAs contain programmable logic components called logic blocks and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
togethermdashsomewhat like a one-chip programmable breadboard Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR In most FPGAs the logic blocks also
include memory elements which may be simple flip-flops or more complete
blocks of memory
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field-programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs can be reprogrammed at run time, and this is leading to
the idea of reconfigurable computing or reconfigurable systems: CPUs that
reconfigure themselves to suit the task at hand. The Mitrion Virtual Processor
from Mitrionics is an example of a reconfigurable soft processor implemented
on FPGAs. However, it does not support dynamic reconfiguration at run time,
but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages of FPGAs include a shorter time to market, the ability to re-program
in the field to fix bugs, and lower non-recurring engineering costs. Vendors
can also take a middle road by developing their hardware on ordinary FPGAs but
manufacturing the final version so that it can no longer be modified after the
design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative to ASICs for a growing number of
higher-volume applications than they have historically been used for, and the
company credits them with the growing number of FPGA design starts (see
History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications that had traditionally been the sole reserve of DSPs began
to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a 4-8 hour
wait is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The example architecture described here is an array, or island-style, FPGA. It
consists of an array of logic blocks and routing channels. Two I/O pads fit
into the height of one row or the width of one column, as shown below. All the
routing channels have the same width (number of wires).
Fig 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
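The logic block described above can be sketched in VHDL as follows. This is a minimal behavioral model for illustration, not any vendor's primitive: the entity name logic_block, the generics LUT_INIT and REGISTERED, and the port names are all assumptions. In a real device the LUT contents and the output select are set by configuration bits; here generics stand in for them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic logic block: one 4-input LUT plus one
-- flip-flop, with a single output that is either registered or not.
entity logic_block is
  generic (
    -- Truth table of the LUT: bit n is the output for input value n.
    -- x"8000" sets only bit 15, i.e. a 4-input AND (assumed default).
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- true selects the flip-flop output, false the raw LUT output.
    REGISTERED : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The LUT is a 16-entry table indexed by the four inputs.
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- The flip-flop captures the LUT output on each rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output select: registered or unregistered LUT output.
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the structure, which is the property that makes the block configurable.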
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
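The first simulation stage, an RTL test bench, might look like the sketch below. It is not taken from this project: the design under test (counter4) and the test bench (counter4_tb) are hypothetical names chosen for illustration, and the 20 ns clock period is an arbitrary assumption. The test bench generates a clock and reset, waits five clock edges after reset is released, and checks the result with an assert statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical design under test: 4-bit counter with active-low reset.
entity counter4 is
  port (
    clk   : in  std_logic;
    rstn  : in  std_logic;
    count : out std_logic_vector(3 downto 0)
  );
end entity counter4;

architecture rtl of counter4 is
  signal count_r : unsigned(3 downto 0) := (others => '0');
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      count_r <= (others => '0');
    elsif rising_edge(clk) then
      count_r <= count_r + 1;
    end if;
  end process;
  count <= std_logic_vector(count_r);
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Self-checking test bench: no ports, drives the DUT and asserts results.
entity counter4_tb is
end entity counter4_tb;

architecture sim of counter4_tb is
  signal clk   : std_logic := '0';
  signal rstn  : std_logic := '0';
  signal count : std_logic_vector(3 downto 0);
  signal done  : boolean := false;
begin
  -- Free-running clock (20 ns period, assumed) that stops when done.
  clk <= not clk after 10 ns when not done else '0';

  dut : entity work.counter4
    port map (clk => clk, rstn => rstn, count => count);

  stimulus : process
  begin
    wait for 25 ns;            -- hold reset across the first clock edge
    rstn <= '1';
    for n in 1 to 5 loop       -- let five rising edges pass
      wait until rising_edge(clk);
    end loop;
    wait for 1 ns;             -- allow the registered output to settle
    assert unsigned(count) = 5
      report "counter4: expected 5 after five clock edges"
      severity failure;
    report "counter4_tb passed" severity note;
    done <= true;
    wait;                      -- suspend forever; simulation ends
  end process stimulus;
end architecture sim;
```

The same test bench can usually be reused against the post-synthesis gate-level netlist and the back-annotated netlist, which is what makes repeating the simulation at each stage practical.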
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs alongside early software and firmware
co-design has become mainstream. Prototyping SoC and ASIC designs on an FPGA
has become a good method of doing this.
Reasons why Prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time in the area of software development and testing at the
software-hardware integration stage. This makes it possible to catch many
unforeseen software bugs that appear due to today's array of operating systems,
applications, and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig VGA CONTROLLER
GATE LEVEL
Fig SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. "A VGA display controller" by Eduardo Sanchez
3. Hinner, "VGA Interface and video signal documents"
Process keyword in VHDL
A process statement is a concurrent statement.
Statements inside a process statement are sequential statements.
A process must contain either a sensitivity list or wait statement(s), but NOT both.
The sensitivity list or wait statement(s) contain the signals that wake the process up.
General format:
process [(sensitivity_list)]
    process_declarative_part
begin
    process_statements
    [wait_statement]
end process;
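The two permitted forms can be compared side by side. The sketch below is illustrative only: the entity process_forms and its port names are assumptions, not part of the controller. Both processes describe the same D flip-flop, the first woken by its sensitivity list and the second by a wait statement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity showing the two legal process forms.
entity process_forms is
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q1  : out std_logic;
    q2  : out std_logic
  );
end entity process_forms;

architecture rtl of process_forms is
begin
  -- Form 1: sensitivity list, no wait statement.
  -- The process runs whenever clk has an event.
  with_list : process (clk)
  begin
    if rising_edge(clk) then
      q1 <= d;
    end if;
  end process with_list;

  -- Form 2: wait statement, no sensitivity list.
  -- The process suspends at the wait until the next rising edge.
  with_wait : process
  begin
    wait until rising_edge(clk);
    q2 <= d;
  end process with_wait;
end architecture rtl;
```

The listings in this report use the first form throughout, with the clock and reset in the sensitivity list.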
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
-- numeric_std is only needed by the commented-out to_unsigned line below
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity vga_controller is port(
    clk50m : in std_logic;
    rstn : in std_logic;
    hsync : out std_logic;
    vsync : out std_logic;
    red : out std_logic_vector(0 downto 0);
    green : out std_logic_vector(0 downto 0);
    blue : out std_logic_vector(0 downto 0));
end vga_controller;
architecture Behavioral of vga_controller is
component memory
    port (
    addr : IN std_logic_VECTOR(17 downto 0);
    clk : IN std_logic;
    dout : OUT std_logic_VECTOR(0 downto 0);
    en : IN std_logic;
    sinit : IN std_logic);
END component;
component sync_unit port(
    clk : in std_logic;
    rstn : in std_logic;
    hsync : out std_logic;
    -- horizontal sync pulse (row)
    vsync : out std_logic;
    -- vertical sync pulse (frame) 111110100000000000
    v_en_reg : out std_logic;
    h_en : out std_logic;
    v_en : out std_logic;
    -- modes changed from inout to out to match the sync_unit entity
    line_count : out integer;
    c_hs : out integer);
end component;
component pixelgen port(
    clk : in std_logic;
    rstn : in std_logic;
    hsync : in std_logic;
    vsync : in std_logic;
    h_en : in std_logic;
    v_en : in std_logic;
    v_en_reg : in std_logic;
    data_in : in std_logic_vector(0 downto 0);
    line_count : in integer;
    c_hs : in integer;
    red : out std_logic_vector(0 downto 0);
    green : out std_logic_vector(0 downto 0);
    blue : out std_logic_vector(0 downto 0);
    addr : out std_logic_vector(17 downto 0);
    rd : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr,18));
-- divide the 50 MHz board clock by two to get the 25 MHz pixel clock
process(clk50m, rstn)
begin
    if (rstn = '0') then
        clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
        clk <= not clk;
    end if;
end process;
hsync <= hsync_s;
vsync <= vsync_s;
sync : sync_unit port map(
    clk => clk,
    rstn => rstn,
    hsync => hsync_s,
    vsync => vsync_s,
    v_en_reg => v_en_reg_s,
    line_count => line_count_s,
    h_en => h_en_s,
    v_en => v_en_s,
    c_hs => c_hs_s
);
pixel : pixelgen port map(
    clk => clk,
    rstn => rstn,
    hsync => hsync_s,
    vsync => vsync_s,
    h_en => h_en_s,
    v_en => v_en_s,
    v_en_reg => v_en_reg_s,
    data_in => data_s,
    line_count => line_count_s,
    c_hs => c_hs_s,
    red => red,
    green => green,
    blue => blue,
    addr => addr_s,
    rd => rd_s
);
mem12 : memory port map(
    addr => addr_s,
    clk => clk,
    dout => data_s,
    en => rd_s,
    sinit => rstn
);
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
    clk : in std_logic;
    rstn : in std_logic;
    hsync : out std_logic; -- horizontal sync pulse (row)
    vsync : out std_logic; -- vertical sync pulse (frame) 111110100000000000
    v_en_reg : out std_logic;
    h_en : out std_logic;
    v_en : out std_logic;
    line_count : out integer;
    c_hs : out integer );
end sync_unit;
architecture arch_sync_unit of sync_unit is
signal h_en_s, v_en_s : std_logic;
signal line_count_s, c_hs_s : integer;
begin
c_hs <= c_hs_s;
line_count <= line_count_s;
h_en <= h_en_s;
v_en <= v_en_s;
-- hsync counter
process(clk, rstn)
begin
    if (rstn = '0') then
        c_hs_s <= 0;
    else
        if (clk = '1' and clk'event) then
            if (c_hs_s = 793) then
                c_hs_s <= 0;
            else
                c_hs_s <= c_hs_s + 1;
            end if;
        end if;
    end if;
end process;
-- vsync counter
-- vertical line counter
process(h_en_s, rstn)
begin
    if (rstn = '0') then
        line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
        line_count_s <= line_count_s + 1;
        if (line_count_s = 515) then
            line_count_s <= 0;
        end if;
    end if;
end process;
-- hsync pulse
process(clk, rstn)
begin
    if (rstn = '0') then
        hsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (c_hs_s <= 95) then
            hsync <= '0';
        else
            hsync <= '1';
        end if;
    end if;
end process;
-- Horizontal Enable (row information will be sent in this period)
process(clk, rstn)
begin
    if (rstn = '0') then
        h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
            h_en_s <= '1';
        else
            h_en_s <= '0';
        end if;
    end if;
end process;
-- Vertical Enable (data is valid in this region)
process(clk, rstn)
begin
    if (rstn = '0') then
        v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
        if ((line_count_s >= 35 and line_count_s <= 515)) then
            v_en_s <= '1';
        else
            v_en_s <= '0';
        end if;
    end if;
end process;
-- Vertical Sync, Front Porch, Back Porch
process(clk, rstn)
begin
    if (rstn = '0') then
        vsync <= '1';
    elsif (clk = '1' and clk'event) then
        if (line_count_s < 2) then
            vsync <= '0';
        else
            vsync <= '1';
        end if;
    end if;
end process;
-- one-clock delayed copy of v_en, used to detect its rising edge
process(clk)
begin
    if (clk = '1' and clk'event) then
        v_en_reg <= v_en_s;
    end if;
end process;
end arch_sync_unit;
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
    clk : in std_logic;
    rstn : in std_logic;
    hsync : in std_logic;
    vsync : in std_logic;
    h_en : in std_logic;
    v_en : in std_logic;
    v_en_reg : in std_logic;
    data_in : in std_logic_vector(0 downto 0);
    line_count : in integer;
    c_hs : in integer;
    red : out std_logic_vector(0 downto 0);
    green : out std_logic_vector(0 downto 0);
    blue : out std_logic_vector(0 downto 0);
    addr : out std_logic_vector(17 downto 0);
    rd : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;
process(clk, rstn)
begin
    if (rstn = '0') then
        rd <= '0';
        red <= "0";
        green <= "0";
        blue <= "0";
        addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
        -- restart the read address on the rising edge of v_en
        if ((v_en = '1') and (v_en_reg = '0')) then
            addr_s <= (others => '0');
        end if;
        if (v_en = '1' and h_en = '1') then
            if (line_count <= 514) then
                if (c_hs <= 655) then
                    rd <= '1';
                    red <= data_in;
                    green <= data_in;
                    blue <= data_in;
                    addr_s <= addr_s + "000000000000000001";
                else
                    rd <= '0';
                    red <= "0";
                    green <= "1";
                    blue <= "0";
                end if;
            else
                rd <= '0';
                red <= "1";
                green <= "1";
                blue <= "1";
            end if;
            if (addr_s = "111011111111111111") then
                addr_s <= (others => '0');
            end if;
        else
            red <= "0";
            green <= "0";
            blue <= "0";
        end if;
    end if;
end process;
end behv;
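The constants in the listings above can be sanity-checked with a little arithmetic. The figures below are a reading of the code, not measured results, and they are nominal: they assume h_en lags the horizontal counter by one clock (it is registered), and they ignore the one-clock skew of v_en at line boundaries.

```latex
\begin{align*}
f_{\text{pixel}} &= \frac{50\ \text{MHz}}{2} = 25\ \text{MHz} \\
N_{\text{line}} &= 793 + 1 = 794\ \text{clocks}, &
f_{\text{line}} &= \frac{25\ \text{MHz}}{794} \approx 31.49\ \text{kHz} \\
N_{\text{frame}} &= 515 + 1 = 516\ \text{lines}, &
f_{\text{frame}} &= \frac{f_{\text{line}}}{516} \approx 61.0\ \text{Hz} \\
N_{\text{cols}} &= 655 - 144 + 1 = 512, &
N_{\text{rows}} &= 514 - 35 + 1 = 480 \\
N_{\text{pixels}} &= 512 \times 480 = 245760 = 2^{18} - 2^{14}
\end{align*}
```

Under these assumptions the image read from memory is nominally 512 x 480 one-bit pixels, which agrees with the address wrap value 111011111111111111 (245759, the last of 245760 addresses) in the pixel generator.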
52
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturingmdashhence the
name field-programmable FPGAs are programmed using a logic circuit
diagram or a source code in a hardware description language (HDL) to specify
how the chip will work They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform but the
ability to update the functionality after shipping offers advantages for many
applications
FPGAs contain programmable logic components called logic blocks and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
togethermdashsomewhat like a one-chip programmable breadboard Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR In most FPGAs the logic blocks also
include memory elements which may be simple flip-flops or more complete
blocks of memory
History The FPGA industry sprouted from programmable read only
memory (PROM) and programmable logic devices (PLDs) PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable) however programmable logic was hard-wired between
logic gates
Xilinx Co-Founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985 ndash the
XC2064 The XC2064 had programmable gates and programmable
interconnects between gates the beginnings of a new technology and market
The XC2064 boasted a mere 64 configurable logic blocks (CLBs) with two 3-
input lookup tables (LUTs) More than 20 years later Freeman was entered
into the National Inventors Hall of Fame for his invention
53
Some of the industryrsquos foundational concepts and technologies for
programmable logic arrays gates and logic blocks are founded in patents
awarded to David W Page and LuVerne R Peterson in 1985
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600000 reprogrammable gates Casselman was successful and the system was
awarded a patent in 1992
Xilinx continued unchallenged and quickly growing from 1985 to the mid-
1990s when competitors sprouted up eroding significant market-share By
1993 Actel was serving about 18 percent of the market
The 1990s were an explosive period of time for FPGAs both in sophistication
and the volume of production In the early 1990s FPGAs were primarily used
in telecommunications and networking By the end of the decade FPGAs
found their way into consumer automotive and industrial applications
FPGAs got a glimpse of fame in 1997 when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device
Thomsonrsquos algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task
121 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete system on a programmable chip This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group which combined a reconfigurable CPU architecture on a single chip
called the SB24 That work was done in 1982 Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices
which include one or more PowerPC processors embedded within the FPGAs
54
logic fabric The Atmel FPSLIC is another such device which uses an AVR
processor in combination with Atmels programmable logic architecture
An alternate approach to using hard-macro processors is to make use of soft
processor cores that are implemented within the FPGA logic (See Soft
processors below)
As previously mentioned many modern FPGAs have the ability to be
reprogrammed at run time and this is leading to the idea of reconfigurable
computing or reconfigurable systems mdash CPUs that reconfigure themselves to
suit the task at hand The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs However
it does not support dynamic reconfiguration at runtime but instead adapts itself
to a specific program
Additionally new non-FPGA architectures are beginning to emerge
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip
Gates
1987 9000 gates Xilinx
1992 600000 Naval Surface Warfare Department
Early 2000s Millions
Market size
1985 First commercial FPGA technology invented by Xilinx
1987 $14 million
~1993 gt$385 million
2005 $19 billion
2010 estimates $275 billion
55
122 FPGA Comparisons
Historically FPGAs have been slower less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts A combination
of volume fabrication improvements research and development and the IO
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs
Advantages include a shorter time to market ability to re-program in the field
to fix bugs and lower non-recurring engineering costs Vendors can also take a
middle road by developing their hardware on ordinary FPGAs but
manufacture their final version so it can no longer be modified after the design
has been committed
Xilinx claims that several market and technology dynamics are changing the
ASICFPGA paradigm
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
RampD resources and headcount is decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for which the company blames for the growing number of FPGA design starts
(see History)
The primary differences between CPLDs and FPGAs are architectural A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers The result of this is less flexibility with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio The
FPGA architectures on the other hand are dominated by interconnect This
56
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories as well as to have logic blocks implement decoders
or mathematical functions
Some FPGAs have the capability of partial re-configuration that lets one
portion of the device be re-programmed while other portions continue running
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities and speed
increased, they began to take over larger and larger functions, to the point where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array, or island-style, FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
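The logic block described above can be sketched in VHDL as follows. This is a minimal behavioural model, not a vendor primitive: the entity name logic_block and the generics LUT_INIT and REGISTERED are hypothetical stand-ins for the configuration bits that a real device loads at programming time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic logic block: a 4-input LUT followed by
-- a flip-flop, with one output that is either registered or unregistered.
entity logic_block is
  generic (
    -- 16-entry truth table; x"8000" sets only entry 15, i.e. a 4-input AND
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- true selects the flip-flop output, false the raw LUT output
    REGISTERED : boolean := true
  );
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    y   : out std_logic                      -- the single block output
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The LUT is a table lookup indexed by the input vector.
  lut_out <= LUT_INIT(to_integer(unsigned(a)));

  -- The flip-flop captures the LUT output on each rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or unregistered LUT output.
  y <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the structure, which is the sense in which the block is programmable.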
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
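As an illustration of specifying a large structure numerically, the sketch below builds a chain of registers whose length is set by a single number. The entity name delay_line and the generic DEPTH are hypothetical and do not come from the project code; in a schematic, each of the DEPTH stages would have to be drawn individually.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a delay line of DEPTH flip-flops in series.
entity delay_line is
  generic (
    DEPTH : positive := 8   -- number of register stages
  );
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic
  );
end entity delay_line;

architecture rtl of delay_line is
  -- stage(0) is the input, stage(DEPTH) is the output
  signal stage : std_logic_vector(0 to DEPTH) := (others => '0');
begin
  stage(0) <= d;

  -- One flip-flop is generated per stage; changing DEPTH is the only
  -- edit needed to make the chain longer or shorter.
  gen_stages : for i in 1 to DEPTH generate
    process (clk)
    begin
      if rising_edge(clk) then
        stage(i) <= stage(i - 1);
      end if;
    end process;
  end generate gen_stages;

  q <= stage(DEPTH);
end architecture rtl;
```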
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license), and
other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description, where simulation is repeated to confirm that the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
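A test bench for the RTL stage of this flow might look like the sketch below, written against the sync_unit entity listed in the synchronization block code of this report. The test bench name, the 40 ns clock period (a 25 MHz pixel clock is assumed) and the done flag are assumptions made for illustration; the expected period of 794 clocks is read from the counter limit in that listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_unit_tb is
end entity sync_unit_tb;

architecture sim of sync_unit_tb is
  constant CLK_PERIOD : time := 40 ns;  -- assumption: 25 MHz pixel clock
  signal clk  : std_logic := '0';
  signal rstn : std_logic := '0';
  signal hsync, vsync, v_en_reg, h_en, v_en : std_logic;
  signal line_count, c_hs : integer;
  signal done : boolean := false;       -- stops the clock at the end
begin
  -- Device under test: the sync_unit entity from this report.
  dut : entity work.sync_unit
    port map (
      clk        => clk,
      rstn       => rstn,
      hsync      => hsync,
      vsync      => vsync,
      v_en_reg   => v_en_reg,
      h_en       => h_en,
      v_en       => v_en,
      line_count => line_count,
      c_hs       => c_hs
    );

  -- Free-running clock until the stimulus process sets done.
  clk <= not clk after CLK_PERIOD / 2 when not done else '0';

  stimulus : process
    variable t_start : time;
  begin
    wait for 5 * CLK_PERIOD;            -- hold reset for a few clocks
    rstn <= '1';
    wait until falling_edge(hsync);     -- start of one sync pulse
    t_start := now;
    wait until falling_edge(hsync);     -- start of the next sync pulse
    -- The horizontal counter runs from 0 to 793, i.e. 794 clocks per line.
    assert now - t_start = 794 * CLK_PERIOD
      report "unexpected hsync period" severity error;
    report "sync_unit_tb finished" severity note;
    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

The same test bench can be reused for the gate-level and back-annotated simulations, since it only refers to the ports of the design.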
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC
designs on an FPGA has become a good method for doing this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it is
functionally correct. This compares with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows for much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications and hardware, to be found early.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC clients,
bringing in interest early. This speeds up the overall development cycle and
allows for more enhancement or improvement of the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
Fig. Gate level: VGA controller
Fig. Gate level: sync unit
Fig. Technology schematic
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, by implementing the design on an FPGA, the Controller takes a
processor-generated (pixelated) picture from memory space and provides digital
RGB values for each pixel, as well as horizontal and vertical synchronization
signals, in order to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller"
3. Hinner, "VGA Interface and video signal documents"
11 VGA CONTROLLER CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use IEEE.numeric_std.ALL;
---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity vga_controller is
  port (
    clk50m : in  std_logic;
    rstn   : in  std_logic;
    hsync  : out std_logic;
    vsync  : out std_logic;
    red    : out std_logic_vector(0 downto 0);
    green  : out std_logic_vector(0 downto 0);
    blue   : out std_logic_vector(0 downto 0)
  );
end vga_controller;

architecture Behavioral of vga_controller is

  -- Image memory: one bit per pixel, 18-bit address
  component memory
    port (
      addr  : in  std_logic_vector(17 downto 0);
      clk   : in  std_logic;
      dout  : out std_logic_vector(0 downto 0);
      en    : in  std_logic;
      sinit : in  std_logic
    );
  end component;

  -- Sync generator. line_count and c_hs are declared "out" here so that
  -- the component matches the sync_unit entity declaration.
  component sync_unit
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : out std_logic;  -- horizontal sync pulse (row)
      vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
      v_en_reg   : out std_logic;
      h_en       : out std_logic;
      v_en       : out std_logic;
      line_count : out integer;
      c_hs       : out integer
    );
  end component;

  component pixelgen
    port (
      clk        : in  std_logic;
      rstn       : in  std_logic;
      hsync      : in  std_logic;
      vsync      : in  std_logic;
      h_en       : in  std_logic;
      v_en       : in  std_logic;
      v_en_reg   : in  std_logic;
      data_in    : in  std_logic_vector(0 downto 0);
      line_count : in  integer;
      c_hs       : in  integer;
      red        : out std_logic_vector(0 downto 0);
      green      : out std_logic_vector(0 downto 0);
      blue       : out std_logic_vector(0 downto 0);
      addr       : out std_logic_vector(17 downto 0);
      rd         : out std_logic
    );
  end component;

  signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
  signal line_count_s, c_hs_s, addr : integer;
  signal data_s : std_logic_vector(0 downto 0);
  signal addr_s : std_logic_vector(17 downto 0);

begin

  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the input clock clk50m by two to obtain the pixel clock clk
  process (clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
    --end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  sync : sync_unit port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  pixel : pixelgen port map (
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  mem12 : memory port map (
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );

end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer
  );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin

  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks within a line, 0 to 793
  process (clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vsync counter
  -- vertical line counter: advances on each rising edge of h_en_s, 0 to 515
  process (h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
    --end if;
  end process;

  -- hsync pulse (active low)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
    --end if;
  end process;

  -- Horizontal enable (row information will be sent in this period)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process (clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
    --end if;
  end process;

  -- Vertical sync pulse (active low); front porch and back porch follow
  process (clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
    --end if;
  end process;

  -- One-clock delayed copy of v_en, used to detect the start of a frame
  process (clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;

end arch_sync_unit;
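As a rough check of the counter limits in the listing above, the line and frame rates can be worked out by hand. This is a sketch under one assumption: that clk50m is a 50 MHz clock, as its name suggests but the text does not state, so that the divide-by-two in the top level gives a 25 MHz pixel clock. The symbols f_pix, N_h, N_v, f_h, f_v and t_hsync are introduced here for illustration.

```latex
\begin{align*}
f_{\mathrm{pix}} &= \frac{50\ \mathrm{MHz}}{2} = 25\ \mathrm{MHz}
  && \text{(assumed input clock)} \\
N_h &= 793 + 1 = 794
  && \text{clocks per line (counter runs 0 to 793)} \\
f_h &= \frac{f_{\mathrm{pix}}}{N_h} = \frac{25 \times 10^{6}}{794} \approx 31.49\ \mathrm{kHz} \\
N_v &= 515 + 1 = 516
  && \text{lines per frame (counter runs 0 to 515)} \\
f_v &= \frac{f_h}{N_v} \approx 61.0\ \mathrm{Hz} \\
t_{\mathrm{hsync}} &= \frac{96}{f_{\mathrm{pix}}} = 3.84\ \mu\mathrm{s}
  && \text{(hsync low for counts 0 to 95)}
\end{align*}
```

For comparison, the nominal 640x480 mode at 60 Hz uses a 25.175 MHz pixel clock with 800 clocks per line and 525 lines per frame, giving about 31.47 kHz and 59.94 Hz, so the values above are close to the nominal timing without matching it exactly.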
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is
  port (
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
  );
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin

  addr <= addr_s;

  process (clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: a new frame starts, so restart at address 0
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the image: read the next pixel from memory
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the image: green
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the image: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last pixel of the image
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: all colours off
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
    -- end if;
    --end if;
  end process;

end behv;
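The constants in the pixel generator can be cross-checked against the size of the image memory. The sketch below reads the limits directly from the two listings and ignores edge effects of a clock or two. It assumes that h_en goes high one clock after the horizontal counter reaches 143, as the registered assignment in the synchronization block suggests, so the first read happens at count 144. N_x and N_y are names introduced here.

```latex
\begin{align*}
N_x &= 655 - 144 + 1 = 512
  && \text{reads per line} \\
N_y &= 514 - 35 + 1 = 480
  && \text{lines with reads} \\
N_x \, N_y &= 512 \times 480 = 245760
  && \text{pixels per frame} \\
\mathtt{111011111111111111}_2 &= 2^{18} - 1 - 2^{14} = 245759 = N_x N_y - 1
  && \text{last address before wrap} \\
2^{18} &= 262144 \ge 245760
  && \text{18 address bits suffice}
\end{align*}
```

Under these assumptions the address counter covers exactly one 512 by 480 image of one bit per pixel, which is consistent with the 18-bit address port on the memory component.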
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at "run time", and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market ability to re-program in the field
to fix bugs and lower non-recurring engineering costs Vendors can also take a
middle road by developing their hardware on ordinary FPGAs but
manufacture their final version so it can no longer be modified after the design
has been committed
Xilinx claims that several market and technology dynamics are changing the
ASICFPGA paradigm
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
RampD resources and headcount is decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for which the company blames for the growing number of FPGA design starts
(see History)
The primary differences between CPLDs and FPGAs are architectural A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers The result of this is less flexibility with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio The
FPGA architectures on the other hand are dominated by interconnect This
56
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories as well as to have logic blocks implement decoders
or mathematical functions
Some FPGAs have the capability of partial re-configuration that lets one
portion of the device be re-programmed while other portions continue running
123 Applications
Applications of FPGAs include digital signal processing software-defined
radio aerospace and defense systems ASIC prototyping medical imaging
computer vision speech recognition cryptography bioinformatics computer
hardware emulation radio astronomy and a growing range of other areas
FPGAs originally began as competitors to CPLDs and competed in a similar
space that of glue logic for PCBs As their size capabilities and speed
increased they began to take over larger and larger functions to the state where
some are now marketed as full systems on chips (SoC) Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture One such area is code
breaking in particular brute-force attack of cryptographic algorithms
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or Convolution are
performed on the FPGA instead of a microprocessor
57
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at a low MHz clock rates The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units This has driven a new type of processing called
reconfigurable computing where time intensive tasks are offloaded from
software to FPGAs
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools where 4-8 hours wait
is necessary after even minor changes to the source code
Traditionally FPGAs have been reserved for specific vertical applications
where the volume of production is small For these low-volume applications
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application Today new cost and
performance dynamics have broadened the range of viable applications
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs) IO pads and routing channels Generally all the routing
channels have the same width (number of wires) Multiple IO pads may fit
into the height of one row or the width of one column in the array
An application circuit must be mapped into an FPGA with adequate resources
While the number of CLBs and IOs required is easily determined from the
design the number of routing tracks needed may vary considerably even
among designs with the same amount of logic (For example a crossbar switch
requires much more routing than a systolic array with the same gate count)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit FPGA manufacturers try to provide
58
just enough tracks so that most designs that will fit in terms of LUTs and IOs
can be routed This is determined by estimates such as those derived from
Rents rule or by experiments with existing designs
The FPGA is an array or island-style FPGA It consists of an array of logic
blocks and routing channels Two IO pads fit into the height of one row or the
width of one column as shown below All the routing channels have the same
width (number of wires)
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop as shown below In recent years manufacturers have started moving
to 6-input LUTs in their high performance parts claiming increased
performance
Typical logic block
59
There is only one output which can be either the registered or the unregistered
LUT output The logic block has four inputs for the LUT and a clock input
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs they
and other signals are separately managed
For this example architecture the locations of the FPGA logic block pins are
shown below
Logic Block Pin Locations
Each input is accessible from one side of the logic block while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it
Similarly an IO pad can connect to any one of the wiring segments in the
channel adjacent to it For example an IO pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it
Generally the FPGA routing is unsegmented That is each wiring segment
spans only one logic block before it terminates in a switch box By turning on
60
some of the programmable switches within a switch box longer paths can be
constructed For higher speed interconnect some FPGA architectures use
longer routing lines that span multiple logic blocks
Whenever a vertical and a horizontal channel intersect there is a switch box In
this architecture when a wire enters a switch box there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments The pattern or topology of switches used in this
architecture is the planar or domain-based switch box topology In this switch
box topology a wire in track number one connects only to wires in track
number one in adjacent channel segments wires in track number 2 connect
only to other wires in track number 2 and so on The figure below illustrates
the connections in a switch box
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) description or a schematic design. The HDL form might be easier
to work with when handling large structures, because it's possible to just
specify them numerically rather than having to draw every piece by hand. On the
other hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license),
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
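As an illustration of the first simulation stage, a test bench is an HDL unit with no ports that drives the design and checks its outputs. The sketch below exercises a divide-by-two clock divider similar to the one used in the VGA controller top level. The entity names clk_div2 and clk_div2_tb, the 20 ns clock period (a 50 MHz input is assumed), and the particular checks are assumptions made for this example, not part of the original design files.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Design under test: hypothetical divide-by-two clock divider.
entity clk_div2 is
    port(
        clk_in  : in  std_logic;
        rstn    : in  std_logic;   -- active-low asynchronous reset
        clk_out : out std_logic
    );
end clk_div2;

architecture rtl of clk_div2 is
    signal q : std_logic := '0';
begin
    clk_out <= q;

    process(clk_in, rstn)
    begin
        if rstn = '0' then
            q <= '0';
        elsif rising_edge(clk_in) then
            q <= not q;            -- toggle on every rising input edge
        end if;
    end process;
end rtl;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Test bench: no ports; it generates stimulus and checks the response.
entity clk_div2_tb is
end clk_div2_tb;

architecture sim of clk_div2_tb is
    signal clk_in  : std_logic := '0';
    signal rstn    : std_logic := '0';
    signal clk_out : std_logic;
    signal done    : boolean := false;
begin
    dut : entity work.clk_div2
        port map(clk_in => clk_in, rstn => rstn, clk_out => clk_out);

    -- Free-running clock with an assumed 20 ns period, stopped when done.
    clk_in <= not clk_in after 10 ns when not done else '0';

    stimulus : process
    begin
        wait for 25 ns;            -- hold reset across the first rising edge
        assert clk_out = '0' report "output not cleared by reset" severity error;
        rstn <= '1';
        wait until rising_edge(clk_in);
        wait for 1 ns;
        assert clk_out = '1' report "output did not toggle high" severity error;
        wait until rising_edge(clk_in);
        wait for 1 ns;
        assert clk_out = '0' report "output did not toggle low" severity error;
        done <= true;              -- stops the clock so the simulation can end
        wait;
    end process;
end sim;
```

The same test bench can, in principle, be reused at the gate-level and back-annotated stages by binding dut to the post-synthesis or post-layout netlist instead of the RTL architecture.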
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA
has become a good method to do this.
Reasons why Prototyping is important
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications, and hardware, to be found
early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, outside the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement to the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, by implementing the VGA Controller on an FPGA, a processor-generated
(pixelated) picture is taken from memory space, and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
component sync_unit port(
    clk        : in    std_logic;
    rstn       : in    std_logic;
    hsync      : out   std_logic;  -- horizontal sync pulse (row)
    vsync      : out   std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out   std_logic;
    h_en       : out   std_logic;
    v_en       : out   std_logic;
    line_count : inout integer;
    c_hs       : inout integer);
end component;
component pixelgen port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic);
end component;
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
--addr_s <= std_logic_vector(to_unsigned(addr, 18));
-- divide clk50m by two to obtain the clock used by the sub-blocks
process(clk50m, rstn)
begin
    if (rstn = '0') then
        clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
        clk <= not clk;
    end if;
    --end if
end process;
hsync <= hsync_s;
vsync <= vsync_s;
sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
);
pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
);
mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
);
end Behavioral;
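As a rough check of the timing implied by this design: the divider process halves clk50m, and the sync_unit counters count 794 clocks per line (c_hs_s runs from 0 to 793) and 516 lines per frame (line_count_s runs from 0 to 515). Assuming clk50m is a 50 MHz clock, which its name suggests but the listing does not state, the resulting rates work out as:

```latex
f_{\text{clk}} = \frac{50\ \text{MHz}}{2} = 25\ \text{MHz}, \qquad
f_{\text{line}} = \frac{25\,000\,000\ \text{Hz}}{794} \approx 31.49\ \text{kHz}, \qquad
f_{\text{frame}} = \frac{f_{\text{line}}}{516} \approx 61.0\ \text{Hz}
```

These figures are close to the nominal 640x480 VGA timing of roughly 31.47 kHz line rate and 60 Hz frame rate (25.175 MHz pixel clock, 800 clocks per line, 525 lines per frame).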
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer);
end sync_unit;
architecture arch_sync_unit of sync_unit is
    signal h_en_s, v_en_s : std_logic;
    signal line_count_s, c_hs_s : integer;
begin
    c_hs       <= c_hs_s;
    line_count <= line_count_s;
    h_en       <= h_en_s;
    v_en       <= v_en_s;
    -- hsync counter: counts clocks within a line, 0 to 793
    process(clk, rstn)
    begin
        if (rstn = '0') then
            c_hs_s <= 0;
        else
            if (clk = '1' and clk'event) then
                if (c_hs_s = 793) then
                    c_hs_s <= 0;
                else
                    c_hs_s <= c_hs_s + 1;
                end if;
            end if;
        end if;
    end process;
    -- vsync counter
    -- vertical line counter: advances on each rising edge of h_en_s, 0 to 515
    process(h_en_s, rstn)
    begin
        if (rstn = '0') then
            line_count_s <= 0;
        elsif (h_en_s = '1' and h_en_s'event) then
            line_count_s <= line_count_s + 1;
            if (line_count_s = 515) then
                line_count_s <= 0;
            end if;
        end if;
        --end if
    end process;
    -- hsync pulse
    process(clk, rstn)
    begin
        if (rstn = '0') then
            hsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (c_hs_s <= 95) then
                hsync <= '0';
            else
                hsync <= '1';
            end if;
        end if;
        --end if
    end process;
    -- Horizontal Enable (row information will be sent in this period)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            h_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
                h_en_s <= '1';
            else
                h_en_s <= '0';
            end if;
        end if;
        --end if
    end process;
    -- Vertical Enable (data is valid in this region)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            v_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((line_count_s >= 35 and line_count_s <= 515)) then
                v_en_s <= '1';
            else
                v_en_s <= '0';
            end if;
        end if;
        --end if
    end process;
    -- Vertical Sync, Front Porch, Back Porch
    process(clk, rstn)
    begin
        if (rstn = '0') then
            vsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (line_count_s < 2) then
                vsync <= '0';
            else
                vsync <= '1';
            end if;
        end if;
        --end if
    end process;
    -- v_en delayed by one clock
    process(clk)
    begin
        if (clk = '1' and clk'event) then
            v_en_reg <= v_en_s;
        end if;
    end process;
end arch_sync_unit;
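The literal limits used in the sync_unit processes (793, 95, 143, 778, 515, 35 and 2) could be collected as named constants so that the timing is defined in one place. The package below is only a sketch of that idea: the package name vga_timing_pkg, the constant names, and the helper function h_active are hypothetical, while the values are copied from the listing.

```vhdl
-- Hypothetical package; names are illustrative, values come from sync_unit.
package vga_timing_pkg is
    -- Horizontal limits, in clock counts of c_hs_s
    constant H_COUNT_MAX : integer := 793;  -- c_hs_s wraps to 0 after this value
    constant H_SYNC_END  : integer := 95;   -- hsync is driven low while c_hs_s <= 95
    constant H_EN_START  : integer := 143;  -- lower bound of the h_en comparison
    constant H_EN_END    : integer := 778;  -- upper bound of the h_en comparison

    -- Vertical limits, in lines of line_count_s
    constant V_COUNT_MAX : integer := 515;  -- line_count_s wraps to 0 after this value
    constant V_SYNC_END  : integer := 2;    -- vsync is driven low while line_count_s < 2
    constant V_EN_START  : integer := 35;   -- lower bound of the v_en comparison
    constant V_EN_END    : integer := 515;  -- upper bound of the v_en comparison

    -- True when the horizontal count lies inside the h_en comparison window.
    function h_active(count : integer) return boolean;
end package vga_timing_pkg;

package body vga_timing_pkg is
    function h_active(count : integer) return boolean is
    begin
        return (count >= H_EN_START) and (count <= H_EN_END);
    end function h_active;
end package body vga_timing_pkg;
```

With such a package made visible through a use clause, a comparison like c_hs_s <= 95 could be written as c_hs_s <= H_SYNC_END, which makes it easier to retarget the design to another display mode.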
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
    signal addr_s : std_logic_vector(17 downto 0);
begin
    addr <= addr_s;
    process(clk, rstn)
    begin
        if (rstn = '0') then
            rd     <= '0';
            red    <= "0";
            green  <= "0";
            blue   <= "0";
            addr_s <= x"0000" & "00";
        elsif (clk = '1' and clk'event) then
            -- restart the address at the rising edge of v_en (start of frame)
            if ((v_en = '1') and (v_en_reg = '0')) then
                addr_s <= (others => '0');
            end if;
            if (v_en = '1' and h_en = '1') then
                if (line_count <= 514) then
                    if (c_hs <= 655) then
                        -- picture area: read memory and output the pixel
                        rd     <= '1';
                        red    <= data_in;
                        green  <= data_in;
                        blue   <= data_in;
                        addr_s <= addr_s + "000000000000000001";
                    else
                        rd    <= '0';
                        red   <= "0";
                        green <= "1";
                        blue  <= "0";
                    end if;
                else
                    rd    <= '0';
                    red   <= "1";
                    green <= "1";
                    blue  <= "1";
                end if;
                -- wrap the address after the last memory location
                if (addr_s = "111011111111111111") then
                    addr_s <= (others => '0');
                end if;
            else
                red   <= "0";
                green <= "0";
                blue  <= "0";
            end if;
        end if;
        -- end if
        --end if
    end process;
end behv;
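The address counter in the pixel generator depends on the non-standard STD_LOGIC_UNSIGNED package for the addition on a std_logic_vector. As a sketch of an alternative, a comparable count-and-wrap counter can be written with IEEE.NUMERIC_STD. The entity name addr_counter, its ports, and the constant LAST_ADDR are assumptions made for this example; the wrap value 245759 is the binary constant 111011111111111111 from the listing. This is a simplified stand-alone counter, not a drop-in replacement for the pixelgen process.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical stand-alone address counter; names are illustrative.
entity addr_counter is
    port(
        clk  : in  std_logic;
        rstn : in  std_logic;                       -- active-low asynchronous reset
        en   : in  std_logic;                       -- advance the address when '1'
        addr : out std_logic_vector(17 downto 0)
    );
end addr_counter;

architecture rtl of addr_counter is
    -- Last address used in the listing: "111011111111111111" = 245759.
    constant LAST_ADDR : unsigned(17 downto 0) := to_unsigned(245759, 18);
    signal count : unsigned(17 downto 0) := (others => '0');
begin
    addr <= std_logic_vector(count);

    process(clk, rstn)
    begin
        if rstn = '0' then
            count <= (others => '0');
        elsif rising_edge(clk) then
            if en = '1' then
                if count = LAST_ADDR then
                    count <= (others => '0');   -- wrap to the first location
                else
                    count <= count + 1;
                end if;
            end if;
        end if;
    end process;
end rtl;
```

The counter therefore steps through 245760 locations, which matches 512 pixels by 480 lines of 1-bit data.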
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at "run time", and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the state
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs began
to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attack of cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application. Today, new cost and
performance dynamics have broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
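For reference, Rent's rule in its usual textbook form relates the number of external connections of a block of logic to the amount of logic it contains. The symbols below follow the common statement of the rule and are not taken from this report:

```latex
T = t \cdot g^{p}
```

Here T is the number of terminals (external signal connections) of the block, g is the number of gates or logic blocks inside it, t is the average number of terminals per gate, and p is the Rent exponent, an empirically fitted value between 0 and 1. A larger p indicates a design with heavier interconnect demand, which is why designs with the same gate count can need very different numbers of routing tracks.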
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
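The logic block described above can be sketched in VHDL as follows. This is a minimal behavioural illustration, not a vendor primitive: the entity name logic_block, the generics INIT and REGISTERED, and the port names are assumptions made for this example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical model of a classic logic block: a 4-input LUT followed by
-- an optional flip-flop. All names here are illustrative assumptions.
entity logic_block is
    generic(
        -- LUT contents, one bit per input combination (default: 4-input AND)
        INIT       : std_logic_vector(15 downto 0) := x"8000";
        -- true selects the registered output, false the unregistered one
        REGISTERED : boolean := true
    );
    port(
        clk : in  std_logic;
        a   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
        q   : out std_logic                      -- the single block output
    );
end logic_block;

architecture rtl of logic_block is
    signal lut_out : std_logic;
    signal ff_out  : std_logic := '0';
begin
    -- The LUT is a 16-entry truth table indexed by the input vector.
    lut_out <= INIT(to_integer(unsigned(a)));

    -- Flip-flop capturing the LUT output on the rising clock edge.
    process(clk)
    begin
        if rising_edge(clk) then
            ff_out <= lut_out;
        end if;
    end process;

    -- Output selection: registered or combinational LUT output.
    q <= ff_out when REGISTERED else lut_out;
end rtl;
```

Changing INIT changes the logic function without changing the structure, which is the essence of LUT-based programmability; for instance, x"6996" would give the 4-input XOR (odd parity) function.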
signal clk, hsync_s, vsync_s, h_en_s, v_en_s, v_en_reg_s, rd_s : std_logic;
signal line_count_s, c_hs_s, addr : integer;
signal data_s : std_logic_vector(0 downto 0);
signal addr_s : std_logic_vector(17 downto 0);
begin
  --addr_s <= std_logic_vector(to_unsigned(addr, 18));

  -- Divide the board clock clk50m by two to obtain the pixel clock clk
  process(clk50m, rstn)
  begin
    if (rstn = '0') then
      clk <= '0';
    elsif (clk50m = '1' and clk50m'event) then
      clk <= not clk;
    end if;
  end process;

  hsync <= hsync_s;
  vsync <= vsync_s;

  -- Sync generator: horizontal and vertical counters, sync pulses, enables
  sync : sync_unit port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    v_en_reg   => v_en_reg_s,
    line_count => line_count_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    c_hs       => c_hs_s
  );

  -- Pixel generator: reads one bit per pixel from memory and drives RGB
  pixel : pixelgen port map(
    clk        => clk,
    rstn       => rstn,
    hsync      => hsync_s,
    vsync      => vsync_s,
    h_en       => h_en_s,
    v_en       => v_en_s,
    v_en_reg   => v_en_reg_s,
    data_in    => data_s,
    line_count => line_count_s,
    c_hs       => c_hs_s,
    red        => red,
    green      => green,
    blue       => blue,
    addr       => addr_s,
    rd         => rd_s
  );

  -- Picture memory, read-enabled by the pixel generator
  mem12 : memory port map(
    addr  => addr_s,
    clk   => clk,
    dout  => data_s,
    en    => rd_s,
    sinit => rstn
  );
end Behavioral;
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : out std_logic;  -- horizontal sync pulse (row)
  vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
  v_en_reg   : out std_logic;
  h_en       : out std_logic;
  v_en       : out std_logic;
  line_count : out integer;
  c_hs       : out integer );
end sync_unit;

architecture arch_sync_unit of sync_unit is
  signal h_en_s, v_en_s : std_logic;
  signal line_count_s, c_hs_s : integer;
begin
  c_hs       <= c_hs_s;
  line_count <= line_count_s;
  h_en       <= h_en_s;
  v_en       <= v_en_s;

  -- hsync counter: counts pixel clocks 0..793 within one line
  process(clk, rstn)
  begin
    if (rstn = '0') then
      c_hs_s <= 0;
    else
      if (clk = '1' and clk'event) then
        if (c_hs_s = 793) then
          c_hs_s <= 0;
        else
          c_hs_s <= c_hs_s + 1;
        end if;
      end if;
    end if;
  end process;

  -- vsync counter
  -- vertical line counter: advances on each rising edge of h_en_s, 0..515
  process(h_en_s, rstn)
  begin
    if (rstn = '0') then
      line_count_s <= 0;
    elsif (h_en_s = '1' and h_en_s'event) then
      line_count_s <= line_count_s + 1;
      if (line_count_s = 515) then
        line_count_s <= 0;
      end if;
    end if;
  end process;

  -- hsync pulse: active low while the horizontal counter is 0..95
  process(clk, rstn)
  begin
    if (rstn = '0') then
      hsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (c_hs_s <= 95) then
        hsync <= '0';
      else
        hsync <= '1';
      end if;
    end if;
  end process;

  -- Horizontal enable (row information is sent in this period)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      h_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
        h_en_s <= '1';
      else
        h_en_s <= '0';
      end if;
    end if;
  end process;

  -- Vertical enable (data is valid in this region)
  process(clk, rstn)
  begin
    if (rstn = '0') then
      v_en_s <= '0';
    elsif (clk = '1' and clk'event) then
      if ((line_count_s >= 35 and line_count_s <= 515)) then
        v_en_s <= '1';
      else
        v_en_s <= '0';
      end if;
    end if;
  end process;

  -- vsync pulse (vertical sync, front porch, back porch):
  -- active low during the first two lines of the frame
  process(clk, rstn)
  begin
    if (rstn = '0') then
      vsync <= '1';
    elsif (clk = '1' and clk'event) then
      if (line_count_s < 2) then
        vsync <= '0';
      else
        vsync <= '1';
      end if;
    end if;
  end process;

  -- One-clock delayed copy of v_en, used to detect its rising edge
  process(clk)
  begin
    if (clk = '1' and clk'event) then
      v_en_reg <= v_en_s;
    end if;
  end process;
end arch_sync_unit;
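
The counter limits in the listing above fix the line length, the frame length and the refresh rate. The following Python sketch only redoes that arithmetic; it is not part of the design. The counter limits are copied from the sync_unit listing, while the 25 MHz pixel clock is an assumption (it presumes that clk50m is a 50 MHz clock divided by two), and the class name SyncTiming is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncTiming:
    """Counter limits copied from the sync_unit listing."""
    h_last: int = 793        # c_hs_s wraps to 0 after this value
    hsync_last: int = 95     # hsync is low while c_hs_s <= 95
    h_en_first: int = 143    # h_en window, inclusive bounds
    h_en_last: int = 778
    v_last: int = 515        # line_count_s wraps to 0 after this value
    vsync_lines: int = 2     # vsync is low while line_count_s < 2
    v_en_first: int = 35     # v_en window, inclusive bounds
    v_en_last: int = 515
    pixel_clock_hz: float = 25e6  # ASSUMPTION: clk50m is 50 MHz, divided by 2

    @property
    def clocks_per_line(self) -> int:
        return self.h_last + 1

    @property
    def hsync_clocks(self) -> int:
        return self.hsync_last + 1

    @property
    def h_en_clocks(self) -> int:
        return self.h_en_last - self.h_en_first + 1

    @property
    def lines_per_frame(self) -> int:
        return self.v_last + 1

    @property
    def v_en_lines(self) -> int:
        return self.v_en_last - self.v_en_first + 1

    @property
    def frame_rate_hz(self) -> float:
        # One line counter step per line, one frame per counter wrap.
        return self.pixel_clock_hz / (self.clocks_per_line * self.lines_per_frame)


timing = SyncTiming()
print("clocks per line :", timing.clocks_per_line)
print("lines per frame :", timing.lines_per_frame)
print("frame rate (Hz) :", round(timing.frame_rate_hz, 2))
```

For comparison, the standard 640 x 480 mode uses 800 clocks per line and 525 lines per frame at a 25.175 MHz pixel clock, which gives about 59.94 Hz. The limits in the listing give a slightly shorter line and frame, so the refresh rate under the 25 MHz assumption comes out a little higher.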
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en (start of the visible frame): restart the address
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: show the memory bit on all three colors
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            -- Right of the picture: green border
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          -- Below the picture: white
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last picture word
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking: drive black
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;
end behv;
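
The address constants in the pixel generator can be checked with a few lines of arithmetic. This Python sketch is only an illustration: the wrap constant and the 18-bit width are taken from the listing, while the interpretation of the word count as 512 x 480 is an inference from the arithmetic, not something the listing states.

```python
ADDR_BITS = 18                                # width of addr_s in the listing
WRAP_VALUE = int("111011111111111111", 2)     # addr_s is cleared at this value

# Addresses 0..WRAP_VALUE are visited before the counter is cleared.
words_per_frame = WRAP_VALUE + 1
addressable_words = 2 ** ADDR_BITS
full_vga_pixels = 640 * 480

print("wrap value        :", WRAP_VALUE)
print("words per frame   :", words_per_frame)
print("equals 512 x 480  :", words_per_frame == 512 * 480)
print("18-bit address max:", addressable_words)
print("640 x 480 pixels  :", full_vga_pixels)
print("640 x 480 fits    :", full_vga_pixels <= addressable_words)
```

With one pixel per address, an 18-bit address cannot cover a full 640 x 480 frame, which is consistent with the listing reading a narrower picture from memory and filling the rest of the visible area with fixed colors.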
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and grew quickly from 1985 to the mid-1990s,
when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See Soft
processors below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000 gates, Naval Surface Warfare Department
Early 2000s: millions of gates
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attack of cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low clock rates (in MHz). The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a 4-8 hour
wait is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
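
The low-volume argument above is a break-even calculation: an ASIC has a high one-time development cost and a low unit cost, while an FPGA has the opposite profile. The Python sketch below illustrates it. All figures are invented for illustration and do not come from any vendor; the function names are hypothetical.

```python
def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    """One-time engineering cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre: float, asic_unit: float,
                      fpga_nre: float, fpga_unit: float) -> float:
    """Volume at which the ASIC and FPGA total costs are equal."""
    if fpga_unit <= asic_unit:
        raise ValueError("model assumes the FPGA costs more per unit")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# ASSUMPTION: illustrative numbers only.
ASIC_NRE, ASIC_UNIT = 1_000_000.0, 5.0
FPGA_NRE, FPGA_UNIT = 50_000.0, 55.0

volume = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
print("break-even volume:", volume)
print("FPGA cheaper at 1,000 units:",
      total_cost(FPGA_NRE, FPGA_UNIT, 1_000) < total_cost(ASIC_NRE, ASIC_UNIT, 1_000))
```

Below the break-even volume the FPGA is the cheaper choice in this model; above it the ASIC wins. Lower FPGA unit prices or higher ASIC development costs both push the break-even point upward.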
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
Fig 12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
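
The logic block described above can be modelled in a few lines. This Python sketch is a behavioral illustration only, not any vendor's primitive: the class name LogicBlock, the bit ordering of the truth table, and the two example configurations are assumptions made for the example.

```python
class LogicBlock:
    """4-input LUT followed by a flip-flop, with a selectable output."""

    def __init__(self, truth_table: int, registered: bool = False):
        if not 0 <= truth_table < (1 << 16):
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.truth_table = truth_table   # bit i is the output for input index i
        self.registered = registered     # True selects the flip-flop output
        self.q = 0                       # flip-flop state

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        """Combinational lookup: the four inputs form the table index."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.truth_table >> index) & 1

    def clock(self, a: int, b: int, c: int, d: int) -> None:
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(a, b, c, d)

    def output(self, a: int, b: int, c: int, d: int) -> int:
        """Single block output: registered or unregistered LUT value."""
        return self.q if self.registered else self.lut(a, b, c, d)


AND4 = 0x8000   # only index 15 (all inputs high) produces 1
XOR4 = 0x6996   # 1 wherever an odd number of inputs is high

and_gate = LogicBlock(AND4)
xor_gate = LogicBlock(XOR4)
reg_and = LogicBlock(AND4, registered=True)

print(and_gate.output(1, 1, 1, 1), and_gate.output(1, 0, 1, 1))
print(xor_gate.output(1, 0, 0, 0), xor_gate.output(1, 1, 0, 0))
```

The same block implements any function of four inputs by changing only the 16 configuration bits, which is what makes the LUT the basic programmable element.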
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
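
The planar topology described above can be written down as a connection table. The Python sketch below is illustrative only: the side names and the function names are assumptions, and the model covers a single switch box with four adjacent channel segments.

```python
SIDES = ("north", "east", "south", "west")


def planar_switch_box(channel_width: int) -> dict:
    """Map each incoming wire (side, track) to the wires it can reach.

    In the planar (domain-based) topology a wire on track t can connect
    only to track t on each of the other three sides.
    """
    return {
        (side, track): [(other, track) for other in SIDES if other != side]
        for side in SIDES
        for track in range(channel_width)
    }


def switch_count(channel_width: int) -> int:
    """Programmable switches in one box.

    Each track has 4 wire ends with 3 switches each, and every switch
    is shared by two wire ends: 4 * 3 / 2 = 6 switches per track.
    """
    return 6 * channel_width


box = planar_switch_box(4)
print(box[("north", 1)])
print("switches for W = 4:", switch_count(4))
```

Because a signal that enters on one track can never change tracks inside a planar switch box, the router has to pick a track that is free along the whole path, which is one reason the channel width matters so much for routability.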
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar
license), and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and
firmware co-design has become mainstream. Prototyping SoC and ASIC
design on FPGA has become a good method to do this.
pixel pixelgen port map(
clk =gt clk
rstn =gt rstn
hsync =gt hsync_s
vsync =gt vsync_s
h_en =gt h_en_s
v_en =gt v_en_s
v_en_reg =gt v_en_reg_s
data_in =gt data_s
line_count=gt line_count_s
c_hs =gt c_hs_s
red =gt red
green =gt green
blue =gt blue
addr =gt addr_s
rd =gt rd_s
)
mem12 memory port map(
addr =gt addr_s
clk =gt clk
dout =gt data_s
en =gt rd_s
sinit=gt rstn
)
end Behavioral
45
SYNCHRONIZATION BLOCK CODE
library IEEE
use IEEESTD_LOGIC_1164ALL
use IEEESTD_LOGIC_ARITHALL
use IEEESTD_LOGIC_UNSIGNEDALL
entity sync_unit is port(
clkin std_logic
rstnin std_logic
hsyncout std_logic ---horizontal synch pulse(row)
vsyncout std_logic --vertical sync
pulse(frame) 111110100000000000
v_en_regout std_logic
h_enout std_logic
v_enout std_logic
line_countout integer
c_hsout integer )
end sync_unit
architecture arch_sync_unit of sync_unit is
signal h_en_sv_en_sstd_logic
signal line_count_sc_hs_sinteger
begin
c_hs lt= c_hs_s
line_count lt= line_count_s
h_en lt= h_en_s
v_en lt= v_en_s
-- hsync counter
process(clkrstn)
begin
if(rstn = 0)then
46
c_hs_s lt= 0
else
if(clk = 1 and clkevent)then
if(c_hs_s=793)then
c_hs_s lt= 0
else
c_hs_s lt= c_hs_s +1
end if
end if
end if
end process
--vsync counter
--vertical line counter
process(h_en_srstn)
begin
if(rstn = 0)then
line_count_s lt= 0
elsif(h_en_s = 1 and h_en_sevent)then
line_count_s lt= line_count_s + 1
if (line_count_s=515)then
line_count_s lt= 0
end if
end if
--end if
end process
--hysnc pulse
process(clkrstn)
begin
if(rstn = 0)then
47
hsync lt= 1
elsif(clk = 1 and clkevent)then
if(c_hs_s lt=95)then
hsync lt= 0
else
hsync lt= 1
end if
end if
--end if
end process
process(clkrstn) ----Horizontal Enable ( Row Information Will be Sent in
this period)
begin
if(rstn = 0)then
h_en_s lt= 0
elsif(clk = 1 and clkevent)then
if((c_hs_s gt=143) and (c_hs_slt=778))then
h_en_s lt= 1
else
h_en_s lt= 0
end if
end if
--end if
end process
---vysnc pulse
process(clkrstn) -- Vertical Enable (Data is Valid in this Region)
begin
if(rstn = 0)then
48
v_en_s lt= 0
elsif(clk = 1 and clkevent)then
if((line_count_sgt=35 and line_count_slt=515))then
v_en_s lt= 1
else
v_en_s lt= 0
end if
end if
--end if
end process
process(clkrstn) --Vertical Sync Front porch Back Porch
begin
if(rstn = 0)then
vsync lt= 1
elsif(clk = 1 and clkevent)then
if(line_count_s lt2)then
vsync lt= 0
else
vsync lt= 1
end if
end if
--end if
end process
process(clk)
begin
if(clk = 1 and clkevent)then
v_en_reglt=v_en_s
end if
end process
end arch_sync_unit
49
PIXEL GENERATOR
library IEEE
use IEEESTD_LOGIC_1164ALL
use IEEESTD_LOGIC_ARITHALL
use IEEESTD_LOGIC_UNSIGNEDALL
entity pixelgen is port(
clkin std_logic
rstnin std_logic
hsyncin std_logic
vsyncin std_logic
h_enin std_logic
v_enin std_logic
v_en_reg in std_logic
data_inin std_logic_vector(0 downto 0)
line_countin integer
c_hsin integer
red out std_logic_vector(0 downto 0)
greenout std_logic_vector(0 downto 0)
blueout std_logic_vector(0 downto 0)
addrout std_logic_vector(17 downto 0)
rdout std_logic
)
end pixelgen
architecture behv of pixelgen is
signal addr_sstd_logic_vector(17 downto 0)
begin
addr lt= addr_s
process(clkrstn)
begin
50
if(rstn = 0)then
rd lt= 0
red lt= 0
green lt= 0
blue lt= 0
addr_s lt= x0000 amp 00
elsif (clk = 1 and clkevent)then
if((v_en=1) and (v_en_reg=0))then
addr_slt=(others =gt0)
end if
if(v_en=1 and h_en=1)then
if(line_countlt=514)then
if(c_hs lt= 655)then
rd lt= 1
red lt= data_in
green lt= data_in
blue lt= data_in
addr_s lt=addr_s +000000000000000001
else
rd lt= 0
red lt= 0
green lt= 1
blue lt= 0
end if
else
rd lt= 0
red lt= 1
green lt= 1
blue lt= 1
51
end if
if(addr_s=111011111111111111)then
addr_slt= (others =gt0)
end if
else
red lt= 0
green lt= 0
blue lt= 0
end if
end if
-- end if
--end if
end process
end behv
52
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturingmdashhence the
name field-programmable FPGAs are programmed using a logic circuit
diagram or a source code in a hardware description language (HDL) to specify
how the chip will work They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform but the
ability to update the functionality after shipping offers advantages for many
applications
FPGAs contain programmable logic components called logic blocks and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
togethermdashsomewhat like a one-chip programmable breadboard Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR In most FPGAs the logic blocks also
include memory elements which may be simple flip-flops or more complete
blocks of memory
History The FPGA industry sprouted from programmable read only
memory (PROM) and programmable logic devices (PLDs) PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable) however programmable logic was hard-wired between
logic gates
Xilinx Co-Founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985 ndash the
XC2064 The XC2064 had programmable gates and programmable
interconnects between gates the beginnings of a new technology and market
The XC2064 boasted a mere 64 configurable logic blocks (CLBs) with two 3-
input lookup tables (LUTs) More than 20 years later Freeman was entered
into the National Inventors Hall of Fame for his invention
53
Some of the industryrsquos foundational concepts and technologies for
programmable logic arrays gates and logic blocks are founded in patents
awarded to David W Page and LuVerne R Peterson in 1985
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600000 reprogrammable gates Casselman was successful and the system was
awarded a patent in 1992
Xilinx continued unchallenged and quickly growing from 1985 to the mid-
1990s when competitors sprouted up eroding significant market-share By
1993 Actel was serving about 18 percent of the market
The 1990s were an explosive period of time for FPGAs both in sophistication
and the volume of production In the early 1990s FPGAs were primarily used
in telecommunications and networking By the end of the decade FPGAs
found their way into consumer automotive and industrial applications
FPGAs got a glimpse of fame in 1997 when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device
Thomsonrsquos algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task
121 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See Soft
processors below.)
As previously mentioned, many modern FPGAs can be reprogrammed at run time,
and this is leading to the idea of reconfigurable computing or
reconfigurable systems: CPUs that reconfigure themselves to suit the task at
hand. The Mitrion Virtual Processor from Mitrionics is an example of a
reconfigurable soft processor implemented on FPGAs. However, it does not
support dynamic reconfiguration at runtime, but instead adapts itself to a
specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
122 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative to ASICs for a growing number of
higher-volume applications than they have historically been used for, which
the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio.
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical to implement within them) but also far more complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
123 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived
from Rent's rule or by experiments with existing designs.
The FPGA considered here is an array or island-style FPGA. It consists of an
array of logic blocks and routing channels. Two I/O pads fit into the height
of one row or the width of one column, as shown below. All the routing
channels have the same width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they are managed separately from other signals.
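The logic block described above can be sketched behaviourally in VHDL. This
is only an illustrative model, not a vendor primitive: the entity name, the
port names, and the generics LUT_INIT and REGISTERED are assumptions
introduced here, with the generics standing in for the configuration bits
that a real device would load from its bitstream.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural sketch of a classic logic block: a 4-input LUT followed by a
-- flip-flop. All names are hypothetical; generics model configuration bits.
entity logic_block is
    generic (
        LUT_INIT   : std_logic_vector(15 downto 0) := x"8000"; -- 4-input AND
        REGISTERED : boolean := true  -- true: flip-flop output, false: LUT output
    );
    port (
        clk     : in  std_logic;
        lut_in  : in  std_logic_vector(3 downto 0);
        lut_out : out std_logic
    );
end entity logic_block;

architecture behavioural of logic_block is
    signal comb : std_logic;
    signal reg  : std_logic := '0';
begin
    -- The LUT is a 16 x 1 truth table addressed by the four inputs.
    comb <= LUT_INIT(to_integer(unsigned(lut_in)));

    -- The flip-flop samples the LUT output on each rising clock edge.
    process (clk)
    begin
        if rising_edge(clk) then
            reg <= comb;
        end if;
    end process;

    -- The single output is either the registered or the unregistered value.
    lut_out <= reg when REGISTERED else comb;
end architecture behavioural;
```

With the default LUT_INIT of x"8000", only table entry 15 is '1', so the
block behaves as a 4-input AND gate; any other 16-bit value gives a different
function of the same four inputs.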
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
124 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it is possible to specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
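As an illustration of the first simulation stage, the following is a minimal
sketch of a self-checking RTL test bench. The design under test (a 2-input
AND gate named and2) and the test bench name and2_tb are hypothetical
examples chosen for brevity; they are not part of the VGA controller.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical design under test: a 2-input AND gate.
entity and2 is
    port (a, b : in std_logic; y : out std_logic);
end entity and2;

architecture rtl of and2 is
begin
    y <= a and b;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench: it has no ports, drives the inputs, and checks the output.
entity and2_tb is
end entity and2_tb;

architecture sim of and2_tb is
    signal a, b, y : std_logic := '0';
begin
    dut : entity work.and2 port map (a => a, b => b, y => y);

    stimulus : process
    begin
        a <= '0'; b <= '0'; wait for 10 ns;
        assert y = '0' report "0 and 0 should give 0" severity error;
        a <= '0'; b <= '1'; wait for 10 ns;
        assert y = '0' report "0 and 1 should give 0" severity error;
        a <= '1'; b <= '0'; wait for 10 ns;
        assert y = '0' report "1 and 0 should give 0" severity error;
        a <= '1'; b <= '1'; wait for 10 ns;
        assert y = '1' report "1 and 1 should give 1" severity error;
        report "and2_tb finished" severity note;
        wait;  -- suspend forever so the simulation can stop
    end process stimulus;
end architecture sim;
```

The same test bench can in principle be reused at the later stages by binding
dut to the gate-level or back-annotated netlist instead of the RTL
architecture, so that the assertions check all three views of the design.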
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
125 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream. Prototyping SoC and ASIC designs on an FPGA
has become a good method to do this.
Reasons why prototyping is important:
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound.
Simulation speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is
back from the foundry to start on software tests. FPGA prototyping allows for
much more time in the area of software development and testing at the
software-hardware integration stage. This allows many unforeseen software
bugs, which appear due to today's array of operating systems, applications,
and hardware, to be found early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement to the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion Tracking Systems
Image Processing Systems
Display Systems
141 CONCLUSION
Hence, by implementing the VGA Controller on an FPGA, a processor-generated
(pixelated) picture is taken from memory space, and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
SYNCHRONIZATION BLOCK CODE
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity sync_unit is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;  -- asynchronous reset, active low
    hsync      : out std_logic;  -- horizontal sync pulse (row)
    vsync      : out std_logic;  -- vertical sync pulse (frame) 111110100000000000
    v_en_reg   : out std_logic;  -- v_en delayed by one clock
    h_en       : out std_logic;
    v_en       : out std_logic;
    line_count : out integer;
    c_hs       : out integer );
end sync_unit;

architecture arch_sync_unit of sync_unit is
    signal h_en_s, v_en_s       : std_logic;
    signal line_count_s, c_hs_s : integer;
begin
    c_hs       <= c_hs_s;
    line_count <= line_count_s;
    h_en       <= h_en_s;
    v_en       <= v_en_s;

    -- hsync counter: counts clocks within one line, 0 to 793
    process(clk, rstn)
    begin
        if (rstn = '0') then
            c_hs_s <= 0;
        else
            if (clk = '1' and clk'event) then
                if (c_hs_s = 793) then
                    c_hs_s <= 0;
                else
                    c_hs_s <= c_hs_s + 1;
                end if;
            end if;
        end if;
    end process;

    -- vsync counter (vertical line counter): advances on each rising edge
    -- of the horizontal enable and wraps after line 515
    process(h_en_s, rstn)
    begin
        if (rstn = '0') then
            line_count_s <= 0;
        elsif (h_en_s = '1' and h_en_s'event) then
            line_count_s <= line_count_s + 1;
            if (line_count_s = 515) then
                line_count_s <= 0;
            end if;
        end if;
    end process;

    -- hsync pulse: low for counts 0 to 95
    process(clk, rstn)
    begin
        if (rstn = '0') then
            hsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (c_hs_s <= 95) then
                hsync <= '0';
            else
                hsync <= '1';
            end if;
        end if;
    end process;

    -- horizontal enable (row information will be sent in this period)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            h_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((c_hs_s >= 143) and (c_hs_s <= 778)) then
                h_en_s <= '1';
            else
                h_en_s <= '0';
            end if;
        end if;
    end process;

    -- vertical enable (data is valid in this region)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            v_en_s <= '0';
        elsif (clk = '1' and clk'event) then
            if ((line_count_s >= 35 and line_count_s <= 515)) then
                v_en_s <= '1';
            else
                v_en_s <= '0';
            end if;
        end if;
    end process;

    -- vsync pulse (vertical sync, front porch, back porch)
    process(clk, rstn)
    begin
        if (rstn = '0') then
            vsync <= '1';
        elsif (clk = '1' and clk'event) then
            if (line_count_s < 2) then
                vsync <= '0';
            else
                vsync <= '1';
            end if;
        end if;
    end process;

    -- registered copy of the vertical enable
    process(clk)
    begin
        if (clk = '1' and clk'event) then
            v_en_reg <= v_en_s;
        end if;
    end process;
end arch_sync_unit;
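The counter limits in the synchronization block are easier to follow when
given names. The package below is a sketch only: the package and constant
names are assumptions introduced here, and the numeric values are copied from
the comparisons in the listing above. The derived constants show the
arithmetic that those limits imply.

```vhdl
-- Named constants for the limits used by sync_unit. Names are hypothetical;
-- values are taken from the comparisons in the sync_unit listing.
package vga_timing_pkg is
    constant H_TOTAL        : natural := 794;  -- c_hs_s counts 0 to 793
    constant H_SYNC_LAST    : natural := 95;   -- hsync low while c_hs_s <= 95
    constant H_ACTIVE_FIRST : natural := 143;  -- h_en_s set from this count
    constant H_ACTIVE_LAST  : natural := 778;  -- h_en_s set up to this count
    constant V_TOTAL        : natural := 516;  -- line_count_s counts 0 to 515
    constant V_SYNC_LINES   : natural := 2;    -- vsync low while line_count_s < 2
    constant V_ACTIVE_FIRST : natural := 35;   -- v_en_s set from this line
    constant V_ACTIVE_LAST  : natural := 515;  -- v_en_s set up to this line

    -- Derived values
    constant H_SYNC_CLOCKS    : natural := H_SYNC_LAST + 1;                      -- 96
    constant H_ACTIVE_CLOCKS  : natural := H_ACTIVE_LAST - H_ACTIVE_FIRST + 1;  -- 636
    constant V_ACTIVE_LINES   : natural := V_ACTIVE_LAST - V_ACTIVE_FIRST + 1;  -- 481
    constant CLOCKS_PER_FRAME : natural := H_TOTAL * V_TOTAL;                    -- 409704
end package vga_timing_pkg;
```

The listing does not state the clock frequency. If the standard VGA pixel
clock of 25.175 MHz is assumed, 794 clocks per line gives a line rate of about
31.7 kHz, and 409704 clocks per frame gives a refresh rate of about 61.4 Hz.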
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;  -- asynchronous reset, active low
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);   -- 1-bit pixel from memory
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);  -- pixel memory address
    rd         : out std_logic                       -- memory read strobe
);
end pixelgen;

architecture behv of pixelgen is
    signal addr_s : std_logic_vector(17 downto 0);
begin
    addr <= addr_s;

    process(clk, rstn)
    begin
        if (rstn = '0') then
            rd     <= '0';
            red    <= "0";
            green  <= "0";
            blue   <= "0";
            addr_s <= x"0000" & "00";
        elsif (clk = '1' and clk'event) then
            -- rising edge of v_en: restart the address at the top of the frame
            if ((v_en = '1') and (v_en_reg = '0')) then
                addr_s <= (others => '0');
            end if;
            if (v_en = '1' and h_en = '1') then
                if (line_count <= 514) then
                    if (c_hs <= 655) then
                        -- inside the picture: show the memory pixel on all
                        -- three colours and advance the address
                        rd     <= '1';
                        red    <= data_in;
                        green  <= data_in;
                        blue   <= data_in;
                        addr_s <= addr_s + "000000000000000001";
                    else
                        -- right of the picture: green border
                        rd    <= '0';
                        red   <= "0";
                        green <= "1";
                        blue  <= "0";
                    end if;
                else
                    -- below the picture: white line
                    rd    <= '0';
                    red   <= "1";
                    green <= "1";
                    blue  <= "1";
                end if;
                -- wrap the address after the last pixel of the frame
                if (addr_s = "111011111111111111") then
                    addr_s <= (others => '0');
                end if;
            else
                -- blanking: drive black
                red   <= "0";
                green <= "0";
                blue  <= "0";
            end if;
        end if;
    end process;
end behv;
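The size of the picture memory can be cross-checked against the limits in the
two listings. The following is a sketch of that arithmetic, based on one
reading of the code and ignoring one-clock register delays at the line and
frame boundaries: because h_en is registered, the pixel generator sees it
high for c_hs from 144 to 779, and the c_hs <= 655 test leaves 144 to 655,
that is 512 pixels per line. Similarly, v_en with line_count <= 514 leaves
lines 35 to 514, that is 480 lines. The entity and constant names below are
assumptions introduced for the check.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Simulation-only check that the address wrap value in pixelgen matches a
-- 512 x 480 picture. Names are hypothetical.
entity framebuffer_check is
end entity framebuffer_check;

architecture sim of framebuffer_check is
    constant PIXELS_PER_LINE : natural := 512;  -- assumed: c_hs 144 to 655
    constant VISIBLE_LINES   : natural := 480;  -- assumed: line_count 35 to 514
    -- wrap value copied from the pixelgen listing
    constant LAST_ADDR : unsigned(17 downto 0) := "111011111111111111";
begin
    -- 512 * 480 = 245760 one-bit pixels, so the last address is 245759.
    assert to_integer(LAST_ADDR) = PIXELS_PER_LINE * VISIBLE_LINES - 1
        report "address wrap value does not match a 512 x 480 picture"
        severity failure;
end architecture sim;
```

Since 245760 exceeds 2**17 = 131072 and is below 2**18 = 262144, the 18-bit
addr port is the smallest width that can address every pixel.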
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600000 reprogrammable gates Casselman was successful and the system was
awarded a patent in 1992
Xilinx continued unchallenged and quickly growing from 1985 to the mid-
1990s when competitors sprouted up eroding significant market-share By
1993 Actel was serving about 18 percent of the market
The 1990s were an explosive period of time for FPGAs both in sophistication
and the volume of production In the early 1990s FPGAs were primarily used
in telecommunications and networking By the end of the decade FPGAs
found their way into consumer automotive and industrial applications
FPGAs got a glimpse of fame in 1997 when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device
Thomsonrsquos algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task
121 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete system on a programmable chip This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group which combined a reconfigurable CPU architecture on a single chip
called the SB24 That work was done in 1982 Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices
which include one or more PowerPC processors embedded within the FPGAs
54
logic fabric The Atmel FPSLIC is another such device which uses an AVR
processor in combination with Atmels programmable logic architecture
An alternate approach to using hard-macro processors is to make use of soft
processor cores that are implemented within the FPGA logic (See Soft
processors below)
As previously mentioned many modern FPGAs have the ability to be
reprogrammed at run time and this is leading to the idea of reconfigurable
computing or reconfigurable systems mdash CPUs that reconfigure themselves to
suit the task at hand The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs However
it does not support dynamic reconfiguration at runtime but instead adapts itself
to a specific program
Additionally new non-FPGA architectures are beginning to emerge
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip
Gates
1987 9000 gates Xilinx
1992 600000 Naval Surface Warfare Department
Early 2000s Millions
Market size
1985 First commercial FPGA technology invented by Xilinx
1987 $14 million
~1993 gt$385 million
2005 $19 billion
2010 estimates $275 billion
55
122 FPGA Comparisons
Historically FPGAs have been slower less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts A combination
of volume fabrication improvements research and development and the IO
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs
Advantages include a shorter time to market ability to re-program in the field
to fix bugs and lower non-recurring engineering costs Vendors can also take a
middle road by developing their hardware on ordinary FPGAs but
manufacture their final version so it can no longer be modified after the design
has been committed
Xilinx claims that several market and technology dynamics are changing the
ASICFPGA paradigm
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
RampD resources and headcount is decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for which the company blames for the growing number of FPGA design starts
(see History)
The primary differences between CPLDs and FPGAs are architectural A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers The result of this is less flexibility with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio The
FPGA architectures on the other hand are dominated by interconnect This
56
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories as well as to have logic blocks implement decoders
or mathematical functions
Some FPGAs have the capability of partial re-configuration that lets one
portion of the device be re-programmed while other portions continue running
123 Applications
Applications of FPGAs include digital signal processing software-defined
radio aerospace and defense systems ASIC prototyping medical imaging
computer vision speech recognition cryptography bioinformatics computer
hardware emulation radio astronomy and a growing range of other areas
FPGAs originally began as competitors to CPLDs and competed in a similar
space that of glue logic for PCBs As their size capabilities and speed
increased they began to take over larger and larger functions to the state where
some are now marketed as full systems on chips (SoC) Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture One such area is code
breaking in particular brute-force attack of cryptographic algorithms
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or Convolution are
performed on the FPGA instead of a microprocessor
57
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at a low MHz clock rates The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units This has driven a new type of processing called
reconfigurable computing where time intensive tasks are offloaded from
software to FPGAs
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools where 4-8 hours wait
is necessary after even minor changes to the source code
Traditionally FPGAs have been reserved for specific vertical applications
where the volume of production is small For these low-volume applications
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application Today new cost and
performance dynamics have broadened the range of viable applications
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs) IO pads and routing channels Generally all the routing
channels have the same width (number of wires) Multiple IO pads may fit
into the height of one row or the width of one column in the array
An application circuit must be mapped into an FPGA with adequate resources
While the number of CLBs and IOs required is easily determined from the
design the number of routing tracks needed may vary considerably even
among designs with the same amount of logic (For example a crossbar switch
requires much more routing than a systolic array with the same gate count)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit FPGA manufacturers try to provide
58
just enough tracks so that most designs that will fit in terms of LUTs and IOs
can be routed This is determined by estimates such as those derived from
Rents rule or by experiments with existing designs
The FPGA is an array or island-style FPGA It consists of an array of logic
blocks and routing channels Two IO pads fit into the height of one row or the
width of one column as shown below All the routing channels have the same
width (number of wires)
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop as shown below In recent years manufacturers have started moving
to 6-input LUTs in their high performance parts claiming increased
performance
Typical logic block
59
There is only one output which can be either the registered or the unregistered
LUT output The logic block has four inputs for the LUT and a clock input
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs they
and other signals are separately managed
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
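The planar rule can be written down directly. The sketch below is an illustrative model only: the side names and function names are assumptions, and tracks are numbered from 1 as in the text.

```python
SIDES = ("north", "east", "south", "west")


def planar_switch_targets(side, track, channel_width):
    """Wires reachable from (side, track) in a planar switch box.

    A wire may only connect to the wire with the same track number on
    each of the other three sides, giving three programmable switches.
    """
    if side not in SIDES:
        raise ValueError("unknown side: %r" % (side,))
    if not 1 <= track <= channel_width:
        raise ValueError("track out of range")
    return [(other, track) for other in SIDES if other != side]


def planar_switch_count(channel_width):
    """Programmable switches in one planar switch box.

    Each track has four wire ends and every pair of ends shares one
    switch, so there are 6 switches per track.
    """
    return 6 * channel_width
```

Because a signal can never change track number inside a planar switch box, a route that starts on track 1 stays on track 1 for its whole length, which keeps the switch box small at the cost of some routing flexibility.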
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description where simulation is repeated to confirm the synthesis proceeded
without errors. Finally the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream, and prototyping SoC and ASIC designs on
an FPGA has become a good method to do this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct. This is compared to designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This exposes many unforeseen software bugs that appear due
to today's array of operating systems, applications and hardware.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4. Prototypes have the added advantage of serving as demo platforms for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig VGA CONTROLLER
GATE LEVEL
Fig SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, when implemented on an FPGA, the Controller takes a
processor-generated (pixelated) picture from memory space and provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. "A VGA display controller" by Eduardo Sanchez
3. Hinner, "VGA Interface and video signal documents"
    c_hs_s <= 0;
  else
    if(clk = '1' and clk'event)then
      if(c_hs_s=793)then
        c_hs_s <= 0;
      else
        c_hs_s <= c_hs_s +1;
      end if;
    end if;
  end if;
end process;
--vsync counter
--vertical line counter
process(h_en_s,rstn)
begin
  if(rstn = '0')then
    line_count_s <= 0;
  elsif(h_en_s = '1' and h_en_s'event)then
    line_count_s <= line_count_s + 1;
    if (line_count_s=515)then
      line_count_s <= 0;
    end if;
  end if;
  --end if;
end process;
--hsync pulse
process(clk,rstn)
begin
  if(rstn = '0')then
    hsync <= '1';
  elsif(clk = '1' and clk'event)then
    if(c_hs_s <=95)then
      hsync <= '0';
    else
      hsync <= '1';
    end if;
  end if;
  --end if;
end process;
----Horizontal Enable (Row Information Will be Sent in this period)
process(clk,rstn)
begin
  if(rstn = '0')then
    h_en_s <= '0';
  elsif(clk = '1' and clk'event)then
    if((c_hs_s >=143) and (c_hs_s<=778))then
      h_en_s <= '1';
    else
      h_en_s <= '0';
    end if;
  end if;
  --end if;
end process;
---vsync pulse
process(clk,rstn) -- Vertical Enable (Data is Valid in this Region)
begin
  if(rstn = '0')then
    v_en_s <= '0';
  elsif(clk = '1' and clk'event)then
    if((line_count_s>=35 and line_count_s<=515))then
      v_en_s <= '1';
    else
      v_en_s <= '0';
    end if;
  end if;
  --end if;
end process;
process(clk,rstn) --Vertical Sync, Front Porch, Back Porch
begin
  if(rstn = '0')then
    vsync <= '1';
  elsif(clk = '1' and clk'event)then
    if(line_count_s <2)then
      vsync <= '0';
    else
      vsync <= '1';
    end if;
  end if;
  --end if;
end process;
process(clk)
begin
  if(clk = '1' and clk'event)then
    v_en_reg<=v_en_s;
  end if;
end process;
end arch_sync_unit;
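The comparator constants in the sync unit above fix the timing of the generated signals. The following Python sketch is a software model written for this explanation, not part of the design: it reproduces the comparisons and derives the resulting line and frame rates. The 25 MHz pixel clock is an assumption (the listing does not state the clock frequency), and the model ignores the one-clock delay introduced by the output registers.

```python
# Constants copied from the comparisons in the sync unit listing.
H_LAST_COUNT = 793                 # c_hs_s returns to 0 after 793
H_SYNC_LAST = 95                   # hsync is low while c_hs_s <= 95
H_EN_FIRST, H_EN_LAST = 143, 778   # h_en_s window
V_LAST_LINE = 515                  # line_count_s returns to 0 after 515
V_SYNC_LINES = 2                   # vsync is low while line_count_s < 2
V_EN_FIRST, V_EN_LAST = 35, 515    # v_en_s window


def sync_signals(c_hs, line_count):
    """Comparator outputs for one counter state (register delay ignored)."""
    return {
        "hsync": 0 if c_hs <= H_SYNC_LAST else 1,
        "h_en": 1 if H_EN_FIRST <= c_hs <= H_EN_LAST else 0,
        "vsync": 0 if line_count < V_SYNC_LINES else 1,
        "v_en": 1 if V_EN_FIRST <= line_count <= V_EN_LAST else 0,
    }


def timing_summary(pixel_clock_hz=25_000_000):
    """Derived timing figures; pixel_clock_hz is an assumed value."""
    clocks_per_line = H_LAST_COUNT + 1
    lines_per_frame = V_LAST_LINE + 1
    line_rate_hz = pixel_clock_hz / clocks_per_line
    return {
        "clocks_per_line": clocks_per_line,
        "lines_per_frame": lines_per_frame,
        "hsync_width_clocks": H_SYNC_LAST + 1,
        "h_en_width_clocks": H_EN_LAST - H_EN_FIRST + 1,
        "v_en_lines": V_EN_LAST - V_EN_FIRST + 1,
        "line_rate_hz": line_rate_hz,
        "frame_rate_hz": line_rate_hz / lines_per_frame,
    }


if __name__ == "__main__":
    for name, value in timing_summary().items():
        print(name, value)
```

Under the assumed clock, the model gives a refresh rate close to the nominal 60 Hz of the 640x480 VGA mode, which is a quick sanity check on the counter limits before running the gate-level simulation.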
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;
  data_in    : in  std_logic_vector(0 downto 0);
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);
  rd         : out std_logic
);
end pixelgen;
architecture behv of pixelgen is
signal addr_s : std_logic_vector(17 downto 0);
begin
addr <= addr_s;
process(clk,rstn)
begin
  if(rstn = '0')then
    rd <= '0';
    red <= "0";
    green <= "0";
    blue <= "0";
    addr_s <= x"0000" & "00";
  elsif (clk = '1' and clk'event)then
    -- rising edge of v_en: restart at the first memory address
    if((v_en='1') and (v_en_reg='0'))then
      addr_s<=(others =>'0');
    end if;
    if(v_en='1' and h_en='1')then
      if(line_count<=514)then
        if(c_hs <= 655)then
          rd <= '1';
          red <= data_in;
          green <= data_in;
          blue <= data_in;
          addr_s <=addr_s +"000000000000000001";
        else
          rd <= '0';
          red <= "0";
          green <= "1";
          blue <= "0";
        end if;
      else
        rd <= '0';
        red <= "1";
        green <= "1";
        blue <= "1";
      end if;
      -- wrap the address after the last picture location
      if(addr_s="111011111111111111")then
        addr_s<= (others =>'0');
      end if;
    else
      red <= "0";
      green <= "0";
      blue <= "0";
    end if;
  end if;
  -- end if;
  --end if;
end process;
end behv;
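The address handling of the pixel generator can be checked with a short calculation. The Python sketch below is an illustration only: it converts the wrap constant from the listing to decimal and models the address update. That the constant plus one equals 512 x 480 suggests a 512-by-480 picture stored at one bit per pixel; this reading is an inference from the numbers, not something stated in the report.

```python
ADDR_WIDTH = 18                              # addr is std_logic_vector(17 downto 0)
LAST_ADDR = int("111011111111111111", 2)     # wrap constant from the listing


def next_addr(addr):
    """Address after one pixel read.

    Mirrors the listing: the comparison uses the current address, so the
    address returns to 0 on the clock after LAST_ADDR was presented.
    """
    if addr == LAST_ADDR:
        return 0
    return (addr + 1) % (1 << ADDR_WIDTH)


def picture_locations():
    """Number of distinct memory locations visited per picture."""
    return LAST_ADDR + 1


if __name__ == "__main__":
    print("last address:", LAST_ADDR)
    print("locations per picture:", picture_locations())
    print("equals 512 * 480:", picture_locations() == 512 * 480)
```

The 18-bit address bus could reach 262144 locations, so the picture occupies only part of the addressable memory.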
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks", and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of soft
processor cores that are implemented within the FPGA logic. (See Soft
processors below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but
manufacture their final version so it can no longer be modified after the design
has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, and the company credits them for the growing number of FPGA design
starts (see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration that lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the state where
some are now marketed as full systems on chips (SoC). Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s,
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attack, of cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4 to
8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application. Today, new cost and
performance dynamics have broadened the range of viable applications.
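The low-volume argument above amounts to a break-even calculation between a one-off development (NRE) cost and a per-unit price difference. The sketch below shows the arithmetic; the function name and all monetary figures are hypothetical and chosen only to make the example easy to follow.

```python
import math


def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost, fpga_nre=0.0):
    """Smallest volume at which total ASIC cost is no higher than FPGA cost.

    Total cost is modelled as NRE + volume * unit cost for each option.
    Returns None when the ASIC is not cheaper per unit, since it then
    never catches up.
    """
    saving_per_unit = fpga_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None
    extra_nre = asic_nre - fpga_nre
    if extra_nre <= 0:
        return 0
    return math.ceil(extra_nre / saving_per_unit)


if __name__ == "__main__":
    # Hypothetical figures: 1,000,000 ASIC NRE, 5 per ASIC, 25 per FPGA.
    print(break_even_volume(1_000_000, 5.0, 25.0))
```

Below the break-even volume the programmable part is the cheaper choice despite its higher unit price, which is the situation the paragraph above describes.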
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
hsync lt= 1
elsif(clk = 1 and clkevent)then
if(c_hs_s lt=95)then
hsync lt= 0
else
hsync lt= 1
end if
end if
--end if
end process
process(clkrstn) ----Horizontal Enable ( Row Information Will be Sent in
this period)
begin
if(rstn = 0)then
h_en_s lt= 0
elsif(clk = 1 and clkevent)then
if((c_hs_s gt=143) and (c_hs_slt=778))then
h_en_s lt= 1
else
h_en_s lt= 0
end if
end if
--end if
end process
---vysnc pulse
process(clkrstn) -- Vertical Enable (Data is Valid in this Region)
begin
if(rstn = 0)then
48
v_en_s lt= 0
elsif(clk = 1 and clkevent)then
if((line_count_sgt=35 and line_count_slt=515))then
v_en_s lt= 1
else
v_en_s lt= 0
end if
end if
--end if
end process
process(clkrstn) --Vertical Sync Front porch Back Porch
begin
if(rstn = 0)then
vsync lt= 1
elsif(clk = 1 and clkevent)then
if(line_count_s lt2)then
vsync lt= 0
else
vsync lt= 1
end if
end if
--end if
end process
process(clk)
begin
if(clk = 1 and clkevent)then
v_en_reglt=v_en_s
end if
end process
end arch_sync_unit
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
    clk        : in  std_logic;
    rstn       : in  std_logic;
    hsync      : in  std_logic;
    vsync      : in  std_logic;
    h_en       : in  std_logic;
    v_en       : in  std_logic;
    v_en_reg   : in  std_logic;
    data_in    : in  std_logic_vector(0 downto 0);
    line_count : in  integer;
    c_hs       : in  integer;
    red        : out std_logic_vector(0 downto 0);
    green      : out std_logic_vector(0 downto 0);
    blue       : out std_logic_vector(0 downto 0);
    addr       : out std_logic_vector(17 downto 0);
    rd         : out std_logic
);
end pixelgen;

architecture behv of pixelgen is
    signal addr_s : std_logic_vector(17 downto 0);
begin
    addr <= addr_s;

    process (clk, rstn)
    begin
        if (rstn = '0') then
            rd     <= '0';
            red    <= "0";
            green  <= "0";
            blue   <= "0";
            addr_s <= x"0000" & "00";
        elsif (clk = '1' and clk'event) then
            -- Rising edge of the vertical enable: restart the read address
            if ((v_en = '1') and (v_en_reg = '0')) then
                addr_s <= (others => '0');
            end if;
            if (v_en = '1' and h_en = '1') then
                if (line_count <= 514) then
                    if (c_hs <= 655) then
                        -- Inside the picture: read memory, drive all channels
                        rd     <= '1';
                        red    <= data_in;
                        green  <= data_in;
                        blue   <= data_in;
                        addr_s <= addr_s + "000000000000000001";
                    else
                        rd    <= '0';
                        red   <= "0";
                        green <= "1";
                        blue  <= "0";
                    end if;
                else
                    rd    <= '0';
                    red   <= "1";
                    green <= "1";
                    blue  <= "1";
                end if;
                -- Wrap the address at the end of the frame buffer
                if (addr_s = "111011111111111111") then
                    addr_s <= (others => '0');
                end if;
            else
                -- Blanking: drive black
                red   <= "0";
                green <= "0";
                blue  <= "0";
            end if;
        end if;
        -- end if;
        -- end if;
    end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field-programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts
itself to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates (Xilinx)
1992: 600,000 gates (Naval Surface Warfare Department)
Early 2000s: millions of gates
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 (estimate): $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively.
ASIC complexity has increased development time and costs.
R&D resources and headcount are decreasing.
Revenue losses for slow time-to-market are increasing.
Financial constraints in a poor economy are driving low-cost technologies.
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
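
The trade-off between number format and parallelism can be sketched in VHDL.
The example below is only an illustration, assuming 8-bit signed operands are
precise enough for the application; the entity name dot4 and its port names are
hypothetical. Narrow operands make each multiplier small, so four of them can
work side by side and deliver one four-term sum of products per clock.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: four narrow multipliers operating in parallel.
entity dot4 is
    port (
        clk            : in  std_logic;
        a0, a1, a2, a3 : in  signed(7 downto 0);
        b0, b1, b2, b3 : in  signed(7 downto 0);
        y              : out signed(17 downto 0)
    );
end entity dot4;

architecture rtl of dot4 is
begin
    process (clk)
    begin
        if rising_edge(clk) then
            -- Each 8x8 product is 16 bits wide; the sum of four such
            -- products always fits in 18 bits, so no overflow can occur.
            y <= resize(a0 * b0, 18) + resize(a1 * b1, 18)
               + resize(a2 * b2, 18) + resize(a3 * b3, 18);
        end if;
    end process;
end architecture rtl;
```

Widening the operands would cost more logic per multiplier, leaving room for
fewer parallel units on the same device.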
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4 to
8 hours can be necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
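
For reference, Rent's rule is usually stated as a power law relating the
number of external connections of a block of logic to the number of gates
inside it. The symbols below follow the common textbook form and are not taken
from the original report.

```latex
% Rent's rule: T = external terminals of a block, g = gates in the block,
% t = average number of terminals per gate, p = Rent exponent (0 < p < 1).
\[
  T = t \, g^{p}
\]
% Illustrative numbers only (assumed, not from the report):
% t = 4, g = 1024, p = 0.5  gives  T = 4 \cdot 1024^{0.5} = 4 \cdot 32 = 128.
```

A design with a larger exponent p needs more wiring for the same amount of
logic, which is why routing demand can differ so much between designs of equal
gate count.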
The FPGA considered here is an array, or island-style, FPGA. It consists of an
array of logic blocks and routing channels. Two I/O pads fit into the height of
one row or the width of one column, as shown below. All the routing channels
have the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
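
The behaviour of such a logic block can be modelled in a few lines of VHDL.
This is a behavioural sketch only, not a vendor primitive; the entity name
logic_block, the generics LUT_INIT and REGISTERED, and the port names are
hypothetical. The 16-bit generic plays the role of the configuration bits that
define the LUT's truth table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioural model of a classic logic block:
-- one 4-input LUT followed by an optional flip-flop.
entity logic_block is
    generic (
        -- Truth table: bit n is the output for input pattern n.
        -- x"8000" sets only bit 15, which makes the LUT a 4-input AND.
        LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
        -- true selects the registered output, false the unregistered one.
        REGISTERED : boolean := true
    );
    port (
        clk     : in  std_logic;
        lut_in  : in  std_logic_vector(3 downto 0);
        blk_out : out std_logic
    );
end entity logic_block;

architecture rtl of logic_block is
    signal lut_out : std_logic;
    signal ff_q    : std_logic := '0';
begin
    -- The LUT is a 16-entry memory addressed by the four inputs.
    lut_out <= LUT_INIT(to_integer(unsigned(lut_in)));

    -- The flip-flop captures the LUT output on each rising clock edge.
    process (clk)
    begin
        if rising_edge(clk) then
            ff_q <= lut_out;
        end if;
    end process;

    -- Single output: either the registered or the unregistered LUT value.
    blk_out <= ff_q when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the structure,
which is the sense in which the block is programmable.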
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared to building them from primitives. Examples
of these include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
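
The first of these stages, RTL simulation with a test bench, can be sketched as
follows. This is a minimal, self-contained illustration; the design under test
(a 2-input AND gate named and2) and the test bench and2_tb are hypothetical
and stand in for a real design such as the sync unit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical design under test: a 2-input AND gate.
entity and2 is
    port (a, b : in std_logic; y : out std_logic);
end entity and2;

architecture rtl of and2 is
begin
    y <= a and b;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench: it has no ports; it drives the inputs and checks the output.
entity and2_tb is
end entity and2_tb;

architecture sim of and2_tb is
    signal a, b, y : std_logic := '0';
begin
    dut : entity work.and2 port map (a => a, b => b, y => y);

    stimulus : process
    begin
        a <= '0'; b <= '0'; wait for 10 ns;
        assert y = '0' report "0 and 0 should give 0" severity error;
        a <= '0'; b <= '1'; wait for 10 ns;
        assert y = '0' report "0 and 1 should give 0" severity error;
        a <= '1'; b <= '0'; wait for 10 ns;
        assert y = '0' report "1 and 0 should give 0" severity error;
        a <= '1'; b <= '1'; wait for 10 ns;
        assert y = '1' report "1 and 1 should give 1" severity error;
        report "and2_tb finished" severity note;
        wait;  -- suspend forever so the simulation can end
    end process stimulus;
end architecture sim;
```

The same test bench can in principle be rerun against the gate-level and
back-annotated netlists, so that each stage is checked with identical stimulus.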
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; its Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Hardware verification combined with early software and firmware co-design has
become mainstream, and prototyping SoC and ASIC designs on FPGAs has become a
good method for doing this.
Reasons why prototyping is important
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is in contrast to designers relying only on
software simulations to verify that their hardware design is sound, where
simulation speed and modeling accuracy limitations hinder development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications, and hardware, to be found.
3. Prototyping also allows the developer to ensure that all IP technologies in
the system work well together, outside the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig. VGA CONTROLLER
GATE LEVEL
Fig. SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, with the VGA Controller implemented on an FPGA, a processor-generated
(pixelated) picture is taken from memory space and the Controller provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 49
v_en_s lt= 0
elsif(clk = 1 and clkevent)then
if((line_count_sgt=35 and line_count_slt=515))then
v_en_s lt= 1
else
v_en_s lt= 0
end if
end if
--end if
end process
process(clkrstn) --Vertical Sync Front porch Back Porch
begin
if(rstn = 0)then
vsync lt= 1
elsif(clk = 1 and clkevent)then
if(line_count_s lt2)then
vsync lt= 0
else
vsync lt= 1
end if
end if
--end if
end process
process(clk)
begin
if(clk = 1 and clkevent)then
v_en_reglt=v_en_s
end if
end process
end arch_sync_unit
49
PIXEL GENERATOR
library IEEE
use IEEESTD_LOGIC_1164ALL
use IEEESTD_LOGIC_ARITHALL
use IEEESTD_LOGIC_UNSIGNEDALL
entity pixelgen is port(
clkin std_logic
rstnin std_logic
hsyncin std_logic
vsyncin std_logic
h_enin std_logic
v_enin std_logic
v_en_reg in std_logic
data_inin std_logic_vector(0 downto 0)
line_countin integer
c_hsin integer
red out std_logic_vector(0 downto 0)
greenout std_logic_vector(0 downto 0)
blueout std_logic_vector(0 downto 0)
addrout std_logic_vector(17 downto 0)
rdout std_logic
)
end pixelgen
architecture behv of pixelgen is
signal addr_sstd_logic_vector(17 downto 0)
begin
addr lt= addr_s
process(clkrstn)
begin
50
if(rstn = 0)then
rd lt= 0
red lt= 0
green lt= 0
blue lt= 0
addr_s lt= x0000 amp 00
elsif (clk = 1 and clkevent)then
if((v_en=1) and (v_en_reg=0))then
addr_slt=(others =gt0)
end if
if(v_en=1 and h_en=1)then
if(line_countlt=514)then
if(c_hs lt= 655)then
rd lt= 1
red lt= data_in
green lt= data_in
blue lt= data_in
addr_s lt=addr_s +000000000000000001
else
rd lt= 0
red lt= 0
green lt= 1
blue lt= 0
end if
else
rd lt= 0
red lt= 1
green lt= 1
blue lt= 1
51
end if
if(addr_s=111011111111111111)then
addr_slt= (others =gt0)
end if
else
red lt= 0
green lt= 0
blue lt= 0
end if
end if
-- end if
--end if
end process
end behv
52
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturingmdashhence the
name field-programmable FPGAs are programmed using a logic circuit
diagram or a source code in a hardware description language (HDL) to specify
how the chip will work They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform but the
ability to update the functionality after shipping offers advantages for many
applications
FPGAs contain programmable logic components called logic blocks and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
togethermdashsomewhat like a one-chip programmable breadboard Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR In most FPGAs the logic blocks also
include memory elements which may be simple flip-flops or more complete
blocks of memory
History The FPGA industry sprouted from programmable read only
memory (PROM) and programmable logic devices (PLDs) PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable) however programmable logic was hard-wired between
logic gates
Xilinx Co-Founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field programmable gate array in 1985 ndash the
XC2064 The XC2064 had programmable gates and programmable
interconnects between gates the beginnings of a new technology and market
The XC2064 boasted a mere 64 configurable logic blocks (CLBs) with two 3-
input lookup tables (LUTs) More than 20 years later Freeman was entered
into the National Inventors Hall of Fame for his invention
53
Some of the industryrsquos foundational concepts and technologies for
programmable logic arrays gates and logic blocks are founded in patents
awarded to David W Page and LuVerne R Peterson in 1985
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600000 reprogrammable gates Casselman was successful and the system was
awarded a patent in 1992
Xilinx continued unchallenged and quickly growing from 1985 to the mid-
1990s when competitors sprouted up eroding significant market-share By
1993 Actel was serving about 18 percent of the market
The 1990s were an explosive period of time for FPGAs both in sophistication
and the volume of production In the early 1990s FPGAs were primarily used
in telecommunications and networking By the end of the decade FPGAs
found their way into consumer automotive and industrial applications
FPGAs got a glimpse of fame in 1997 when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device
Thomsonrsquos algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task
121 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete system on a programmable chip This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group which combined a reconfigurable CPU architecture on a single chip
called the SB24 That work was done in 1982 Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices
which include one or more PowerPC processors embedded within the FPGAs
54
logic fabric The Atmel FPSLIC is another such device which uses an AVR
processor in combination with Atmels programmable logic architecture
An alternate approach to using hard-macro processors is to make use of soft
processor cores that are implemented within the FPGA logic (See Soft
processors below)
As previously mentioned many modern FPGAs have the ability to be
reprogrammed at run time and this is leading to the idea of reconfigurable
computing or reconfigurable systems mdash CPUs that reconfigure themselves to
suit the task at hand The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs However
it does not support dynamic reconfiguration at runtime but instead adapts itself
to a specific program
Additionally new non-FPGA architectures are beginning to emerge
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip
Gates
1987 9000 gates Xilinx
1992 600000 Naval Surface Warfare Department
Early 2000s Millions
Market size
1985 First commercial FPGA technology invented by Xilinx
1987 $14 million
~1993 gt$385 million
2005 $19 billion
2010 estimates $275 billion
55
122 FPGA Comparisons
Historically FPGAs have been slower less energy efficient and generally
achieved less functionality than their fixed ASIC counterparts A combination
of volume fabrication improvements research and development and the IO
capabilities of new supercomputers have largely closed the performance gap
between ASICs and FPGAs
Advantages include a shorter time to market ability to re-program in the field
to fix bugs and lower non-recurring engineering costs Vendors can also take a
middle road by developing their hardware on ordinary FPGAs but
manufacture their final version so it can no longer be modified after the design
has been committed
Xilinx claims that several market and technology dynamics are changing the
ASICFPGA paradigm
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
RampD resources and headcount is decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have been historically used
for which the company blames for the growing number of FPGA design starts
(see History)
The primary differences between CPLDs and FPGAs are architectural A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers The result of this is less flexibility with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio The
FPGA architectures on the other hand are dominated by interconnect This
56
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories as well as to have logic blocks implement decoders
or mathematical functions
Some FPGAs have the capability of partial re-configuration that lets one
portion of the device be re-programmed while other portions continue running
123 Applications
Applications of FPGAs include digital signal processing software-defined
radio aerospace and defense systems ASIC prototyping medical imaging
computer vision speech recognition cryptography bioinformatics computer
hardware emulation radio astronomy and a growing range of other areas
FPGAs originally began as competitors to CPLDs and competed in a similar
space that of glue logic for PCBs As their size capabilities and speed
increased they began to take over larger and larger functions to the state where
some are now marketed as full systems on chips (SoC) Particularly with the
introduction of dedicated multipliers into FPGA architectures in the late 1990s
applications which had traditionally been the sole reserve of DSPs began to
incorporate FPGAs instead
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture One such area is code
breaking in particular brute-force attack of cryptographic algorithms
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or Convolution are
performed on the FPGA instead of a microprocessor
57
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at a low MHz clock rates The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units This has driven a new type of processing called
reconfigurable computing where time intensive tasks are offloaded from
software to FPGAs
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
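For reference, Rent's rule is usually stated as a power law relating the number
of external signal connections of a block of logic to the number of gates
inside it. The numeric values in the worked example are illustrative
assumptions, not figures taken from any particular device.

```latex
T = t \cdot g^{p}
% T: number of external terminals (signal connections) of the block
% g: number of gates (or logic blocks) inside the block
% t: average number of terminals per gate
% p: Rent exponent, usually less than 1 for practical logic
%
% Example with assumed values t = 4, p = 0.6, g = 1000:
% T = 4 \cdot 1000^{0.6} \approx 4 \cdot 63.1 \approx 252
```

A design with a larger Rent exponent tends to need more routing tracks for
the same amount of logic, which is consistent with the crossbar versus systolic
array comparison above.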
The FPGA considered here is an array, or island-style, FPGA. It consists of an
array of logic blocks and routing channels. Two I/O pads fit into the height of
one row or the width of one column, as shown below. All the routing channels
have the same width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
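The logic block just described can be modelled behaviourally in a few lines of
VHDL. This is only a sketch under stated assumptions: the entity name
logic_block and the generics INIT and REGISTERED are hypothetical, and a real
device configures the LUT contents and output selection through its bitstream
rather than through generics.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioural model of a 4-input LUT followed by a flip-flop.
entity logic_block is
  generic (
    -- Truth table of the LUT: bit n is the output for input value n.
    -- x"8000" is a 4-input AND; x"6996" would be a 4-input XOR.
    INIT       : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) LUT output.
    REGISTERED : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    o   : out std_logic                      -- the single block output
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out : std_logic;
  signal ff_q    : std_logic := '0';
begin
  -- The LUT is a 16-entry memory addressed by the four inputs.
  lut_out <= INIT(to_integer(unsigned(i)));

  -- The flip-flop captures the LUT output on every rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      ff_q <= lut_out;
    end if;
  end process;

  -- Output multiplexer: registered or combinational path.
  o <= ff_q when REGISTERED else lut_out;
end architecture rtl;
```

Because any 4-input Boolean function is just a different 16-bit INIT value,
the same block can serve as an AND gate, an XOR gate, or any other function of
its four inputs.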
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher-
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
124 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a design in a hardware
description language (HDL) or as a schematic. The HDL form might be easier to
work with when handling large structures, because it's possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
such as an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
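The first of these stages, RTL simulation with a test bench, can be sketched
as follows. The design under test here (and_reg, a registered AND gate) and the
test bench and_reg_tb are hypothetical and kept deliberately small so that the
example is self-contained; they are not part of the VGA controller.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical design under test: a registered 2-input AND gate.
entity and_reg is
  port (
    clk  : in  std_logic;
    a, b : in  std_logic;
    q    : out std_logic
  );
end entity and_reg;

architecture rtl of and_reg is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q <= a and b;
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Self-checking test bench: no ports, drives the DUT and checks its output.
entity and_reg_tb is
end entity and_reg_tb;

architecture sim of and_reg_tb is
  signal clk  : std_logic := '0';
  signal a, b : std_logic := '0';
  signal q    : std_logic;
  signal done : boolean := false;
begin
  dut : entity work.and_reg
    port map (clk => clk, a => a, b => b, q => q);

  -- 10 ns clock that stops toggling once the stimulus has finished.
  clk <= not clk after 5 ns when not done else '0';

  stimulus : process
  begin
    a <= '1'; b <= '0';
    wait until rising_edge(clk);
    wait for 1 ns;  -- let the registered output settle
    assert q = '0' report "expected q = 0 for 1 and 0" severity error;

    a <= '1'; b <= '1';
    wait until rising_edge(clk);
    wait for 1 ns;
    assert q = '1' report "expected q = 1 for 1 and 1" severity error;

    done <= true;  -- stop the clock so the simulation can end
    wait;
  end process stimulus;
end architecture sim;
```

The same test bench can in principle be reused at the gate-level and
back-annotated stages by binding it to the netlist produced by the tools in
place of the RTL architecture.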
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM-based, 1.5 GHz fabric speed), and QuickLogic (handheld-
focused CSSP, no general-purpose FPGAs).
125 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verifying hardware designs alongside early software and firmware co-design has
become mainstream, and prototyping SoC and ASIC designs on an FPGA has become
a good method for doing this.
Reasons why prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications, and hardware, to be found
early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
Fig: Gate level, VGA controller
Fig: Gate level, sync unit
Fig: Technology schematic
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
141 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
REFERENCES
1. http://en.wikipedia.org/wiki/Video_Graphics_Array
2. Eduardo Sanchez, "A VGA display controller"
3. Hinner, "VGA Interface and video signal documents"
PIXEL GENERATOR
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity pixelgen is port(
  clk        : in  std_logic;
  rstn       : in  std_logic;                      -- active-low reset
  hsync      : in  std_logic;
  vsync      : in  std_logic;
  h_en       : in  std_logic;
  v_en       : in  std_logic;
  v_en_reg   : in  std_logic;                      -- v_en delayed by one clock
  data_in    : in  std_logic_vector(0 downto 0);   -- pixel read from memory
  line_count : in  integer;
  c_hs       : in  integer;
  red        : out std_logic_vector(0 downto 0);
  green      : out std_logic_vector(0 downto 0);
  blue       : out std_logic_vector(0 downto 0);
  addr       : out std_logic_vector(17 downto 0);  -- memory read address
  rd         : out std_logic                       -- memory read strobe
);
end pixelgen;

architecture behv of pixelgen is
  signal addr_s : std_logic_vector(17 downto 0);
begin
  addr <= addr_s;

  process(clk, rstn)
  begin
    if (rstn = '0') then
      rd     <= '0';
      red    <= "0";
      green  <= "0";
      blue   <= "0";
      addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
      -- Rising edge of v_en: restart the address at the top of the frame.
      if ((v_en = '1') and (v_en_reg = '0')) then
        addr_s <= (others => '0');
      end if;
      if (v_en = '1' and h_en = '1') then
        if (line_count <= 514) then
          if (c_hs <= 655) then
            -- Inside the picture: show the memory pixel and advance the address.
            rd     <= '1';
            red    <= data_in;
            green  <= data_in;
            blue   <= data_in;
            addr_s <= addr_s + "000000000000000001";
          else
            rd    <= '0';
            red   <= "0";
            green <= "1";
            blue  <= "0";
          end if;
        else
          rd    <= '0';
          red   <= "1";
          green <= "1";
          blue  <= "1";
        end if;
        -- Wrap the address after the last pixel location.
        if (addr_s = "111011111111111111") then
          addr_s <= (others => '0');
        end if;
      else
        -- Blanking interval: drive black.
        red   <= "0";
        green <= "0";
        blue  <= "0";
      end if;
    end if;
  end process;
end behv;
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions, or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History: The FPGA industry sprouted from programmable read-only
memory (PROM) and programmable logic devices (PLDs). PROMs and PLDs
both had the option of being programmed in batches in a factory or in the field
(field programmable); however, programmable logic was hard-wired between
logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period of time for FPGAs, both in sophistication
and in the volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs
had found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
121 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternate approach to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic. (See "Soft
processors" below.)
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at runtime, but instead adapts itself
to a specific program.
Additionally, new, non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
122 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has bolstered development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the advantage
of more predictable timing delays and a higher logic-to-interconnect ratio. The
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
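To make the CPLD structure concrete, the sketch below shows a single
sum-of-products term feeding one clocked register, written in VHDL. The entity
name macrocell, its ports, and the particular product terms are hypothetical
and chosen only for illustration; a real CPLD selects its product terms through
device programming rather than fixed source code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical CPLD-style macrocell: sum of products feeding a register.
entity macrocell is
  port (
    clk        : in  std_logic;
    a, b, c, d : in  std_logic;
    q          : out std_logic
  );
end entity macrocell;

architecture rtl of macrocell is
  signal sop : std_logic;
begin
  -- Two product terms (AND) combined by a sum (OR).
  sop <= (a and b and not c) or (c and d);

  -- The clocked register that the logic array feeds.
  process (clk)
  begin
    if rising_edge(clk) then
      q <= sop;
    end if;
  end process;
end architecture rtl;
```

The fixed AND-OR-register path is what gives this style of device its
predictable timing, in contrast to an FPGA, where the delay depends on how the
signal is routed through the interconnect.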
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
123 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 51
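    -- Asynchronous, active-low reset: clear the outputs and the read address
    if (rstn = '0') then
        rd     <= '0';
        red    <= '0';
        green  <= '0';
        blue   <= '0';
        addr_s <= x"0000" & "00";
    elsif (clk = '1' and clk'event) then
        -- v_en high while its registered copy is still low: restart the address
        if ((v_en = '1') and (v_en_reg = '0')) then
            addr_s <= (others => '0');
        end if;
        if (v_en = '1' and h_en = '1') then
            if (line_count <= 514) then
                if (c_hs <= 655) then
                    -- Inside the picture: read memory and drive all three colours
                    rd     <= '1';
                    red    <= data_in;
                    green  <= data_in;
                    blue   <= data_in;
                    addr_s <= addr_s + "000000000000000001";
                else
                    -- Past the last picture column: green only
                    rd    <= '0';
                    red   <= '0';
                    green <= '1';
                    blue  <= '0';
                end if;
            else
                -- Past the last picture line: white
                rd    <= '0';
                red   <= '1';
                green <= '1';
                blue  <= '1';
            end if;
            -- Wrap the 18-bit read address at its last used location
            if (addr_s = "111011111111111111") then
                addr_s <= (others => '0');
            end if;
        else
            -- Outside the enabled region: blank the outputs
            red   <= '0';
            green <= '0';
            blue  <= '0';
        end if;
    end if;
    -- end if
    --end if
    end process;
    end behv;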
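The process above relies on h_en, v_en and the counters c_hs and line_count,
whose generation is not shown in this listing. The following is a minimal
sketch of how such enables and sync pulses could be produced, assuming the
standard 640 x 480 at 60 Hz timing (800 pixel clocks per line, 525 lines per
frame, a pixel clock of about 25.175 MHz, negative sync polarity). The entity
name vga_sync_sketch, the ports hsync and vsync, and the counters h_count and
v_count are hypothetical. The counters here start at the first visible pixel,
so their thresholds differ from the 655 and 514 limits used above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sync generator for 640 x 480 at 60 Hz (illustration only)
entity vga_sync_sketch is
  port (
    clk   : in  std_logic;  -- assumed pixel clock, about 25.175 MHz
    rstn  : in  std_logic;  -- active-low reset, as in the process above
    hsync : out std_logic;  -- active-low horizontal sync
    vsync : out std_logic;  -- active-low vertical sync
    h_en  : out std_logic;  -- high during the visible part of a line
    v_en  : out std_logic   -- high during the visible lines of a frame
  );
end entity vga_sync_sketch;

architecture rtl of vga_sync_sketch is
  -- Standard timing: visible + front porch + sync + back porch
  constant H_VISIBLE : integer := 640;
  constant H_FRONT   : integer := 16;
  constant H_SYNC    : integer := 96;
  constant H_TOTAL   : integer := 800;  -- 640 + 16 + 96 + 48
  constant V_VISIBLE : integer := 480;
  constant V_FRONT   : integer := 10;
  constant V_SYNC    : integer := 2;
  constant V_TOTAL   : integer := 525;  -- 480 + 10 + 2 + 33

  signal h_count : integer range 0 to H_TOTAL - 1 := 0;
  signal v_count : integer range 0 to V_TOTAL - 1 := 0;
begin
  -- Pixel counter wraps every line; line counter wraps every frame
  process (clk, rstn)
  begin
    if rstn = '0' then
      h_count <= 0;
      v_count <= 0;
    elsif rising_edge(clk) then
      if h_count = H_TOTAL - 1 then
        h_count <= 0;
        if v_count = V_TOTAL - 1 then
          v_count <= 0;
        else
          v_count <= v_count + 1;
        end if;
      else
        h_count <= h_count + 1;
      end if;
    end if;
  end process;

  h_en  <= '1' when h_count < H_VISIBLE else '0';
  v_en  <= '1' when v_count < V_VISIBLE else '0';

  -- Sync pulses sit between the front porch and the back porch
  hsync <= '0' when (h_count >= H_VISIBLE + H_FRONT and
                     h_count <  H_VISIBLE + H_FRONT + H_SYNC) else '1';
  vsync <= '0' when (v_count >= V_VISIBLE + V_FRONT and
                     v_count <  V_VISIBLE + V_FRONT + V_SYNC) else '1';
end architecture rtl;
```

Keeping the timing values as named constants makes it straightforward to
adapt the sketch to another resolution or to a different counting origin.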
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called "logic blocks" and a
hierarchy of reconfigurable interconnects that allow the blocks to be "wired
together", somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs, the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM) and
programmable logic devices (PLDs). PROMs and PLDs both had the option of
being programmed in batches in a factory or in the field (field-programmable);
however, programmable logic was hard-wired between logic gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985: the
XC2064. The XC2064 had programmable gates and programmable interconnects
between gates, the beginnings of a new technology and market. The XC2064
boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup
tables (LUTs). More than 20 years later, Freeman was entered into the
National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication and in
volume of production. In the early 1990s, FPGAs were primarily used in
telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative approach to using hard-macro processors is to make use of
"soft" processor cores that are implemented within the FPGA logic.
As previously mentioned, many modern FPGAs have the ability to be
reprogrammed at run time, and this is leading to the idea of reconfigurable
computing or reconfigurable systems: CPUs that reconfigure themselves to
suit the task at hand. The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs. However,
it does not support dynamic reconfiguration at run time, but instead adapts
itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates (Xilinx)
1992: 600,000 (Naval Surface Warfare Department)
Early 2000s: millions
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy-efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative to ASICs for a growing number
of higher-volume applications than they have historically been used for, and
the company credits them with the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio.
FPGA architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical to implement within them) but also far more complex to design for.
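The sum-of-products structure described above can be illustrated in VHDL.
This is a minimal sketch and is not taken from any vendor's macrocell: the
entity name sop_macrocell_sketch, its ports, and the two product terms are
hypothetical and chosen only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical CPLD-style macrocell: an AND-OR plane feeding one register
entity sop_macrocell_sketch is
  port (
    clk     : in  std_logic;
    a, b, c : in  std_logic;
    q       : out std_logic
  );
end entity sop_macrocell_sketch;

architecture rtl of sop_macrocell_sketch is
  signal sop : std_logic;
begin
  -- Sum of two product terms; in a real device the terms are programmable
  sop <= (a and b) or ((not a) and c);

  -- The sum of products feeds a clocked register
  process (clk)
  begin
    if rising_edge(clk) then
      q <= sop;
    end if;
  end process;
end architecture rtl;
```

Because every signal passes through the same fixed AND-OR plane before
reaching the register, the delay from input to register is largely
independent of the function implemented, which is the timing predictability
noted above.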
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that implement decoders or
mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high-performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high-performance computing is currently limited by
the complexity of FPGA design compared with conventional software and by the
extremely long turn-around times of current design tools, where a wait of
4-8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived
from Rent's rule or by experiments with existing designs.
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or
the width of one column, as shown below. All the routing channels have the
same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high-performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
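The logic block just described can be modelled in a few lines of VHDL. This
is a minimal behavioural sketch, not a vendor primitive: the entity name
logic_block_sketch and the generics LUT_INIT and REGISTERED are hypothetical
names standing in for the configuration bits of a real device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a classic logic block: 4-input LUT plus flip-flop
entity logic_block_sketch is
  generic (
    -- Truth table of the LUT, one bit per input combination.
    -- x"8000" sets only entry 15 ("1111"), i.e. a 4-input AND.
    LUT_INIT   : std_logic_vector(15 downto 0) := x"8000";
    -- Selects the registered (true) or unregistered (false) LUT output
    REGISTERED : boolean := true
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);
    o   : out std_logic
  );
end entity logic_block_sketch;

architecture rtl of logic_block_sketch is
  signal lut_out : std_logic;
  signal ff_out  : std_logic := '0';
begin
  -- The four inputs address one bit of the stored truth table
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- Flip-flop capturing the LUT output on the rising clock edge
  process (clk)
  begin
    if rising_edge(clk) then
      ff_out <= lut_out;
    end if;
  end process;

  -- Single output: registered or unregistered, chosen at configuration time
  o <= ff_out when REGISTERED else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the logic function without changing the
structure, which is the sense in which the block is programmable.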
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch
box. In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common
functions embedded into the silicon reduces the area required and gives those
functions increased speed compared with building them from primitives.
Examples include multipliers, generic DSP blocks, embedded processors,
high-speed I/O logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at
which point propagation delays can be added and the simulation run again with
these values back-annotated onto the netlist.
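The first stage of this flow, RTL simulation with a test bench, can be
sketched as follows. This is a minimal illustration and not part of the VGA
controller: the design under test and2 and the test bench and2_tb are
hypothetical, and a real test bench would drive the controller's own ports.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical design under test: a two-input AND gate
entity and2 is
  port (a, b : in std_logic; y : out std_logic);
end entity and2;

architecture rtl of and2 is
begin
  y <= a and b;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Test bench: no ports, it only drives and checks the design under test
entity and2_tb is
end entity and2_tb;

architecture sim of and2_tb is
  signal a, b, y : std_logic := '0';
begin
  dut : entity work.and2 port map (a => a, b => b, y => y);

  stimulus : process
  begin
    a <= '0'; b <= '0'; wait for 10 ns;
    assert y = '0' report "0 and 0 should give 0" severity error;

    a <= '0'; b <= '1'; wait for 10 ns;
    assert y = '0' report "0 and 1 should give 0" severity error;

    a <= '1'; b <= '0'; wait for 10 ns;
    assert y = '0' report "1 and 0 should give 0" severity error;

    a <= '1'; b <= '1'; wait for 10 ns;
    assert y = '1' report "1 and 1 should give 1" severity error;

    report "and2_tb finished" severity note;
    wait;  -- stop the stimulus process
  end process stimulus;
end architecture sim;
```

The same test bench can in principle be reused at the later stages by
swapping the RTL architecture for the gate-level or back-annotated netlist,
so that the three simulations check the same behaviour.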
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM-based, 1.5 GHz fabric speed), and QuickLogic (handheld-focused
CSSP, no general-purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives Examples of these
include multipliers generic DSP blocks embedded processors high speed IO
logic and embedded memories
61
FPGAs are also widely used for systems validation including pre-silicon
validation post-silicon validation and firmware development This allows chip
companies to validate their design before the chip is produced in the factory
reducing the time to market
124 FPGA Design and Programming
To define the behavior of the FPGA the user provides a hardware description
language (HDL) or a schematic design The HDL form might be easier to work
with when handling large structures because its possible to just specify them
numerically rather than having to draw every piece by hand On the other hand
schematic entry can allow for easier visualisation of a design
Then using an electronic design automation tool a technology-mapped netlist
is generated The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route usually performed by the FPGA
companys proprietary place-and-route software The user will validate the
map place and route results via timing analysis simulation and other
verification methodologies Once the design and validation process is
complete the binary file generated (also using the FPGA companys
proprietary software) is used to (re)configure the FPGA
Going from schematicHDL source files to actual configuration The source
files are fed to a software suite from the FPGACPLD vendor that through
different steps will produce a file This file is then transferred to the
FPGACPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM
The most common HDLs are VHDL and Verilog although in an attempt to
reduce the complexity of designing in HDLs which have been compared to the
equivalent of assembly languages there are moves to raise the abstraction level
through the introduction of alternative languages
62
To simplify the design of complex systems in FPGAs there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free and typically released under proprietary licenses) Other
predefined circuits are available from developer communities such as Open
Cores (typically free and released under the GPL BSD or similar license) and
other sources
In a typical design flow an FPGA application developer will simulate the
design at multiple stages throughout the design process Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results Then after the synthesis engine has
mapped the design to a netlist the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors Finally the design is laid out in the FPGA at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist
Basic Process Technology Types
SRAM - based on static memory technology In-system programmable and re-
programmable Requires external boot devices CMOS
Antifuse - One-time programmable CMOS
EPROM - Erasable Programmable Read-Only Memory technology Usually
one-time programmable in production because of plastic packaging
Windowed devices can be erased with ultraviolet (UV) light CMOS
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology Can be erased even in plastic packages Some but not all
EEPROM devices can be in-system programmed CMOS
Flash - Flash-erase EPROM technology Can be erased even in plastic
packages Some but not all flash devices can be in-system programmed
63
Usually a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture CMOS
Fuse - One-time programmable Bipolar
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals Together they control over 80 percent of the market with Xilinx alone
representing over 50 percent
Xilinx also provides free Windows and Linux design software while Altera
provides free Windows tools the Solaris and Linux tools are only available via
a rental scheme
Other competitors include Lattice Semiconductor (flash SRAM) Actel
(antifuse flash-based mixed-signal) SiliconBlue Technologies (low power)
Achronix (RAM based 15GHz fabric speed) and QuickLogic (handheld
focused CSSP no general purpose FPGAs)
125 FPGA prototype
FPGA prototyping sometimes also referred to as ASIC prototyping or SoC
prototyping is the method to prototype SoC and ASIC design on FPGA for
hardware verification and early software development
Main stream verification methods for hardware design and early software and
firmware co-design has become mainstream Prototyping SoC and ASIC
design on FPGA has become a good method to do this
64
Reasons why Prototyping is important
1 Running a SoC design on FPGA prototype is a reliable way to ensure that it is
functionally correct This is compared to designers only relying on software
simulations to verify that their hardware design is sound Simulation speed and
modeling accuracy limitations hinder this development
2 Due to time constrains many projects cannot wait until the silicon is back from
the foundry to start on software tests FPGA prototyping allows for much more
time in area of software development and testing at the software-hardware
integration stage This allows many unforeseen software bugs that appear due
to todays array of operating systems applications and hardware
3 Prototyping also allows the developer to ensure that all IP technologies on his
system work well together off the simulation stage and in actual form
4 Prototyping has the added advantage as demo platforms to SoC clients
bringing in interest early This speeds up the overall development cycle and
allows for more enhancement or improvement to the chip features as it would
otherwise have been
65
13 SIMULATION RESULTS
66
GATE LEVEL
FigVGA CONTROLLER
67
GATE LEVEL
Fig SYNC UNIT
68
TECHNOLOGY SCHEMATIC
69
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
141 CONCLUTION
Hence by implementing FPGA taking a processor-generated picture
(pixilated) from memory space the Controller provides digital RGB values for
each pixel as well as horizontal and vertical synchronization signals in order
to correctly display the picture on a connected monitor
70
REFERENCES
1 httpenwikipediaorgwikiVideo_Graphics_Array
2 A VGA display controller by Eduardo Sanchez
3 HinnerrdquoVGA Interface and video signal documentsrdquo
71
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 53
12 FPGA ARCHITECTURE
A field-programmable gate array (FPGA) is a semiconductor device that can
be configured by the customer or designer after manufacturing, hence the
name "field-programmable". FPGAs are programmed using a logic circuit
diagram or source code in a hardware description language (HDL) to specify
how the chip will work. They can be used to implement any logical function
that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many
applications.
FPGAs contain programmable logic components called logic blocks and a
hierarchy of reconfigurable interconnects that allow the blocks to be wired
together, somewhat like a one-chip programmable breadboard. Logic blocks
can be configured to perform complex combinational functions or merely
simple logic gates like AND and XOR. In most FPGAs the logic blocks also
include memory elements, which may be simple flip-flops or more complete
blocks of memory.
History
The FPGA industry sprouted from programmable read-only memory (PROM)
and programmable logic devices (PLDs). PROMs and PLDs both had the option
of being programmed in batches in a factory or in the field (field
programmable); however, programmable logic was hard-wired between logic
gates.
Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the
first commercially viable field-programmable gate array in 1985, the
XC2064. The XC2064 had programmable gates and programmable
interconnects between gates, the beginnings of a new technology and market.
The XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two
3-input lookup tables (LUTs). More than 20 years later, Freeman was entered
into the National Inventors Hall of Fame for his invention.
Some of the industry's foundational concepts and technologies for
programmable logic arrays, gates, and logic blocks are founded in patents
awarded to David W. Page and LuVerne R. Peterson in 1985.
In the late 1980s, the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600,000 reprogrammable gates. Casselman was successful, and the system was
awarded a patent in 1992.
Xilinx continued unchallenged and quickly growing from 1985 to the
mid-1990s, when competitors sprouted up, eroding significant market share. By
1993, Actel was serving about 18 percent of the market.
The 1990s were an explosive period for FPGAs, both in sophistication
and in volume of production. In the early 1990s, FPGAs were primarily used
in telecommunications and networking. By the end of the decade, FPGAs had
found their way into consumer, automotive, and industrial applications.
FPGAs got a glimpse of fame in 1997, when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device.
Thompson's algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task.
12.1 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete "system on a programmable chip". This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group, which combined a reconfigurable CPU architecture on a single chip
called the SB24. That work was done in 1982. Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices,
which include one or more PowerPC processors embedded within the FPGA's
logic fabric. The Atmel FPSLIC is another such device, which uses an AVR
processor in combination with Atmel's programmable logic architecture.
An alternative to using hard-macro processors is to make use of "soft"
processor cores that are implemented within the FPGA logic.
Many modern FPGAs can be reprogrammed at run time, and this is leading to
the idea of reconfigurable computing or reconfigurable systems: CPUs that
reconfigure themselves to suit the task at hand. The Mitrion Virtual
Processor from Mitrionics is an example of a reconfigurable soft processor
implemented on FPGAs. However, it does not support dynamic reconfiguration
at run time, but instead adapts itself to a specific program.
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates, Xilinx
1992: 600,000 gates, Naval Surface Warfare Department
Early 2000s: millions of gates
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, and the company attributes the growing number of FPGA design starts
to them (see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result is less flexibility, with the advantage of
more predictable timing delays and a higher logic-to-interconnect ratio. FPGA
architectures, on the other hand, are dominated by interconnect. This
makes them far more flexible (in terms of the range of designs that are
practical for implementation within them) but also far more complex to design
for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of
4 to 8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware cost per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have
broadened the range of viable applications.
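The low-volume trade-off described above can be made concrete with a small
calculation. The sketch below is only an illustration: the cost model (a
one-time ASIC development cost plus a per-unit cost, against a higher per-unit
FPGA cost with negligible development cost) and all figures are assumptions,
not data from this report.

```python
import math


def break_even_volume(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Smallest production volume at which the ASIC is cheaper overall.

    Assumed cost model (illustrative only):
      ASIC total = asic_nre + asic_unit_cost * n
      FPGA total = fpga_unit_cost * n
    Returns None when the FPGA carries no per-unit premium.
    """
    premium = fpga_unit_cost - asic_unit_cost
    if premium <= 0:
        return None  # the FPGA is never more expensive, so no break-even exists
    # ASIC wins once asic_nre + asic_unit_cost*n < fpga_unit_cost*n,
    # i.e. once n > asic_nre / premium.
    return math.floor(asic_nre / premium) + 1


# Hypothetical figures chosen only to show the shape of the trade-off.
volume = break_even_volume(asic_nre=1_000_000, asic_unit_cost=5.0, fpga_unit_cost=25.0)
print("ASIC becomes cheaper from", volume, "units")
```

Below the returned volume the FPGA is the cheaper choice under this model,
which matches the low-volume argument in the text.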
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
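Rent's rule is only named above. In its usual form it relates the number of
external terminals T of a block of logic to the number of logic blocks g it
contains, as T = t * g^p, where t is the average number of terminals per block
and p is the Rent exponent. The sketch below evaluates that relation; the
function name and the example values of t and p are assumptions chosen for
illustration, not figures from this report.

```python
def rent_terminals(num_blocks, terminals_per_block, rent_exponent):
    """Estimate external terminals with Rent's rule: T = t * g**p."""
    return terminals_per_block * num_blocks ** rent_exponent


# Two hypothetical designs with the same amount of logic (1024 blocks,
# 4 terminals per block) but different interconnect complexity.
regular = rent_terminals(1024, 4, 0.5)    # low exponent: regular, local wiring
irregular = rent_terminals(1024, 4, 0.75)  # higher exponent: more global wiring
print(round(regular), round(irregular))
```

A higher exponent gives a much larger terminal estimate for the same logic
count, which is why designs of equal size can need very different numbers of
routing tracks.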
The FPGA considered here is an array, or island-style, FPGA. It consists of an
array of logic blocks and routing channels. Two I/O pads fit into the height of
one row or the width of one column, as shown below. All the routing channels
have the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Fig: Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
are managed separately from other signals.
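The logic block described above can be modelled in a few lines of software.
The following is a behavioural sketch only, not a description of any vendor's
hardware; the class name, method names, and the bit ordering of the truth
table are assumptions made for this example.

```python
class LogicBlock:
    """Behavioural model of a 4-input LUT followed by a D flip-flop."""

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs 16 truth-table bits")
        self.truth_table = [int(bool(bit)) for bit in truth_table]
        self.registered = registered  # selects registered or unregistered output
        self.ff = 0                   # flip-flop state, cleared at start

    def lut(self, a, b, c, d):
        """Look up the configured function; input a is the least significant bit."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return self.truth_table[index]

    def clock(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the current LUT output."""
        self.ff = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """The single block output: flip-flop value or raw LUT value."""
        return self.ff if self.registered else self.lut(a, b, c, d)


# Configure the LUT as a 4-input AND gate: only index 15 (all inputs 1) is 1.
and4 = LogicBlock([0] * 15 + [1])
print(and4.output(1, 1, 1, 1), and4.output(1, 0, 1, 1))
```

Changing the 16 truth-table bits changes the logic function without changing
the structure, which is what makes the block programmable.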
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Fig: Logic block pin locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Fig: Switch box topology
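The planar switch box rule can also be written out as a short enumeration. The
sketch below is an illustrative model under assumed naming (four sides called
north, east, south, and west, and tracks numbered from 0); it lists every
programmable switch for a given channel width.

```python
SIDES = ("north", "east", "south", "west")


def planar_switch_box(channel_width):
    """Return the programmable switches of a planar (domain-based) switch box.

    Each switch is a frozenset holding its two (side, track) endpoints.
    A wire only ever connects to wires with the same track number.
    """
    switches = set()
    for track in range(channel_width):
        for i, side_a in enumerate(SIDES):
            for side_b in SIDES[i + 1:]:
                switches.add(frozenset({(side_a, track), (side_b, track)}))
    return switches


def reachable(switches, side, track):
    """List the wire ends that one incoming wire can be switched onto."""
    endpoint = (side, track)
    return sorted(other
                  for switch in switches if endpoint in switch
                  for other in switch if other != endpoint)


box = planar_switch_box(channel_width=4)
print(len(box))                    # 6 side pairs per track, 4 tracks
print(reachable(box, "west", 1))   # three wires, all on track 1
```

Each incoming wire sees exactly three switches, one per other side, and never
changes track, which is the property the text describes.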
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or a similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description, where simulation is repeated to confirm that the synthesis
proceeded without errors. Finally, the design is laid out in the FPGA, at which
point propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; its Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verifying hardware designs alongside early software and firmware co-design
has become mainstream, and prototyping SoC and ASIC designs on an FPGA has
become a good method of doing this.
Reasons why prototyping is important
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications, and hardware, to be found.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together outside the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
Fig: VGA Controller (gate level)
Fig: Sync Unit (gate level)
Fig: Technology schematic
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, with the VGA Controller implemented on an FPGA, the Controller takes a
processor-generated (pixelated) picture from memory space and provides digital
RGB values for each pixel, as well as horizontal and vertical synchronization
signals, in order to correctly display the picture on a connected monitor.
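The synchronization behaviour summarised above can be sketched in software.
The model below is an assumption-based illustration, not the design
implemented in this project: it uses the commonly published timing for
640x480 at 60 Hz (800 pixel clocks per line, 525 lines per frame,
active-low sync pulses), and the function and constant names are hypothetical.

```python
# Commonly published 640x480 @ 60 Hz timing (assumed here, not taken from
# this report): horizontal values in pixel clocks, vertical values in lines.
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK   # 800 clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK   # 525 lines per frame


def sync_signals(h_count, v_count):
    """Return (hsync, vsync, visible) for the given counter values.

    Both sync outputs are modelled as active low: 0 during the pulse.
    """
    h_start = H_VISIBLE + H_FRONT
    v_start = V_VISIBLE + V_FRONT
    hsync = 0 if h_start <= h_count < h_start + H_SYNC else 1
    vsync = 0 if v_start <= v_count < v_start + V_SYNC else 1
    visible = h_count < H_VISIBLE and v_count < V_VISIBLE
    return hsync, vsync, visible


def next_counts(h_count, v_count):
    """Advance the pixel and line counters by one pixel clock."""
    h_count += 1
    if h_count == H_TOTAL:
        h_count = 0
        v_count = (v_count + 1) % V_TOTAL
    return h_count, v_count


print(sync_signals(0, 0))      # inside the visible area, no sync pulse
print(sync_signals(656, 0))    # first clock of the horizontal sync pulse
```

RGB values would be driven only while `visible` is true; during the porches
and sync pulses the colour outputs are held at black.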
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 54
Some of the industryrsquos foundational concepts and technologies for
programmable logic arrays gates and logic blocks are founded in patents
awarded to David W Page and LuVerne R Peterson in 1985
In the late 1980s the Naval Surface Warfare Department funded an experiment
proposed by Steve Casselman to develop a computer that would implement
600000 reprogrammable gates Casselman was successful and the system was
awarded a patent in 1992
Xilinx continued unchallenged and quickly growing from 1985 to the mid-
1990s when competitors sprouted up eroding significant market-share By
1993 Actel was serving about 18 percent of the market
The 1990s were an explosive period of time for FPGAs both in sophistication
and the volume of production In the early 1990s FPGAs were primarily used
in telecommunications and networking By the end of the decade FPGAs
found their way into consumer automotive and industrial applications
FPGAs got a glimpse of fame in 1997 when Adrian Thompson merged genetic
algorithm technology and FPGAs to create a sound recognition device
Thomsonrsquos algorithm allowed an array of 64 x 64 cells in a Xilinx FPGA chip
to decide the configuration needed to accomplish a sound recognition task
121 Modern developments
A recent trend has been to take the coarse-grained architectural
approach a step further by combining the logic blocks and interconnects of
traditional FPGAs with embedded microprocessors and related peripherals to
form a complete system on a programmable chip This work mirrors the
architecture by Ron Perlof and Hana Potash of Burroughs Advanced Systems
Group which combined a reconfigurable CPU architecture on a single chip
called the SB24 That work was done in 1982 Examples of such hybrid
technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices
which include one or more PowerPC processors embedded within the FPGAs
54
logic fabric The Atmel FPSLIC is another such device which uses an AVR
processor in combination with Atmels programmable logic architecture
An alternate approach to using hard-macro processors is to make use of soft
processor cores that are implemented within the FPGA logic (See Soft
processors below)
As previously mentioned many modern FPGAs have the ability to be
reprogrammed at run time and this is leading to the idea of reconfigurable
computing or reconfigurable systems mdash CPUs that reconfigure themselves to
suit the task at hand The Mitrion Virtual Processor from Mitrionics is an
example of a reconfigurable soft processor implemented on FPGAs However
it does not support dynamic reconfiguration at runtime but instead adapts itself
to a specific program
Additionally, new non-FPGA architectures are beginning to emerge.
Software-configurable microprocessors such as the Stretch S5000 adopt a
hybrid approach by providing an array of processor cores and FPGA-like
programmable cores on the same chip.
Gates
1987: 9,000 gates (Xilinx)
1992: 600,000 gates (Naval Surface Warfare Department)
Early 2000s: millions of gates
Market size
1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 (estimate): $2.75 billion
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages include a shorter time to market, the ability to re-program in the
field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs but
manufacturing their final version so that it can no longer be modified after
the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively
ASIC complexity has increased development time and costs
R&D resources and headcount are decreasing
Revenue losses for slow time-to-market are increasing
Financial constraints in a poor economy are driving low-cost technologies
These trends make FPGAs a better alternative to ASICs for a growing
number of higher-volume applications than they have historically been used
for, which the company credits for the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic-to-interconnect
ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible (in terms of the range of
designs that are practical for implementation within them) but also far more
complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is code
breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
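The precision trade-off described above can be illustrated with a small
software model. The sketch below is only an illustration in Python, not part
of the original design: the helper names (to_fixed, from_fixed, fixed_dot) and
the choice of fractional bit widths are assumptions made for this example. It
shows that a narrower fixed-point format gives a coarser result, which is the
price paid for arithmetic units that need less logic.

```python
def to_fixed(value, frac_bits):
    """Quantize a real value to an integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)


def fixed_dot(xs, ys, frac_bits):
    """Dot product using integer multiply-accumulate, as a hardware MAC would.

    Each product carries 2 * frac_bits fractional bits, so the accumulator
    is rescaled by that amount at the end.
    """
    acc = sum(to_fixed(x, frac_bits) * to_fixed(y, frac_bits)
              for x, y in zip(xs, ys))
    return acc / (1 << (2 * frac_bits))


if __name__ == "__main__":
    for bits in (2, 4, 8):
        approx = from_fixed(to_fixed(0.3, bits), bits)
        print(bits, "fractional bits:", approx, "error:", abs(approx - 0.3))
```

With rounding to the nearest step, the quantization error is at most half of
the smallest step, that is 2^-(frac_bits + 1), so each fractional bit removed
doubles the worst-case error.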
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools, where a wait of 4-8
hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the routing
channels have the same width (number of wires). Multiple I/O pads may fit
into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably, even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and I/Os
can be routed. This is determined by estimates such as those derived from
Rent's rule or by experiments with existing designs.
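Rent's rule is not stated in the text; in its usual form it estimates the
number of external terminals T of a region of logic as T = t * g^p, where g is
the number of blocks in the region, t is the average number of terminals per
block, and p is the Rent exponent. The Python sketch below is a minimal
illustration of that formula. The function name and the example values of t
and p are assumptions chosen for this example, not figures from any
particular FPGA family.

```python
def rent_terminals(blocks, terminals_per_block, rent_exponent):
    """Estimate external terminals of a region using Rent's rule: T = t * g**p."""
    if blocks < 1:
        raise ValueError("region must contain at least one block")
    return terminals_per_block * blocks ** rent_exponent


if __name__ == "__main__":
    # Illustrative values only: t = 4 terminals per block, two Rent exponents.
    for p in (0.5, 0.7):
        for g in (16, 256, 4096):
            print(f"p={p} g={g} T={rent_terminals(g, 4, p):.1f}")
```

A larger exponent means the connection demand grows faster with region size,
which is one way to see why two designs with the same amount of logic can need
very different numbers of routing tracks.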
The FPGA is an array or island-style FPGA. It consists of an array of logic
blocks and routing channels. Two I/O pads fit into the height of one row or the
width of one column, as shown below. All the routing channels have the same
width (number of wires).
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the unregistered
LUT output. The logic block has four inputs for the LUT and a clock input.
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs, they
and other signals are separately managed.
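The behavior of such a logic block can be modeled in a few lines of software.
The following Python sketch is a behavioral illustration only: the class name
LogicBlock, the helper make_truth_table, and the bit ordering of the LUT
address are assumptions for this example and do not describe any vendor's
actual cell.

```python
class LogicBlock:
    """Behavioral model of a 4-input LUT followed by an optional D flip-flop."""

    def __init__(self, truth_table, registered=False):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs exactly 16 configuration bits")
        self.truth_table = [int(bool(bit)) for bit in truth_table]
        self.registered = registered  # selects registered or unregistered output
        self.q = 0                    # flip-flop state

    def lut(self, a, b, c, d):
        # The four inputs (each 0 or 1) form an address into the 16-entry table.
        return self.truth_table[(d << 3) | (c << 2) | (b << 1) | a]

    def clock(self, a, b, c, d):
        """Model one rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        return self.q if self.registered else self.lut(a, b, c, d)


def make_truth_table(func):
    """Build the 16 configuration bits for any Boolean function of 4 inputs."""
    return [func(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1)
            for i in range(16)]


if __name__ == "__main__":
    xor4 = make_truth_table(lambda a, b, c, d: a ^ b ^ c ^ d)
    block = LogicBlock(xor4, registered=True)
    block.clock(1, 0, 0, 0)
    print(block.output(0, 0, 0, 0))  # value captured at the last clock edge
```

Because the table is just 16 stored bits, the same block can implement any
function of four inputs; only the configuration changes.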
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number one connects only to wires in track
number one in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
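The planar rule described above is simple enough to write down directly. The
Python sketch below is an illustration only; the side names and the function
planar_connections are assumptions for this example. It lists, for a wire
entering a switch box on one side, the three wires it may be switched onto.

```python
SIDES = ("north", "east", "south", "west")


def planar_connections(side, track, channel_width):
    """Return the wires reachable from (side, track) in a planar switch box.

    In the planar (domain-based) topology a wire can only be switched onto
    the wire with the same track number on each of the other three sides.
    """
    if side not in SIDES:
        raise ValueError(f"unknown side: {side}")
    if not 0 <= track < channel_width:
        raise ValueError("track number outside the channel")
    return [(other, track) for other in SIDES if other != side]


if __name__ == "__main__":
    for target in planar_connections("west", 1, channel_width=4):
        print(target)
```

One consequence of this rule is that a signal stays in the same track number
for its whole route unless it passes through a logic block, so the tracks
behave as independent routing domains.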
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures, because it is possible to just specify
them numerically rather than having to draw every piece by hand. On the other
hand, schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although, in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs and early software and firmware co-design
have become mainstream, and prototyping SoC and ASIC designs on an FPGA has
become a good method of doing both.
Reasons why Prototyping is important
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound. Simulation
speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows for much
more time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications, and hardware, to be found
early.
3. Prototyping also allows the developer to ensure that all IP technologies on
the system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
Fig: VGA CONTROLLER (gate level)
Fig: SYNC UNIT (gate level)
Fig: TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion Tracking Systems
Image processing systems
Displaying Systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in
order to correctly display the picture on a connected monitor.
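The synchronization signals mentioned above can be modeled with two counters.
The report does not list its timing parameters, so the Python sketch below
assumes the widely used 640 x 480 at 60 Hz figures (800 pixel clocks per line,
525 lines per frame, both sync pulses active low). The function names are
hypothetical, and the code is a behavioral illustration rather than the
controller's actual HDL.

```python
# Assumed 640 x 480 @ 60 Hz timing (not taken from the report itself).
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48  # in pixel clocks
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33   # in lines

H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # pixel clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # lines per frame


def sync_signals(h_count, v_count):
    """Return (hsync, vsync, visible) for the given counter values.

    Both sync outputs are active low: 0 during the pulse, 1 otherwise.
    """
    h_start = H_VISIBLE + H_FRONT
    v_start = V_VISIBLE + V_FRONT
    hsync = 0 if h_start <= h_count < h_start + H_SYNC else 1
    vsync = 0 if v_start <= v_count < v_start + V_SYNC else 1
    visible = h_count < H_VISIBLE and v_count < V_VISIBLE
    return hsync, vsync, visible


def next_counts(h_count, v_count):
    """Advance the counters by one pixel clock, wrapping per line and frame."""
    h_count += 1
    if h_count == H_TOTAL:
        h_count = 0
        v_count = (v_count + 1) % V_TOTAL
    return h_count, v_count


if __name__ == "__main__":
    h, v, low_clocks = 0, 0, 0
    for _ in range(H_TOTAL):          # walk through one full line
        if sync_signals(h, v)[0] == 0:
            low_clocks += 1
        h, v = next_counts(h, v)
    print("hsync low for", low_clocks, "pixel clocks per line")
```

RGB values would only be driven from memory while visible is true; outside the
visible region the color outputs are normally held at black.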
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 56
12.2 FPGA Comparisons
Historically, FPGAs have been slower, less energy efficient, and generally
achieved less functionality than their fixed ASIC counterparts. A combination
of volume, fabrication improvements, research and development, and the I/O
capabilities of new supercomputers has largely closed the performance gap
between ASICs and FPGAs.
Advantages of FPGAs include a shorter time to market, the ability to
re-program in the field to fix bugs, and lower non-recurring engineering
costs. Vendors can also take a middle road by developing their hardware on
ordinary FPGAs but manufacturing the final version so that it can no longer
be modified after the design has been committed.
Xilinx claims that several market and technology dynamics are changing the
ASIC/FPGA paradigm:
IC costs are rising aggressively.
ASIC complexity has increased development time and costs.
R&D resources and headcount are decreasing.
Revenue losses for slow time-to-market are increasing.
Financial constraints in a poor economy are driving low-cost technologies.
These trends make FPGAs a better alternative than ASICs for a growing
number of higher-volume applications than they have historically been used
for, to which the company attributes the growing number of FPGA design starts
(see History).
The primary differences between CPLDs and FPGAs are architectural. A
CPLD has a somewhat restrictive structure consisting of one or more
programmable sum-of-products logic arrays feeding a relatively small number
of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher
logic-to-interconnect ratio. The FPGA architectures, on the other hand, are
dominated by interconnect. This makes them far more flexible (in terms of the
range of designs that are practical for implementation within them) but also
far more complex to design for.
Another notable difference between CPLDs and FPGAs is the presence in most
FPGAs of higher-level embedded functions (such as adders and multipliers)
and embedded memories, as well as logic blocks that can implement decoders
or mathematical functions.
Some FPGAs have the capability of partial re-configuration, which lets one
portion of the device be re-programmed while other portions continue running.
12.3 Applications
Applications of FPGAs include digital signal processing, software-defined
radio, aerospace and defense systems, ASIC prototyping, medical imaging,
computer vision, speech recognition, cryptography, bioinformatics, computer
hardware emulation, radio astronomy, and a growing range of other areas.
FPGAs originally began as competitors to CPLDs and competed in a similar
space, that of glue logic for PCBs. As their size, capabilities, and speed
increased, they began to take over larger and larger functions, to the point
where some are now marketed as full systems on chips (SoC). Particularly with
the introduction of dedicated multipliers into FPGA architectures in the late
1990s, applications which had traditionally been the sole reserve of DSPs
began to incorporate FPGAs instead.
FPGAs especially find applications in any area or algorithm that can make use
of the massive parallelism offered by their architecture. One such area is
code breaking, in particular brute-force attacks on cryptographic algorithms.
FPGAs are increasingly used in conventional high performance computing
applications, where computational kernels such as FFT or convolution are
performed on the FPGA instead of a microprocessor.
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at low MHz clock rates. The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units. This has driven a new type of processing called
reconfigurable computing, where time-intensive tasks are offloaded from
software to FPGAs.
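The precision side of this trade-off can be illustrated with a small sketch.
It is not taken from this report: the sample data, the bit widths, and the
helper names quantize and dot_error are assumptions made up for illustration,
and Python stands in for fixed-point hardware. The sketch only shows how the
error of a dot product grows as fractional bits are removed; how many extra
arithmetic units the narrower words would free up depends on the device and
is not modelled.

```python
def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (fixed point)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def dot(xs, ys):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(xs, ys))


# Hypothetical sample data: filter coefficients and input samples.
COEFFS = [0.7071, -0.3827, 0.9239, 0.1951]
SAMPLES = [0.5, 0.25, -0.125, 0.75]


def dot_error(frac_bits):
    """Absolute dot-product error when COEFFS are quantized to frac_bits."""
    exact = dot(COEFFS, SAMPLES)
    approx = dot([quantize(c, frac_bits) for c in COEFFS], SAMPLES)
    return abs(approx - exact)


if __name__ == "__main__":
    for bits in (4, 8, 12):
        print(bits, "fractional bits -> error", dot_error(bits))
```

Each coefficient is off by at most half a quantization step, so the total
error is bounded by that half step times the sum of the absolute sample
values.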
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and by the
extremely long turn-around times of current design tools, where a wait of
4-8 hours is necessary after even minor changes to the source code.
Traditionally, FPGAs have been reserved for specific vertical applications
where the volume of production is small. For these low-volume applications,
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC. Today, new cost and performance dynamics have broadened
the range of viable applications.
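The low-volume argument amounts to a break-even calculation, sketched below.
All cost figures are hypothetical placeholders, not data from this report or
from any vendor, and the function names are made up for illustration. The
model assumes only a one-time engineering cost (NRE) plus a constant cost per
unit.

```python
def total_cost(nre, unit_cost, volume):
    """One-time engineering cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_asic, nre_fpga, unit_asic, unit_fpga):
    """Volume at which ASIC and FPGA total costs are equal.

    Below this volume the FPGA is cheaper overall; above it the ASIC is.
    """
    if unit_fpga <= unit_asic:
        raise ValueError("model assumes the FPGA costs more per unit")
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)


if __name__ == "__main__":
    # Hypothetical figures chosen only to make the arithmetic easy to follow.
    volume = break_even_volume(nre_asic=1_000_000, nre_fpga=50_000,
                               unit_asic=5, unit_fpga=55)
    print("break-even volume:", volume)
```

With these placeholder numbers the two options cost the same at 19,000
units. A smaller ASIC NRE or a larger per-unit FPGA premium moves the
break-even point down.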
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs), I/O pads, and routing channels. Generally, all the
routing channels have the same width (number of wires). Multiple I/O pads may
fit into the height of one row or the width of one column in the array.
An application circuit must be mapped into an FPGA with adequate resources.
While the number of CLBs and I/Os required is easily determined from the
design, the number of routing tracks needed may vary considerably even
among designs with the same amount of logic. (For example, a crossbar switch
requires much more routing than a systolic array with the same gate count.)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit, FPGA manufacturers try to provide
just enough tracks so that most designs that will fit in terms of LUTs and
I/Os can be routed. This is determined by estimates such as those derived
from Rent's rule or by experiments with existing designs.
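Rent's rule is commonly written T = t * g^p, where T is the number of
external terminals of a region of logic, g is the number of blocks in that
region, t is the average number of terminals per block, and p is the Rent
exponent. A minimal sketch follows. The parameter values are illustrative
assumptions, not figures from this report, and turning a terminal estimate
into a channel width requires further modelling that is not shown.

```python
def rent_terminals(blocks, terminals_per_block, rent_exponent):
    """Estimate external terminals of a region using T = t * g**p."""
    if blocks < 1:
        raise ValueError("blocks must be at least 1")
    return terminals_per_block * blocks ** rent_exponent


if __name__ == "__main__":
    # Hypothetical values: 5 terminals per logic block, two example exponents.
    for p in (0.5, 0.7):
        print("p =", p, "->", rent_terminals(256, 5, p), "terminals")
```

A larger exponent models a design with more global wiring, which is why two
designs with the same block count can need very different routing resources.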
The example FPGA is an array, or island-style, FPGA. It consists of an array
of logic blocks and routing channels. Two I/O pads fit into the height of one
row or the width of one column, as shown below. All the routing channels have
the same width (number of wires).
12.1 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop, as shown below. In recent years, manufacturers have started moving
to 6-input LUTs in their high performance parts, claiming increased
performance.
Typical logic block
There is only one output, which can be either the registered or the
unregistered LUT output. The logic block has four inputs for the LUT and a
clock input. Since clock signals (and often other high-fanout signals) are
normally routed via special-purpose dedicated routing networks in commercial
FPGAs, they and other signals are separately managed.
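The behaviour of such a logic block can be sketched in software. This is a
behavioural model only: the class name LogicBlock and its methods are
hypothetical, and no particular vendor's CLB is being described. The LUT is a
16-entry truth table indexed by the four inputs, and a configuration bit
selects whether the output comes from the flip-flop or directly from the LUT.

```python
class LogicBlock:
    """4-input LUT followed by a flip-flop, with a selectable output."""

    def __init__(self, truth_table, registered):
        if len(truth_table) != 16:
            raise ValueError("a 4-input LUT needs exactly 16 entries")
        self.truth_table = list(truth_table)
        self.registered = registered  # True: output taken from flip-flop
        self.flip_flop = 0            # flip-flop state, assumed reset to 0

    def lut(self, a, b, c, d):
        """Combinational LUT output for input bits a (LSB) to d (MSB)."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return self.truth_table[index]

    def clock(self, a, b, c, d):
        """Model a rising clock edge: capture the LUT output."""
        self.flip_flop = self.lut(a, b, c, d)

    def output(self, a, b, c, d):
        """Single block output: registered or unregistered LUT value."""
        return self.flip_flop if self.registered else self.lut(a, b, c, d)


# Example configuration: 4-input parity (XOR of all inputs).
PARITY_TABLE = [bin(i).count("1") & 1 for i in range(16)]
```

In the unregistered configuration the output follows the inputs
immediately. In the registered configuration it changes only after clock()
is called.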
For this example architecture, the locations of the FPGA logic block pins are
shown below.
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output
pin can connect to routing wires in both the channel to the right and the
channel below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box.
In this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in
adjacent channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
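The planar topology described above can be written down as a small
connectivity function. This is a sketch under stated assumptions: the side
names, the 1-based track numbering, and the function names are chosen here
for illustration and do not come from any FPGA tool.

```python
SIDES = ("north", "east", "south", "west")


def planar_neighbours(side, track, channel_width):
    """Wires reachable from (side, track) through one switch box.

    In the planar topology a wire only meets the wire with the same track
    number on each of the three other sides.
    """
    if side not in SIDES:
        raise ValueError("unknown side: " + side)
    if not 1 <= track <= channel_width:
        raise ValueError("track out of range")
    return [(other, track) for other in SIDES if other != side]


def switch_count(channel_width):
    """Number of distinct programmable switches in one switch box."""
    switches = set()
    for track in range(1, channel_width + 1):
        for side in SIDES:
            for other in planar_neighbours(side, track, channel_width):
                # A switch joins two wire ends; order does not matter.
                switches.add(frozenset([(side, track), other]))
    return len(switches)
```

Each track contributes six switches (one per pair of sides), so a box with
channel width W holds 6W switches in this model.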
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows
chip companies to validate their design before the chip is produced in the
factory, reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction
level through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores, and are available from FPGA vendors and third-party IP suppliers
(rarely free, and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free, and released under the GPL, BSD, or similar
license) and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
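The idea of repeating the same test bench at the RTL and gate levels can be
sketched with a one-bit full adder. Python is used here only as a stand-in
for an HDL simulator. The two adder functions and the mismatches helper are
illustrative assumptions, and the sketch checks function only, not the
back-annotated timing described above.

```python
from itertools import product


def adder_behavioural(a, b, cin):
    """RTL-style description: add the bits, return (sum, carry_out)."""
    total = a + b + cin
    return total & 1, total >> 1


def adder_gates(a, b, cin):
    """Gate-level description built from XOR, AND, and OR."""
    a_xor_b = a ^ b
    sum_bit = a_xor_b ^ cin
    carry_out = (a & b) | (a_xor_b & cin)
    return sum_bit, carry_out


def mismatches(reference, candidate, n_inputs):
    """Exhaustive test bench: input vectors where the two models differ."""
    return [bits for bits in product((0, 1), repeat=n_inputs)
            if reference(*bits) != candidate(*bits)]


if __name__ == "__main__":
    print("mismatching vectors:",
          mismatches(adder_behavioural, adder_gates, 3))
```

An empty list means the gate-level model agrees with the behavioural one on
every input vector. Exhaustive comparison is only practical for very small
blocks; larger designs rely on directed or random test vectors.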
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx
alone representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available
via a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs, together with early software and firmware
co-design, has become mainstream. Prototyping SoC and ASIC designs on an FPGA
has become a good method to do this.
Reasons why Prototyping is important
1. Running a SoC design on an FPGA prototype is a reliable way to ensure that
it is functionally correct. This is compared to designers relying only on
software simulations to verify that their hardware design is sound.
Simulation speed and modeling accuracy limitations hinder this development.
2. Due to time constraints, many projects cannot wait until the silicon is
back from the foundry to start on software tests. FPGA prototyping allows for
much more time for software development and testing at the
software-hardware integration stage. This allows many unforeseen software
bugs, which appear due to today's array of operating systems, applications,
and hardware, to be found earlier.
3. Prototyping also allows the developer to ensure that all IP technologies
on the system work well together, off the simulation stage and in actual
form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development
cycle and allows for more enhancement or improvement of the chip features
than would otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig. VGA CONTROLLER
GATE LEVEL
Fig. SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, with the VGA Controller implemented on an FPGA, the Controller takes a
processor-generated (pixelated) picture from memory space and provides
digital RGB values for each pixel, as well as horizontal and vertical
synchronization signals, in order to correctly display the picture on a
connected monitor.
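How the synchronization signals relate to the pixel and line counters can be
sketched as follows. The timing constants are the widely used
industry-standard values for 640x480 at about 60 Hz (25.175 MHz pixel clock,
active-low sync). They are an assumption brought in for illustration and are
not taken from this report's design, which may use different parameters. The
function names are hypothetical, and Python stands in for the HDL counters.

```python
# Horizontal timing in pixel clocks: visible, front porch, sync, back porch.
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
# Vertical timing in lines: visible, front porch, sync, back porch.
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33

H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK
PIXEL_CLOCK_HZ = 25_175_000


def sync_signals(x, y):
    """Return (hsync, vsync, visible) for pixel counter x and line counter y.

    Both sync outputs are active low: 0 during the sync pulse, 1 otherwise.
    """
    h_start = H_VISIBLE + H_FRONT
    v_start = V_VISIBLE + V_FRONT
    hsync = 0 if h_start <= x < h_start + H_SYNC else 1
    vsync = 0 if v_start <= y < v_start + V_SYNC else 1
    visible = x < H_VISIBLE and y < V_VISIBLE
    return hsync, vsync, visible


def refresh_rate_hz():
    """Frames per second implied by the pixel clock and the frame size."""
    return PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)
```

RGB values from memory would be driven only while visible is true. Outside
the visible area the colour outputs are normally held at black.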
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 58
The inherent parallelism of the logic resources on an FPGA allows for
considerable computational throughput even at a low MHz clock rates The
flexibility of the FPGA allows for even higher performance by trading off
precision and range in the number format for an increased number of parallel
arithmetic units This has driven a new type of processing called
reconfigurable computing where time intensive tasks are offloaded from
software to FPGAs
The adoption of FPGAs in high performance computing is currently limited by
the complexity of FPGA design compared to conventional software and the
extremely long turn-around times of current design tools where 4-8 hours wait
is necessary after even minor changes to the source code
Traditionally FPGAs have been reserved for specific vertical applications
where the volume of production is small For these low-volume applications
the premium that companies pay in hardware costs per unit for a
programmable chip is more affordable than the development resources spent
on creating an ASIC for a low-volume application Today new cost and
performance dynamics have broadened the range of viable applications
Architecture
The most common FPGA architecture consists of an array of configurable
logic blocks (CLBs) IO pads and routing channels Generally all the routing
channels have the same width (number of wires) Multiple IO pads may fit
into the height of one row or the width of one column in the array
An application circuit must be mapped into an FPGA with adequate resources
While the number of CLBs and IOs required is easily determined from the
design the number of routing tracks needed may vary considerably even
among designs with the same amount of logic (For example a crossbar switch
requires much more routing than a systolic array with the same gate count)
Since unused routing tracks increase the cost (and decrease the performance)
of the part without providing any benefit FPGA manufacturers try to provide
58
just enough tracks so that most designs that will fit in terms of LUTs and IOs
can be routed This is determined by estimates such as those derived from
Rents rule or by experiments with existing designs
The FPGA is an array or island-style FPGA It consists of an array of logic
blocks and routing channels Two IO pads fit into the height of one row or the
width of one column as shown below All the routing channels have the same
width (number of wires)
121 FPGA structure
A classic FPGA logic block consists of a 4-input lookup table (LUT) and a
flip-flop as shown below In recent years manufacturers have started moving
to 6-input LUTs in their high performance parts claiming increased
performance
Typical logic block
59
There is only one output which can be either the registered or the unregistered
LUT output The logic block has four inputs for the LUT and a clock input
Since clock signals (and often other high-fanout signals) are normally routed
via special-purpose dedicated routing networks in commercial FPGAs they
and other signals are separately managed
For this example architecture the locations of the FPGA logic block pins are
shown below
Logic Block Pin Locations
Each input is accessible from one side of the logic block, while the output pin
can connect to routing wires in both the channel to the right and the channel
below the logic block.
Each logic block output pin can connect to any of the wiring segments in the
channels adjacent to it.
Similarly, an I/O pad can connect to any one of the wiring segments in the
channel adjacent to it. For example, an I/O pad at the top of the chip can
connect to any of the W wires (where W is the channel width) in the horizontal
channel immediately below it.
Generally, the FPGA routing is unsegmented. That is, each wiring segment
spans only one logic block before it terminates in a switch box. By turning on
some of the programmable switches within a switch box, longer paths can be
constructed. For higher-speed interconnect, some FPGA architectures use
longer routing lines that span multiple logic blocks.
Whenever a vertical and a horizontal channel intersect, there is a switch box. In
this architecture, when a wire enters a switch box, there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments. The pattern, or topology, of switches used in this
architecture is the planar or domain-based switch box topology. In this switch
box topology, a wire in track number 1 connects only to wires in track
number 1 in adjacent channel segments, wires in track number 2 connect
only to other wires in track number 2, and so on. The figure below illustrates
the connections in a switch box.
Switch box topology
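The planar connection rule described above can be sketched in a few lines.
The side names and the function name are assumptions chosen for this
illustration; tracks are numbered from 1 to match the text.

```python
SIDES = ("top", "right", "bottom", "left")


def planar_switch_targets(side: str, track: int, channel_width: int):
    """Return the (side, track) wires reachable from a wire entering the box.

    In a planar (domain-based) switch box a wire keeps its track number, so it
    can reach exactly one wire on each of the other three sides.
    """
    if side not in SIDES:
        raise ValueError(f"unknown side: {side}")
    if not 1 <= track <= channel_width:
        raise ValueError("track must be between 1 and channel_width")
    return [(other, track) for other in SIDES if other != side]


if __name__ == "__main__":
    print(planar_switch_targets("left", 2, channel_width=4))
```

Because the track number never changes, a signal that starts on track 2 stays
on track 2 along its whole path, which is why the topology is also called
domain-based.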
Modern FPGA families expand upon the above capabilities to include
higher-level functionality fixed into the silicon. Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives. Examples of these
include multipliers, generic DSP blocks, embedded processors, high-speed I/O
logic, and embedded memories.
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation, and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) design or a schematic design. The HDL form might be easier to work
with when handling large structures because it is possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place, and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.
To simplify the design of complex systems in FPGAs, there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process. These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free and typically released under proprietary licenses). Other
predefined circuits are available from developer communities such as
OpenCores (typically free and released under the GPL, BSD, or similar license)
and other sources.
In a typical design flow, an FPGA application developer will simulate the
design at multiple stages throughout the design process. Initially, the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results. Then, after the synthesis engine has
mapped the design to a netlist, the netlist is translated to a gate-level
description, where simulation is repeated to confirm the synthesis proceeded
without errors. Finally, the design is laid out in the FPGA, at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist.
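The idea of repeating a simulation on the gate-level description can be
illustrated with a small software analogy. The sketch below compares a
behavioral description of a 1-bit full adder with a hand-written gate-level
version over every input vector. The full adder and all function names are
hypothetical examples chosen for this illustration; a real flow would run
VHDL or Verilog test benches in an HDL simulator.

```python
from itertools import product


def full_adder_rtl(a: int, b: int, cin: int):
    """Behavioral (RTL-style) reference: describe the adder arithmetically."""
    total = a + b + cin
    return total & 1, total >> 1  # (sum, carry out)


def full_adder_gates(a: int, b: int, cin: int):
    """Gate-level version of the same adder built from XOR, AND and OR."""
    p = a ^ b
    s = p ^ cin
    cout = (a & b) | (p & cin)
    return s, cout


def run_testbench():
    """Apply every input vector and return the vectors where the models differ."""
    mismatches = []
    for a, b, cin in product((0, 1), repeat=3):
        if full_adder_rtl(a, b, cin) != full_adder_gates(a, b, cin):
            mismatches.append((a, b, cin))
    return mismatches


if __name__ == "__main__":
    print("mismatches:", run_testbench())
```

An empty mismatch list plays the role of the gate-level simulation agreeing
with the RTL simulation; timing simulation with back-annotated delays has no
counterpart in this purely functional sketch.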
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed), and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs together with early software and firmware
co-design has become mainstream. Prototyping SoC and ASIC designs on an FPGA
has become a good method to do this.
Reasons why Prototyping is important
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder development that relies on simulation alone.
2. Due to time constraints, many projects cannot wait until the silicon is back
from the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This allows many unforeseen software bugs, which appear due
to today's array of operating systems, applications, and hardware, to be caught
early.
3. Prototyping also allows the developer to ensure that all IP technologies on the
system work well together, off the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement to the chip features than would
otherwise have been possible.
13 SIMULATION RESULTS
GATE LEVEL
Fig: VGA CONTROLLER
GATE LEVEL
Fig: SYNC UNIT
TECHNOLOGY SCHEMATIC
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
picture (pixelated) from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
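The generation of the synchronization signals can be sketched as two counters
compared against fixed timing windows. The numbers below are the commonly
quoted 640x480 at 60 Hz timing with active-low sync pulses; they are an
assumption for this illustration and are not taken from this report's
implementation, and the function names are hypothetical.

```python
# Assumed 640x480 @ 60 Hz timing: pixels per line and lines per frame.
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # pixel clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # lines per frame


def sync_signals(x: int, y: int):
    """Return (hsync, vsync, visible) for pixel counter x and line counter y.

    Sync pulses are modeled as active low: 0 during the pulse, 1 otherwise.
    """
    h_start = H_VISIBLE + H_FRONT
    v_start = V_VISIBLE + V_FRONT
    hsync = 0 if h_start <= x < h_start + H_SYNC else 1
    vsync = 0 if v_start <= y < v_start + V_SYNC else 1
    visible = x < H_VISIBLE and y < V_VISIBLE
    return hsync, vsync, visible


def advance(x: int, y: int):
    """Step the counters by one pixel clock, wrapping at line and frame end."""
    x += 1
    if x == H_TOTAL:
        x = 0
        y = (y + 1) % V_TOTAL
    return x, y


if __name__ == "__main__":
    print(H_TOTAL, V_TOTAL, sync_signals(0, 0), sync_signals(656, 0))
```

RGB values are driven only while `visible` is true; outside that region the
color outputs are held at zero so that the monitor sees blanking.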
Pixel 6 SIGNALS IN VGA CONTROLLER Gates Market size 122 FPGA Comparisons Page 61
some of the programmable switches within a switch box longer paths can be
constructed For higher speed interconnect some FPGA architectures use
longer routing lines that span multiple logic blocks
Whenever a vertical and a horizontal channel intersect there is a switch box In
this architecture when a wire enters a switch box there are three
programmable switches that allow it to connect to three other wires in adjacent
channel segments The pattern or topology of switches used in this
architecture is the planar or domain-based switch box topology In this switch
box topology a wire in track number one connects only to wires in track
number one in adjacent channel segments wires in track number 2 connect
only to other wires in track number 2 and so on The figure below illustrates
the connections in a switch box
Switch box topology
Modern FPGA families expand upon the above capabilities to include higher
level functionality fixed into the silicon Having these common functions
embedded into the silicon reduces the area required and gives those functions
increased speed compared to building them from primitives Examples of these
include multipliers generic DSP blocks embedded processors high speed IO
logic and embedded memories
61
FPGAs are also widely used for systems validation including pre-silicon
validation post-silicon validation and firmware development This allows chip
companies to validate their design before the chip is produced in the factory
reducing the time to market
124 FPGA Design and Programming
To define the behavior of the FPGA the user provides a hardware description
language (HDL) or a schematic design The HDL form might be easier to work
with when handling large structures because its possible to just specify them
numerically rather than having to draw every piece by hand On the other hand
schematic entry can allow for easier visualisation of a design
Then using an electronic design automation tool a technology-mapped netlist
is generated The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route usually performed by the FPGA
companys proprietary place-and-route software The user will validate the
map place and route results via timing analysis simulation and other
verification methodologies Once the design and validation process is
complete the binary file generated (also using the FPGA companys
proprietary software) is used to (re)configure the FPGA
Going from schematicHDL source files to actual configuration The source
files are fed to a software suite from the FPGACPLD vendor that through
different steps will produce a file This file is then transferred to the
FPGACPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM
The most common HDLs are VHDL and Verilog although in an attempt to
reduce the complexity of designing in HDLs which have been compared to the
equivalent of assembly languages there are moves to raise the abstraction level
through the introduction of alternative languages
62
To simplify the design of complex systems in FPGAs there exist libraries of
predefined complex functions and circuits that have been tested and optimized
to speed up the design process These predefined circuits are commonly called
IP cores and are available from FPGA vendors and third-party IP suppliers
(rarely free and typically released under proprietary licenses) Other
predefined circuits are available from developer communities such as Open
Cores (typically free and released under the GPL BSD or similar license) and
other sources
In a typical design flow an FPGA application developer will simulate the
design at multiple stages throughout the design process Initially the RTL
description in VHDL or Verilog is simulated by creating test benches to
simulate the system and observe results Then after the synthesis engine has
mapped the design to a netlist the netlist is translated to a gate level
description where simulation is repeated to confirm the synthesis proceeded
without errors Finally the design is laid out in the FPGA at which point
propagation delays can be added and the simulation run again with these
values back-annotated onto the netlist
Basic Process Technology Types
SRAM - Based on static memory technology. In-system programmable and
re-programmable. Requires external boot devices. CMOS.
Antifuse - One-time programmable. CMOS.
EPROM - Erasable Programmable Read-Only Memory technology. Usually
one-time programmable in production because of plastic packaging.
Windowed devices can be erased with ultraviolet (UV) light. CMOS.
EEPROM - Electrically Erasable Programmable Read-Only Memory
technology. Can be erased, even in plastic packages. Some, but not all,
EEPROM devices can be in-system programmed. CMOS.
Flash - Flash-erase EPROM technology. Can be erased, even in plastic
packages. Some, but not all, flash devices can be in-system programmed.
Usually, a flash cell is smaller than an equivalent EEPROM cell and is
therefore less expensive to manufacture. CMOS.
Fuse - One-time programmable. Bipolar.
Major Manufacturers
Xilinx and Altera are the current FPGA market leaders and long-time industry
rivals. Together, they control over 80 percent of the market, with Xilinx alone
representing over 50 percent.
Xilinx also provides free Windows and Linux design software, while Altera
provides free Windows tools; the Solaris and Linux tools are only available via
a rental scheme.
Other competitors include Lattice Semiconductor (flash, SRAM), Actel
(antifuse, flash-based, mixed-signal), SiliconBlue Technologies (low power),
Achronix (RAM based, 1.5 GHz fabric speed) and QuickLogic (handheld
focused CSSP, no general purpose FPGAs).
12.5 FPGA prototype
FPGA prototyping, sometimes also referred to as ASIC prototyping or SoC
prototyping, is the method of prototyping SoC and ASIC designs on an FPGA for
hardware verification and early software development.
Verification of hardware designs and early software and firmware co-design
have become mainstream, and prototyping SoC and ASIC designs on an FPGA
has become a good method for doing both.
Reasons why prototyping is important
1. Running an SoC design on an FPGA prototype is a reliable way to ensure that it
is functionally correct, compared with designers relying only on software
simulations to verify that their hardware design is sound. Simulation speed and
modeling accuracy limitations hinder simulation-only verification.
2. Due to time constraints, many projects cannot wait until the silicon is back from
the foundry to start on software tests. FPGA prototyping allows much more
time for software development and testing at the software-hardware
integration stage. This makes it possible to catch many unforeseen software
bugs that appear due to today's array of operating systems, applications and
hardware.
3. Prototyping also allows the developer to ensure that all IP technologies in the
system work well together outside the simulation stage and in actual form.
4. Prototyping has the added advantage of serving as a demo platform for SoC
clients, bringing in interest early. This speeds up the overall development cycle
and allows for more enhancement or improvement of the chip features than
would otherwise have been possible.
13 SIMULATION RESULTS
Fig: VGA Controller (gate level)
Fig: Sync Unit (gate level)
Fig: Technology schematic
14 APPLICATIONS
Motion tracking systems
Image processing systems
Display systems
14.1 CONCLUSION
Hence, implemented on an FPGA, the Controller takes a processor-generated
(pixelated) picture from memory space and provides digital RGB values for
each pixel, as well as horizontal and vertical synchronization signals, in order
to correctly display the picture on a connected monitor.
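How the synchronization signals relate to the pixel position can be sketched behaviourally in Python. This is a minimal sketch, not the report's HDL: the timing constants are the commonly published 640x480 at 60 Hz values (25.175 MHz pixel clock, active-low sync pulses) and are an assumption here, since this section does not list them; all names are hypothetical.

```python
# Assumed 640x480 @ 60 Hz timing, in pixel clocks (horizontal) and lines (vertical).
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # lines per frame
PIXEL_CLOCK_HZ = 25_175_000                       # assumed pixel clock


def sync_signals(x, y):
    """Return (hsync, vsync, video_on) for pixel counter x and line counter y.

    Sync pulses are modelled as active low; video_on is True only inside
    the visible area, where RGB values would be driven from memory.
    """
    h_start = H_VISIBLE + H_FRONT
    v_start = V_VISIBLE + V_FRONT
    hsync = 0 if h_start <= x < h_start + H_SYNC else 1
    vsync = 0 if v_start <= y < v_start + V_SYNC else 1
    video_on = x < H_VISIBLE and y < V_VISIBLE
    return hsync, vsync, video_on


def next_position(x, y):
    """Advance the counters by one pixel clock, wrapping at line and frame end."""
    x += 1
    if x == H_TOTAL:
        x = 0
        y = (y + 1) % V_TOTAL
    return x, y


def refresh_rate_hz():
    """Frame rate implied by the pixel clock and the total frame size."""
    return PIXEL_CLOCK_HZ / (H_TOTAL * V_TOTAL)


if __name__ == "__main__":
    print(H_TOTAL, V_TOTAL, refresh_rate_hz())
```

In hardware the two counters would be registers clocked by the pixel clock, and the comparisons in sync_signals would become the combinational logic of the sync unit.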
FPGAs are also widely used for systems validation, including pre-silicon
validation, post-silicon validation and firmware development. This allows chip
companies to validate their design before the chip is produced in the factory,
reducing the time to market.
12.4 FPGA Design and Programming
To define the behavior of the FPGA, the user provides a hardware description
language (HDL) or a schematic design. The HDL form might be easier to work
with when handling large structures because it's possible to just specify them
numerically rather than having to draw every piece by hand. On the other hand,
schematic entry can allow for easier visualisation of a design.
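The point about specifying large structures numerically can be illustrated with a short Python sketch in which a single width parameter generates the whole structure. Everything here is an assumption made for illustration: the ripple-carry adder, the net names and the tuple-based netlist format are hypothetical and do not correspond to any vendor tool or to the design in this report.

```python
def ripple_carry_adder_netlist(width):
    """Return (gate_type, output_net, input_nets) tuples for a width-bit adder.

    One loop produces all the gates; a schematic would need each full
    adder drawn by hand.
    """
    gates = []
    carry = "cin"
    for i in range(width):
        a, b = f"a{i}", f"b{i}"
        gates.append(("XOR", f"p{i}", (a, b)))
        gates.append(("XOR", f"s{i}", (f"p{i}", carry)))
        gates.append(("AND", f"g{i}", (a, b)))
        gates.append(("AND", f"t{i}", (f"p{i}", carry)))
        gates.append(("OR", f"c{i + 1}", (f"g{i}", f"t{i}")))
        carry = f"c{i + 1}"
    return gates


def evaluate(gates, inputs):
    """Evaluate the gates in list order and return the value of every net."""
    ops = {
        "XOR": lambda x, y: x ^ y,
        "AND": lambda x, y: x & y,
        "OR": lambda x, y: x | y,
    }
    nets = dict(inputs)
    for kind, out, (in0, in1) in gates:
        nets[out] = ops[kind](nets[in0], nets[in1])
    return nets


def add(width, a, b):
    """Add two unsigned integers by simulating the generated netlist."""
    inputs = {"cin": 0}
    for i in range(width):
        inputs[f"a{i}"] = (a >> i) & 1
        inputs[f"b{i}"] = (b >> i) & 1
    nets = evaluate(ripple_carry_adder_netlist(width), inputs)
    total = sum(nets[f"s{i}"] << i for i in range(width))
    return total + (nets[f"c{width}"] << width)


if __name__ == "__main__":
    print(len(ripple_carry_adder_netlist(8)), add(8, 200, 100))
```

Changing the width argument from 8 to 32 regenerates the structure without any further drawing or editing, which is the advantage the paragraph attributes to the HDL form.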
Then, using an electronic design automation tool, a technology-mapped netlist
is generated. The netlist can then be fitted to the actual FPGA architecture
using a process called place-and-route, usually performed by the FPGA
company's proprietary place-and-route software. The user will validate the
map, place and route results via timing analysis, simulation and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's
proprietary software) is used to (re)configure the FPGA.
Going from schematic/HDL source files to actual configuration: the source
files are fed to a software suite from the FPGA/CPLD vendor that, through
different steps, will produce a file. This file is then transferred to the
FPGA/CPLD via a serial interface (JTAG) or to an external memory device
like an EEPROM.
The most common HDLs are VHDL and Verilog, although in an attempt to
reduce the complexity of designing in HDLs, which have been compared to the
equivalent of assembly languages, there are moves to raise the abstraction level
through the introduction of alternative languages.