Application Specific VLSI Processor Design For Parametric Speech
Chapter 2: Literature Survey
2.1 Review of ASIP-related Literature
With the increasing interest in ASIP design, researchers have proposed many
design techniques. Those suggested by Sato et al. [14] and Gloria et al. [29] are
among the earliest, while those suggested by Renhai et al. [57], Fontaine et al. [58],
David et al. [59] and Lee et al. [60] are more recent. This chapter presents a brief
survey of ASIP design methodologies, which helps place our work in the overall
context.
ASIPs are programmable processors whose instruction set and underlying
micro-architecture are optimized for high-speed, real-time execution of a class of
applications. ASIPs bridge the gap between Application Specific Integrated
Circuits (ASICs) and general-purpose programmable processors in terms of performance,
power, cost and flexibility [24].
Due to their programmability, ASIPs have flexible functionality within their
application domain, and yet, due to hardware optimization of their micro-architectures
(which support the execution of instructions), they achieve very high performance,
comparable to that of ASICs (which typically have a fixed functionality).
Therefore, for implementing system logic that must be easily upgradable and must
strike a desired balance among performance, power and cost, an implementation
approach based on an Application Specific Instruction set Processor (ASIP)
provides a powerful path.
It has been projected that ASIPs will replace ASICs in the near future, because
ASICs are becoming harder and more expensive to design and manufacture [25].
Gloria et al. [29] suggested the following as some of the main requirements for
the design of Application Specific Architectures (ASA):
a) Start the design cycle with a description of the behavior of the application
(behavioral description) specified by means of a high-level language.
b) Identify hardware functionalities to speed up the application.
c) Evaluate several architectural options.
d) Introduce hardware resources for frequently used operations.
In ASIP design, it is important to search for a processor architecture that
matches the target application. To achieve this goal, it is essential to estimate the
design quality of the candidate architectures in terms of their area, performance and
power consumption.
Typically, ASIP design starts with an analysis of the applications. Sato et al. [14]
reported an Application Program Analyzer (APA) in 1991. The output of the APA includes
data types and their access methods, the frequency of individual operations and the
sequences of continuous operations. This output is used to define the instruction set.
More recently, application analyzers such as the one developed by Gupta et al. [47]
extract a larger number of application parameters. These include the average basic block
size, the number of Multiply-Accumulate (MAC) operations, the ratio of address computation
instructions to data computation instructions, the ratio of input/output instructions to
total instructions, etc. The idea behind extracting these parameters is to decide whether
to include a hardware unit in the processor depending on the values of these
application parameters. For example, if the MAC operation is used frequently in
the application, then it is useful to have a unit that performs it in hardware.
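As an illustration of this kind of analysis, the following is a minimal sketch only; it is not the analyzer of Gupta et al. [47]. It assumes the application is already available as a list of basic blocks, each a list of opcode strings. The opcode names, the `analyze` function and the `mac_threshold` parameter are all hypothetical, and a MAC candidate is simply taken to be a `mul` immediately followed by an `add` in the same block.

```python
from collections import Counter

# Hypothetical opcode classes; a real analyzer would derive these from the
# compiler's intermediate representation.
ADDRESS_OPS = {"addr_add", "addr_inc"}
DATA_OPS = {"add", "sub", "mul"}
IO_OPS = {"in", "out"}


def analyze(basic_blocks, mac_threshold=0.1):
    """Extract simple application parameters from a non-empty list of basic blocks."""
    ops = [op for block in basic_blocks for op in block]
    counts = Counter(ops)
    total = len(ops)

    # Assumption: a MAC candidate is a 'mul' directly followed by an 'add'.
    mac_count = sum(
        1
        for block in basic_blocks
        for first, second in zip(block, block[1:])
        if first == "mul" and second == "add"
    )

    addr = sum(counts[op] for op in ADDRESS_OPS)
    data = sum(counts[op] for op in DATA_OPS)
    io = sum(counts[op] for op in IO_OPS)

    return {
        "avg_basic_block_size": total / len(basic_blocks),
        "mac_count": mac_count,
        "addr_to_data_ratio": addr / data if data else float("inf"),
        "io_ratio": io / total,
        # Decision rule (assumed): add a MAC unit if MACs are frequent enough.
        "include_mac_unit": mac_count / total >= mac_threshold,
    }


blocks = [
    ["in", "mul", "add", "addr_inc"],
    ["mul", "add", "addr_inc", "out"],
]
params = analyze(blocks)
```

The threshold-based decision is only one possible rule; in practice the decision would also weigh the area and power cost of the added unit.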
Recently, Kolar et al. [61] have presented the concept of an automatically generated,
just-in-time translated simulator with profiling capabilities. They have shown that this
simulator is very fast, can be generated in a short time, and can even be used to simulate
special applications such as applications with self-modifying code or applications for
systems with external memories.
The architectures considered by different researchers also differ in terms of the
instruction level parallelism they support. For example, Binh et al. [45] do not support
instruction level parallelism, whereas Gupta et al. [47] support a VLIW architecture.
Binh et al. [16] suggested a branch-and-bound HW/SW partitioning algorithm
for synthesizing the highest-performance pipelined ASIPs with multiple identical
functional units, with gate count and power consumption as the given constraints. They
later improved their algorithm to consider RAM and ROM sizes as well as chip area
constraints [16]. Chip area includes the hardware cost of the register file for a given
application program with the associated input data set. This optimization algorithm
defines the best trade-offs between the CPU core, RAM and ROM of an ASIP chip to
achieve the highest performance while satisfying design constraints on the chip area.
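The general shape of a branch-and-bound search under an area constraint can be sketched as follows. This is a toy illustration under stated assumptions and not the algorithm of Binh et al. [16]. The cost model is invented: each functional unit type has a workload (operation count) and an area per unit, and the cycle count for a type with k identical units is assumed to be ceil(workload / k). The names `branch_and_bound`, `workloads`, `areas` and `max_units` are hypothetical.

```python
import math


def branch_and_bound(workloads, areas, area_budget, max_units=4):
    """Choose the number of units per type that minimizes the estimated
    cycle count subject to a total area budget (toy cost model)."""
    n = len(workloads)

    # Optimistic cycle bound for types i..n-1 (max_units of every type).
    best_case = [0] * (n + 1)
    # Minimum area still required for types i..n-1 (one unit of each).
    min_area = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        best_case[i] = best_case[i + 1] + math.ceil(workloads[i] / max_units)
        min_area[i] = min_area[i + 1] + areas[i]

    best = {"cycles": math.inf, "units": None}

    def search(i, units, cycles, area):
        if i == n:
            if cycles < best["cycles"]:
                best["cycles"], best["units"] = cycles, tuple(units)
            return
        for k in range(max_units, 0, -1):
            new_area = area + k * areas[i]
            if new_area + min_area[i + 1] > area_budget:
                continue  # infeasible: the area constraint cannot be met
            new_cycles = cycles + math.ceil(workloads[i] / k)
            if new_cycles + best_case[i + 1] >= best["cycles"]:
                continue  # bound: cannot beat the best solution found so far
            search(i + 1, units + [k], new_cycles, new_area)

    search(0, [], 0, 0)
    return best["units"], best["cycles"]


units, cycles = branch_and_bound(workloads=[100, 40], areas=[10, 5], area_budget=30)
```

The two `continue` statements are the essence of the method: one discards branches that violate the constraint, the other discards branches whose optimistic estimate is already no better than the incumbent solution.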
Huang et al. [12] have shown that an instruction set can also be generated by
augmenting an existing instruction set with special instructions that are synthesized
from scratch. They considered the process of instruction set generation only after the
parallelism and functionality of the processor micro-architecture are finalized based on
the application.
Gschwind [11] assumed that the processor micro-architecture is fixed and that only
the instruction set is generated, within the flexibility provided by the micro-architecture.
Cong et al. [34] present an automated compilation flow for detection and
generation of application-specific instructions for ASIPs, based on pattern detection,
followed by instruction set selection guided by a cost function taking into account
occurrence, speedup and area costs.
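To make the idea of cost-function-guided selection concrete, here is a minimal sketch under stated assumptions; it is not the flow of Cong et al. [34]. It assumes pattern detection has already produced candidate patterns, each annotated with an occurrence count, cycles saved per occurrence, and an area cost. The `Pattern` class, the `gain` function and the greedy gain-per-area heuristic are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    occurrences: int   # how often the pattern occurs in the profiled application
    cycles_saved: int  # cycles saved per occurrence if implemented in hardware
    area: int          # additional area, in arbitrary (assumed) units, > 0


def gain(pattern):
    """Estimated total cycles saved by turning the pattern into an instruction."""
    return pattern.occurrences * pattern.cycles_saved


def select_instructions(patterns, area_budget):
    """Greedy selection: take patterns in order of gain per unit area
    while they still fit into the remaining area budget."""
    chosen = []
    remaining = area_budget
    for p in sorted(patterns, key=lambda p: gain(p) / p.area, reverse=True):
        if p.area <= remaining:
            chosen.append(p)
            remaining -= p.area
    return chosen


candidates = [
    Pattern("mac", occurrences=100, cycles_saved=2, area=50),
    Pattern("shift_add", occurrences=40, cycles_saved=1, area=20),
    Pattern("butterfly", occurrences=10, cycles_saved=5, area=100),
]
selected = [p.name for p in select_instructions(candidates, area_budget=80)]
```

A greedy heuristic is not guaranteed to be optimal; it is used here only because it shows how occurrence, speedup and area enter a single ranking.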
Galuzzi et al. [62] present an algorithm for automatic selection of application-
specific instructions with hardware constraints.
Imai et al. [41] assume that the instruction set can be divided into two groups:
operations and functions. The algorithm developed by Imai et al. [41] automates the
design of the ASIP instruction set and enables designers to estimate the performance
of their design before implementation.
In ASIPs, the customization of the design is focused on the addressed application
domain; they are more specialized, and therefore more optimized, than PDSPs
(Programmable Digital Signal Processors) in terms of timing performance, energy
consumption and required area [25].
Fanucci et al. [25] have shown that Architecture Description Languages (ADLs)
offer the ASIP designer a quick and optimal design convergence by automatically
generating the software tool-suite as well as the Register Transfer Level (RTL)
description of the processor. Of course, while designing an ASIP, it is the designer’s
responsibility to trade off performance against flexibility in the most suitable way.
Depending on the application, a more specialized or a more flexible ASIP may be
desirable. The flexibility provided by programmability comes with a performance and
power overhead [24]. ASIPs have the potential to require less area or power than
general-purpose processors. Hence, they are popular, especially for low-power
applications.
Several techniques have been proposed to enhance the energy-efficiency of ASIPs
[37]. While those techniques can reduce the energy consumption with a minimal change
in the instruction set, they fail to exploit the opportunity of designing the entire
instruction set from the energy-efficiency perspective.
Renhai et al. [57] have designed an ASIP for AES based on the ESL (Electronic
System Level) methodology. In their ESL-based design flow, a commercial processor
design tool based on the Language for Instruction-set Architectures (LISA) is adopted.
They developed several instructions on the basis of initial profiling of a C description
of AES, and implemented only four AES-specific instructions.
David et al. [59] have presented a methodology, using computer-aided design, for
the development of high-performance Application Specific Instruction set Processors
(ASIPs) targeting applications saturated in repetitive sequential bitwise operations and
data-flow dependencies. The methodology exposes both fine- and coarse-grain
parallelism through a set of recurring-pattern extraction tools.
Fontaine et al. [58] have proposed a multiprocessor ASIP architecture based on
the Tensilica Xtensa processor for accelerating an implementation of a 3D tracking
algorithm. They used the extensible architecture to implement custom instructions.
They chose a 3-processor architecture after analyzing the algorithm and the
profiling results. Each processor is designed for a specific task: frame loading, target
tracking and target tracking with 3D calculations.
Guzma et al. [64] have presented an implementation methodology that leads from
an application specification in a high-level model of computation, the Synchronous
Data-flow Graph, to an implementation as an application specific instruction set processor.
Lee et al. [60] have proposed a new application specific processor, based on a
6-stage pipelined dual-issue VLIW+SIMD architecture, and a compiler for efficient H.264
inverse transform and inverse quantization. The behavior, the structure and the I/O
interface have been described using LISA.
Momcilovic et al. [64] have proposed a low-power Application Specific
Instruction Set Processor (ASIP) to implement data-adaptive Motion Estimation (ME)
algorithms. It is characterized by a specialized data-path and a minimal, optimized
instruction set. To support this instruction set, a simple and efficient micro-architecture
was also designed and implemented. Control signals are generated by a simple
hardwired control unit. A set of software tools, namely an assembler and a cycle-based
accurate simulator, was developed and made available to program ME algorithms on the
proposed ASIP. The proposed architecture was described in VHDL and synthesized
for the UMC 0.13 μm standard cell library.
Ragel et al. [65] have studied the impact of loop unrolling on the performance
(speed) of multi-pipeline ASIPs. They have shown how loop unrolling improves the
ILP (Instruction Level Parallelism) within the loops of an application and therefore
achieves better overall performance.
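The transformation itself can be illustrated with a short sketch. This is not taken from Ragel et al. [65]; the function names are hypothetical, and Python is used only to show the source-level rewrite (the Python interpreter does not itself exploit ILP). In the rolled loop every iteration depends on the previous value of the accumulator; unrolling by four with separate accumulators yields four independent multiply-accumulate chains that a multi-pipeline processor could, in principle, issue in parallel.

```python
def dot_rolled(a, b):
    """Dot product with a single accumulator: one long dependency chain."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc


def dot_unrolled4(a, b):
    """Dot product unrolled by four with independent accumulators."""
    acc0 = acc1 = acc2 = acc3 = 0
    n = len(a)
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        # These four statements have no data dependencies on each other.
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
    for i in range(main, n):  # remainder iterations
        acc0 += a[i] * b[i]
    return acc0 + acc1 + acc2 + acc3


x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 1]
rolled = dot_rolled(x, y)
unrolled = dot_unrolled4(x, y)
```

With integer data both versions return the same value; with floating-point data the different summation order may change the rounding slightly.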
During the literature survey we observed that, while significant work
has been done on the individual components of ASIP design, e.g. instruction set
design/synthesis (from application analysis), there have been relatively few
implementations of ASIP designs. Also, in cases where implementations have been
carried out, commercial tools like LISA or ADL and commercial platforms like Xtensa
have been used.
These implementations therefore appear to be constrained by the limitations
imposed by the tools and/or the platforms used. While this may be acceptable from
an industrial perspective, it appears restrictive from a research and
exploration point of view.
We therefore decided to build our own methodology, which gives us
maximum freedom to explore ASIP solutions, carrying out some steps manually and
building our own tools for the other steps where advantageous. The only commercial
tools we used were VHDL simulation, VHDL synthesis and FPGA tools. This
gave us immense freedom and an opportunity to look very closely at the process of
designing an ASIP.
Finally, an ASIP must be built for an application or application class. We
chose parametric speech synthesis as the application for which to build the ASIP.
The reasons for this choice were:
a) Parametric speech synthesizers are language independent. Given appropriate
parameters, they can generate speech in any language. Parametric speech
synthesis is usually called low-level speech synthesis. Determining the
parameters from text in different languages is a separate, language-dependent
task that is usually referred to as high-level synthesis.
b) Parametric speech synthesis is computationally quite intensive.
c) A good-quality parametric speech synthesizer, widely used by speech
synthesis researchers, is available in the public domain both as Fortran code and
as ‘C’ code.
d) Also the conceptual model of the synthesizer is well documented.
e) It would become possible to compare and assess the benefits of ASIP
approach vis-à-vis the use of a general purpose processor to implement the
synthesizer.
With these objectives in view, we chose parametric speech synthesis as the
application for which to design an ASIP, using our own methodology, which
gave us freedom for unfettered exploration in the field of ASIP design.
Before going further, a brief comparative overview of different methods of
speech synthesis and a description of Klatt’s parametric speech synthesizer
(chosen by us as the target application for ASIP design) are presented below.
2.2 Introduction to Speech Synthesis
Speech synthesis may be categorized as restricted (messaging) or unrestricted
(text-to-speech) synthesis. The former is suitable for announcement and information
systems, while the latter is needed, for example, in applications for the visually impaired.
A number of text-to-speech systems capable of synthesizing unrestricted text
input for different languages are in existence today; they use different methods and
techniques to achieve this goal. The Text-To-Speech (TTS) synthesis procedure consists
of two main phases. The first is text analysis, where the input text is transcribed into
a phonetic or some other linguistic representation, and the second is the generation of
speech waveforms, where the acoustic output is produced from this phonetic and
prosodic information. These two phases are usually called high-level and low-level
synthesis, respectively. A simplified version of the procedure is shown in Figure 2.1.