Methods for Power Minimization in Modern VLSI Circuits

46 Int. J. Reasoning-based Intelligent Systems, Vol. 4, Nos. 1/2, 2012

Copyright © 2012 Inderscience Enterprises Ltd.

Methods for power minimisation in modern VLSI circuits

Bojan Jovanović* and Milun Jevtić Department of Electronics, Faculty of Electronic Engineering, University of Niš, A. Medvedeva 14, 18000 Niš, Serbia *Corresponding author

Abstract: The continued scaling of CMOS technology has led us into deep submicron regimes, where design is no longer limited by the functionality that fits on a chip but is constrained by its power consumption. In this paper, we present some widely used techniques for static and dynamic power minimisation in modern VLSI circuits. These techniques are applicable at different stages of system design, from the technology level, where the designer can change technology parameters (transistor sizes, supply and threshold voltages), up to the top level, which deals with the design's architectural variations. Along with the overview of power minimisation techniques, a binary divider circuit is introduced as an example and implemented in various FPGA families to demonstrate the influence of technology as well as Placement and Routing (PAR) on total power consumption.

Keywords: static and dynamic power consumption; power minimisation; binary divider; FPGA.

Reference to this paper should be made as follows: Jovanović, B. and Jevtić, M. (2012) ‘Methods for power minimisation in modern VLSI circuits’, Int. J. Reasoning-based Intelligent Systems, Vol. 4, Nos. 1/2, pp.46–57.

Biographical notes: Bojan Jovanović received his BSc in Electronics from the Faculty of Electronic Engineering, University of Niš, in 2006. He is currently a PhD candidate and a Teaching and Research Assistant at the Department of Electronics, University of Niš, Niš, Serbia. His current research interests include power estimation and minimisation techniques, digital IC design, real-time and embedded systems, SoCs and programmable logic devices.

Milun Jevtić is a Full Professor of Digital Electronics and Digital Integrated Circuits at the University of Niš, Serbia. He received his BSc and PhD in Electronics from the Faculty of Electronic Engineering, University of Niš. His current research focuses on real-time and embedded systems, low power consumption and energy efficiency.

This paper is a revised and expanded version of a paper entitled ‘Total power consumption in modern VLSI circuits’ presented at ‘XLVI International Scientific Conference on Information, Communication and Energy Systems and Technologies, ICEST 2011’, 29 June–1 July 2011, Niš, Serbia.

1 Introduction

With the introduction of CMOS devices, it was believed that the problem of power consumption had been solved. Unlike BJTs, CMOS devices draw a negligible current in the OFF state, so power was consumed only during circuit operation. The power of these early circuits remained within the allowable envelope thanks to various heat dissipation techniques, so designers could focus primarily on achieving the required performance. Power, if considered at all, was only checked to ensure that it was not too high. Meanwhile, over the past decades, silicon process technology has progressed relentlessly.

Transistor sizes entered deep submicron regimes and some Short Channel Effects (SCEs) began to emerge. According to Moore's law, the number of transistors that can be integrated on a single IC doubles approximately every two years, so power consumption became increasingly difficult to manage. If it is not properly considered during the design phase, power consumption can cause excessive heat, demanding increasingly expensive packaging and cooling strategies that either add significant cost to the system or limit the amount of functionality. Starting from 180 nm technologies, static power consumption due to leaky 'off' transistors has become a non-negligible source of power dissipation, even in running mode. Thus, the total power consumption (i.e. dynamic plus static power) has to be optimised instead of simply reducing dynamic power. The nature of power constraints may also differ (e.g. chips in cell phones and other battery-powered devices versus desktop processors), but in every case the achievable performance depends on how efficiently computation can be performed per unit of power or energy. Furthermore, one must be aware that minimal is not always optimal. The minimum possible power consumption rarely meets the required performance, since the least power-consuming transistor is also the slowest one. A correct optimisation therefore either minimises power consumption subject to a performance constraint or maximises the amount of computation for a given amount of power. Both optimisations can be achieved if the trade-offs between power and delay are known.

Design methods that explore true power minimisation need to work in a high-dimensional search space in which the power and performance of different solutions are compared. This includes system architecture optimisation (outer loop), block-level optimisation (intermediate loop) and fixed topology optimisation (inner loop). This paper is organised as follows. Section 2 describes the sources of power consumption. Section 3 explains some short-channel effects specific to deep submicron transistors and mentions efforts to model these effects as well as potential post-CMOS candidates. Sections 4–6 describe the power minimisation techniques widely used by circuit designers in the three design loops mentioned above. Section 7 gives conclusions and future work.

2 The sources of power consumption

An integrated circuit (IC) designed in CMOS technology can consume power/energy from the supply in three ways. When supply power is used to establish logic levels at the nodes of the IC, it is dynamic, i.e. functional, power consumption. In the case of static power consumption, supply power is wasted on unwanted processes that do not contribute to the circuit functionality but rather degrade it. Finally, short-circuit power consumption is a consequence of the finite rise/fall times of the CMOS input waveforms.

2.1 Dynamic power consumption

Dynamic power is consumed when the load capacitance is charged as the output changes from logic '0' to logic '1'. For example, consider the inverter circuit in Figure 1.

The transistors drawn with dotted lines are turned off during the corresponding transition. Every time the input of the inverter switches from logic '1' to logic '0', the load capacitance Cl is charged from 0 to Vdd. The energy needed to charge the capacitance is provided by the power supply Vdd: of the energy Cl·Vdd² drawn from the supply, half is stored on Cl and the other half is dissipated in the PMOS transistor.

During the input transition from '0' to '1', Cl is discharged through the NMOS transistor, and the energy stored on Cl is dissipated in this transistor (Rabaey et al., 2003). The well-known expression for dynamic power consumption is:

$$P_D = \alpha \cdot C_l \cdot V_{dd}^2 \cdot f \qquad (1)$$

where α is the average number of 0 → 1 output transitions per clock cycle (since supply power is consumed only during 0 → 1 transitions at the output), and f is the clock frequency. A higher frequency leads to more signal transitions per unit time, which in turn increases the circuit power dissipation.
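Equation (1) and the energy bookkeeping described above can be checked numerically. The sketch below is a minimal illustration; the function names and the example values (activity 0.1, 10 fF load, 1.2 V supply, 500 MHz clock) are hypothetical and not taken from any particular technology.

```python
def dynamic_power(alpha, c_load, vdd, freq):
    """Equation (1): P_D = alpha * C_l * Vdd**2 * f, in watts."""
    return alpha * c_load * vdd ** 2 * freq


def transition_energy(c_load, vdd):
    """Energy split (J) for one 0 -> 1 output transition of a CMOS gate."""
    drawn = c_load * vdd ** 2          # energy taken from the supply
    stored = 0.5 * c_load * vdd ** 2   # energy left on C_l (later lost in the NMOS)
    lost_in_pmos = drawn - stored      # energy dissipated in the PMOS while charging
    return drawn, stored, lost_in_pmos


# Hypothetical node: activity 0.1, 10 fF load, 1.2 V supply, 500 MHz clock
p = dynamic_power(0.1, 10e-15, 1.2, 500e6)
print(f"P_D = {p * 1e6:.2f} uW")   # P_D = 0.72 uW
# Halving Vdd cuts the dynamic power by a factor of four
print(dynamic_power(0.1, 10e-15, 0.6, 500e6) / p)
```

The quadratic dependence on Vdd is what makes supply scaling the most effective lever on dynamic power, as discussed in Section 4.2.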

Figure 1 Charging and discharging of a load capacitance (see online version for colours)

2.2 Static power consumption

In order to achieve higher density, higher performance and lower power consumption, CMOS devices have been continuously scaled down for several decades. However, once transistor dimensions approach or fall below 100 nm, the advantages gained by shrinking (increased density and performance as well as power reduction) are eroded by quantum effects that can no longer be neglected. These effects cause a sharp increase in leakage currents, which flow mostly when the transistor is off and should ideally act as an infinite resistance. Leakage currents consist of two main components, subthreshold leakage current (I2) and gate leakage current (I3), as shown in Figure 2.

Figure 2 Leakage current components

Source: Roy et al. (2003)

Other leakage current components have recently started to attract interest because of aggressive scaling of transistor dimensions. Shorter channel lengths give rise to hot-carrier injection from the substrate into the gate oxide (I4) and to punchthrough leakage (I6), thinner oxides to gate-induced drain leakage (I5), and high doping concentrations to reverse-biased junction leakage (I1) (Roy et al., 2003).

However, the largest share of static power is still due to subthreshold leakage current. It is the most temperature-dependent leakage component: any increase in dynamic power raises the chip temperature, which in turn increases the leakage current roughly exponentially, creating a positive feedback loop. At 85°C (a common junction temperature limit for commercial hardware), the leakage currents increase by a factor of 60 over their room-temperature values. These leakage components are also one of the main reasons why the scaling process is facing difficulties. Subthreshold leakage current can be expressed as follows:

$$I_{sub} = \mu C_{ox} \frac{W}{L} V_T^2 \, e^{\frac{V_{GS} - V_{th}}{\eta V_T}} \left(1 - e^{-\frac{V_{DS}}{V_T}}\right) \qquad (2)$$

where μ is the carrier mobility, W and L the channel width and length, respectively, Cox the oxide capacitance per unit area, VT the thermal voltage (26 mV at 25°C), η the subthreshold slope coefficient, VGS and VDS the gate and drain voltages relative to the source, respectively, and Vth the threshold voltage, which decreases as VDS increases because of Drain Induced Barrier Lowering (DIBL). Four tunnelling mechanisms (gate to channel, bulk, source and drain) as well as analytical expressions for the leakage currents can be found in Roy et al. (2003) and Inagaki et al. (2007).
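A minimal sketch of the subthreshold expression, written in the form I_sub = μ·Cox·(W/L)·VT²·exp((VGS − Vth)/(η·VT))·(1 − exp(−VDS/VT)). The device values in the example (μ·Cox = 300 µA/V², W/L = 2, η = 1.5) are hypothetical; the point is the exponential sensitivity of leakage to Vth.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_c):
    """V_T = kT/q for a temperature given in degrees Celsius."""
    return K_B * (temp_c + 273.15) / Q_E


def subthreshold_current(mu_cox, w_over_l, vgs, vth, vds, eta, temp_c=25.0):
    """Subthreshold drain current (A); mu and Vth are held temperature independent."""
    vt = thermal_voltage(temp_c)
    return (mu_cox * w_over_l * vt ** 2
            * math.exp((vgs - vth) / (eta * vt))
            * (1.0 - math.exp(-vds / vt)))


vt = thermal_voltage(25.0)
i_leak_low_vth = subthreshold_current(300e-6, 2.0, vgs=0.0, vth=0.2, vds=1.0, eta=1.5)
i_leak_high_vth = subthreshold_current(300e-6, 2.0, vgs=0.0, vth=0.3, vds=1.0, eta=1.5)
print(f"V_T = {vt * 1e3:.1f} mV")
print(f"leakage ratio for a 100 mV lower Vth: {i_leak_low_vth / i_leak_high_vth:.1f}")
```

With η = 1.5 at 25°C, the current changes by one decade for roughly every η·VT·ln 10 ≈ 89 mV of Vth shift, so a 100 mV lower Vth gives about 13 times more leakage. Because the sketch keeps Vth and μ constant, it captures only the VT part of the temperature dependence and does not reproduce the factor of 60 quoted above.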

2.3 Short-circuit power

In actual designs, the assumption of zero rise and fall times of the input waveforms is not correct. The finite slope of the input signal creates a direct current path between Vdd and GND for a short period during switching, while the NMOS and PMOS transistors conduct simultaneously. Figure 3 shows that short-circuit current flows while the input voltage of a CMOS inverter is within the range [Vthn, Vdd – |Vthp|], where Vthn and Vthp are the threshold voltages of the NMOS and PMOS transistors, respectively. The peak of this current is determined by the saturation current of the devices and is hence directly proportional to the transistor sizes. It is also a strong function of the ratio between the input and output slopes, and thus of the output capacitance Cl (Rabaey et al., 2003).

This relationship is best illustrated by the following simple analysis. Consider a CMOS inverter with a 0 → 1 transition at its input. Assume first that the load capacitance is very large, so that the output fall time is significantly larger than the input rise time. Under these circumstances, the input passes through the transient region before the output starts to change; the drain-source voltage of the PMOS device stays close to zero and the short-circuit current is close to zero. Consider now the reverse case, where the output capacitance is very small and the output fall time is substantially smaller than the input rise time. The drain-source voltage of the PMOS device then equals Vdd for most of the transition period, guaranteeing the maximal short-circuit current (equal to the saturation current of the PMOS). This clearly represents the worst-case condition. The conclusion of the above analysis is shown in Figure 3, which plots the short-circuit current through the NMOS transistor during a low-to-high input transition as a function of the load capacitance. The resulting short-circuit power also depends on technological parameters (supply and threshold voltages) and on the clock frequency.
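To make the conduction band [Vthn, Vdd – |Vthp|] concrete, the sketch below computes how long both transistors conduct when the input is an ideal linear ramp from 0 to Vdd. The linear-ramp input, the function name and the example numbers are assumptions for illustration; the sketch says nothing about the magnitude of the current, which depends on the load as discussed above.

```python
def short_circuit_window(vdd, vthn, vthp_abs, t_rise):
    """Time (s) during which both devices conduct for a linear 0 -> Vdd input ramp.

    Both transistors are on while Vthn < Vin < Vdd - |Vthp|; a linear ramp of
    duration t_rise crosses that band in a fixed fraction of t_rise.
    """
    band = (vdd - vthp_abs) - vthn
    if band <= 0.0:
        return 0.0  # Vdd < Vthn + |Vthp|: no overlap, no short-circuit path
    return t_rise * band / vdd


t_sc = short_circuit_window(1.2, 0.35, 0.35, 100e-12)
print(f"{t_sc * 1e12:.1f} ps of a 100 ps ramp")          # 41.7 ps
print(short_circuit_window(0.6, 0.35, 0.35, 100e-12))    # 0.0
```

For Vdd = 1.2 V and Vthn = |Vthp| = 0.35 V, both devices conduct during 0.5/1.2 of the ramp; at Vdd = 0.6 V the band vanishes because Vdd < Vthn + |Vthp|, which is one way to see why the Vth/Vdd ratio matters.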

Figure 3 Short-circuit current during transients (see online version for colours)

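[Figure 3 panels: CMOS inverter (supply Vdd, load Cl, input Vin, output Vout) with short-circuit current isc; Vin and isc versus time t, with isc peaking at Ipeak while Vthn < Vin < Vdd-|Vthp|; simulated isc (A) versus time (sec) for Cl = 20 fF, 100 fF and 500 fF]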

Many authors, such as Veendrick (1984) and Bisdounis (2010), have modelled the energy dissipated by short-circuit currents. These models differ in accuracy depending on how many of the parameters and effects occurring in CMOS circuits they take into account. Nose and Sakurai (2000a, 2000b) estimate short-circuit power consumption (taking SCEs into account) and anticipate its trends in future generations of CMOS circuits. Given the continuous scaling of supply and threshold voltages, they warn that the ratio Vth/Vdd must be kept constant. If the ratio decreases (i.e. if threshold voltages scale faster than the supply voltage), short-circuit dissipation will increase and become an important part (up to 20%) of the total power dissipation of CMOS VLSI circuits. One of the most efficient techniques for globally optimising short-circuit power consumption is matching (equalising) the rise/fall times of the input and output signals. At the overall circuit level, this means that the rise/fall times of all signals should be kept within a constant range. In this way, most power dissipation is associated with static and dynamic power, and only a minor fraction (<10%) is due to short-circuit currents (Veendrick, 1984). Since this technique is well studied, generally accepted and widely implemented, most authors do not consider short-circuit power in their studies of power consumption.
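One classical closed-form estimate from Veendrick (1984) gives the short-circuit power of an unloaded, symmetric inverter (Vthn = |Vthp| = Vth, equal gain factors β) driven by inputs with rise/fall time τ as P_sc = (β/12)·(Vdd − 2·Vth)³·τ·f. The sketch below evaluates that expression to show how quickly the short-circuit term grows when Vth/Vdd decreases. The values of β, τ and f in the example are hypothetical, and the unloaded case is the worst case described above, not a typical one.

```python
def veendrick_short_circuit_power(beta, vdd, vth, tau, freq):
    """Veendrick's estimate for an unloaded symmetric inverter, in watts.

    beta: transistor gain factor (A/V^2), tau: input rise/fall time (s),
    freq: switching frequency (Hz). Assumes Vthn = |Vthp| = vth.
    """
    overdrive = vdd - 2.0 * vth
    if overdrive <= 0.0:
        return 0.0  # the two transistors are never on at the same time
    return beta / 12.0 * overdrive ** 3 * tau * freq


# Hypothetical values: beta = 100 uA/V^2, Vdd = 1 V, tau = 50 ps, f = 1 GHz
p_vth_03 = veendrick_short_circuit_power(100e-6, 1.0, 0.3, 50e-12, 1e9)
p_vth_02 = veendrick_short_circuit_power(100e-6, 1.0, 0.2, 50e-12, 1e9)
print(p_vth_02 / p_vth_03)  # about 3.4
```

Lowering Vth/Vdd from 0.3 to 0.2 at a fixed Vdd raises (Vdd − 2·Vth)³ from 0.064 to 0.216 (in units of Vdd³), which is consistent with the warning of Nose and Sakurai about a decreasing Vth/Vdd ratio.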

3 Sub-100 nm transistor: models and effects

Until the 1 μm barrier was broken, Shockley's analytical model for unipolar transistors (Shockley, 1952) was the basis for analysing and modelling complex integrated circuits. Its simplicity made it suitable for many CAD programs. However, Shockley's model cannot reproduce the current–voltage characteristics of recent short-channel MOSFETs, mainly because it does not include carrier velocity saturation, which becomes prominent in the deep submicrometre regime.

From an operational perspective, the main characteristics of sub-100 nm MOS transistors can be summarised as follows:

• Emergence of carrier velocity saturation. As a consequence, the drain current in the saturation region depends linearly (rather than quadratically) on the gate overdrive voltage, which reduces the current drive for a given gate voltage.

• The threshold voltage becomes a function of channel length and operating voltages, while the possibility of controlling it through body biasing is reduced.

• Leakage currents (both subthreshold and gate) play a major role and decrease the Ion/Ioff ratio.

3.1 Sub-100 nm transistor models

To address these issues, many authors have proposed transistor models of varying complexity and accuracy. Probably the most accurate model that still allows fast analysis and requires only a restricted set of parameters was introduced by Taur and Ning (1998). One important parameter in this model is the critical electric field Ec, which determines the onset of velocity saturation. The problem with this model is its highly non-linear nature, which makes it hard to use in optimisation programs and hand analysis.

The ‘unified model’ of the MOS transistor was introduced by Rabaey et al. (2003). A single non-linear equation suffices to describe the transistor in the saturation and linear regions. The main simplification in this model is the assumption that velocity saturation occurs at a fixed voltage VDSat, independent of the value of VGS. The main advantages of the model are its elegance and simplicity. A total of only five parameters (empirically derived) are needed to describe the transistor.
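The unified model can be written as I_D = k'·(W/L)·(VGT·Vmin − Vmin²/2)·(1 + λ·VDS) for VGT = VGS − VT > 0 and I_D = 0 otherwise, with Vmin = min(VGT, VDS, VDSAT) and VT = VT0 + γ·(√(2φF + VSB) − √(2φF)). The five parameters are VT0, γ, VDSAT, k' and λ; 2φF is fixed at 0.6 V here. The default values in the sketch are illustrative assumptions of the order used in textbook 0.25 µm NMOS examples, not data for any specific process, and VDS ≥ 0 is assumed.

```python
import math


def unified_model_id(vgs, vds, vsb=0.0, *, vt0=0.43, gamma=0.4, two_phi_f=0.6,
                     vdsat=0.63, k_prime=115e-6, lam=0.06, w_over_l=1.0):
    """NMOS drain current (A) in the unified model; defaults are illustrative."""
    # Threshold voltage including the body effect
    vt = vt0 + gamma * (math.sqrt(two_phi_f + vsb) - math.sqrt(two_phi_f))
    vgt = vgs - vt
    if vgt <= 0.0:
        return 0.0  # cut-off; subthreshold conduction is not modelled
    vmin = min(vgt, vds, vdsat)  # selects linear, saturation or velocity saturation
    return k_prime * w_over_l * (vgt * vmin - vmin ** 2 / 2.0) * (1.0 + lam * vds)


print(f"{unified_model_id(2.5, 2.5) * 1e6:.0f} uA")   # about 146 uA
print(f"{unified_model_id(2.5, 2.5, vsb=1.0) * 1e6:.0f} uA with 1 V body bias")
```

Because Vmin picks whichever limit is smallest, one expression covers the linear region (VDS smallest), classic saturation (VGT smallest) and velocity saturation (VDSAT smallest); with these defaults, VGS = VDS = 2.5 V falls in the velocity-saturated case.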

Even simpler and also empirical is the alpha power model, introduced by Sakurai and Newton (1990). This simple extension of Shockley’s square-law MOS model is appropriate for handling MOSFET circuits analytically and can predict the circuit behaviour in the sub-micrometer region.
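In the alpha-power model, the saturation current is I_D ∝ (VGS − Vth)^α, with α = 2 recovering Shockley's square law and α approaching 1 under full velocity saturation; the gate delay then scales roughly as Vdd/(Vdd − Vth)^α. The sketch below uses only these proportionalities (all constants lumped into k) with hypothetical voltages to compare how much a gate slows down when Vdd is scaled from 1.2 V to 0.9 V at Vth = 0.3 V.

```python
def alpha_power_current(vgs, vth, alpha, k=1.0):
    """Saturation drain current I_D = k * (V_GS - V_th)**alpha (k lumps W/L etc.)."""
    return k * max(vgs - vth, 0.0) ** alpha


def relative_gate_delay(vdd, vth, alpha):
    """Gate delay up to a constant: t_d ~ C_L * Vdd / I_D(V_GS = Vdd)."""
    i_on = alpha_power_current(vdd, vth, alpha)
    return float("inf") if i_on == 0.0 else vdd / i_on


def delay_ratio(alpha, vdd_old=1.2, vdd_new=0.9, vth=0.3):
    """Slow-down factor when the supply is lowered from vdd_old to vdd_new."""
    return relative_gate_delay(vdd_new, vth, alpha) / relative_gate_delay(vdd_old, vth, alpha)


print(delay_ratio(2.0))  # square law (Shockley): about 1.69
print(delay_ratio(1.3))  # velocity-saturated device: about 1.27
```

The smaller slow-down for α = 1.3 illustrates why velocity-saturated devices tolerate supply scaling better than the square law would predict.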

Unlike the previously mentioned empirical models, many other models use numerous parameters to describe the physical effects occurring in the transistor structure as reliably as possible. In 1996, the Compact Model Council (CMC) was established with the vision to choose, maintain and promote the use of standard compact models for all major technologies so that customer communication and efficiency can be enhanced (http://www.eigroup.org/cmc/). New models are submitted to the Council, where their technical merits are discussed, and potential standard models are then voted on. The models supported by the CMC include BSIM3, BSIM4, BSIMSOI and PSP as MOSFET standards and HICUM and MEXTRAM as bipolar transistor standards. More details about the physical processes described by these models, as well as the equations used, can be found in Gildenblat (2010).

3.2 Sub-100 nm transistor effects

The transistor threshold voltage is unfortunately not a constant parameter; it is influenced by a number of operating conditions. The foremost is the body-bias or back-bias effect, where the fourth terminal of the transistor (the bulk or well voltage) serves as an extra control knob. The beauty of the body-biasing effect is that it allows dynamic adjustment of the threshold voltage during operation, enabling compensation of process variations or a dynamic trade-off between performance and leakage.

Regrettably, device scaling is gradually eroding the body-biasing effect. As channel doping levels increase, changes in the bias voltage have little effect on the onset of strong inversion. This is clearly illustrated in Figure 4, which plots the impact of body biasing for three technology nodes (Rabaey, 2009).

Channel length also influences the threshold voltage. For very short channels, the depletion regions of the drain (and source) junctions themselves deplete a sizable fraction of the channel. Turning the transistor on becomes easier, which reduces the threshold voltage. To offset this effect, device engineers add extra 'halo implants' (Roy et al., 2003). While this is beneficial in general, it also increases the sensitivity of the threshold voltage to channel-length variations. In short-channel devices, because the drain and source are close together, the depletion regions at the drain-substrate and source-substrate junctions extend into the channel. As the channel length is reduced with the doping kept constant, the separation between the depletion region boundaries decreases. An increase in the reverse bias across the junctions (with increasing Vds) also pushes the junctions nearer to each other. When the combination of channel length and reverse bias leads to the merging of the depletion regions, punchthrough is said to have occurred (I6 in Figure 2).

Figure 4 Evolution of threshold control (see online version for colours)

Drain voltage is another variable that has a sizable impact on the threshold voltage. This phenomenon is known as DIBL. As the drain voltage increases, the depletion region of the junction between the drain and the channel grows and extends under the gate, effectively lowering the threshold voltage. The most harmful consequence of DIBL is that it turns the threshold voltage into a signal-dependent variable.
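To illustrate why DIBL makes the threshold voltage signal dependent, the sketch below uses a common first-order linear model, Vth = Vth0 − σ·VDS, together with the exponential subthreshold dependence on Vth, to estimate how much the off-current changes between two drain voltages. The DIBL coefficient σ = 0.1 V/V, the slope factor n = 1.5 and VT = 25.9 mV are assumed values, and the factor (1 − exp(−VDS/VT)) is ignored because it is close to 1 for VDS ≫ VT.

```python
import math


def vth_with_dibl(vth0, vds, dibl):
    """First-order DIBL model: V_th = V_th0 - dibl * V_DS (dibl in V/V)."""
    return vth0 - dibl * vds


def leakage_ratio_from_dibl(vds_low, vds_high, dibl, n=1.5, vt=0.0259):
    """Off-current ratio I(vds_high) / I(vds_low) caused by the DIBL shift of V_th.

    Uses I_off ~ exp(-V_th / (n * V_T)) and neglects (1 - exp(-V_DS / V_T)).
    """
    delta_vth = dibl * (vds_high - vds_low)
    return math.exp(delta_vth / (n * vt))


print(vth_with_dibl(0.35, 1.0, 0.1))           # about 0.25 V at V_DS = 1 V
print(leakage_ratio_from_dibl(0.1, 1.0, 0.1))  # roughly 10x more off-current
```

Raising VDS from 0.1 V to 1.0 V lowers Vth by 90 mV in this model and increases the off-current about tenfold, so the same off transistor leaks very differently depending on the voltage across it.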

An ideal MOS transistor (at least from a digital perspective) should not have any current flowing into the bulk (or well), should not conduct any current between drain and source when off, and should have an infinite gate resistance. Unfortunately, many effects cause contemporary devices to deviate from this ideal model. Leakage currents flowing through the reverse-biased source-bulk and drain-bulk pn junctions (I1 in Figure 2) have always been present, but their levels are so small that their effects could generally be ignored. The shrinking of transistor sizes has introduced other leakage effects that are far more influential and exceed junction leakage currents by 3–5 orders of magnitude. The dominant one, subthreshold leakage current, flows in the channel when the gate voltage is below the threshold voltage, i.e. when the transistor should be off. In addition, the off-state drain current is influenced by the gate-induced drain leakage (GIDL) effect (I5 in Figure 2). While one would expect the drain current to drop continuously as VG is reduced below Vth for a given drain voltage VD, the opposite actually happens and the drain current increases again (Chen et al., 2001).

To maintain the current drive of the transistor while scaling its horizontal dimensions, general scaling theory prescribes that the gate oxide (SiO2) thickness be scaled as well. However, once the oxide thickness is reduced to just a few molecular layers, another leakage effect gains importance: gate leakage (I3 and I4 in Figure 2). The consequence of this effect is a reduction in the gate resistance of the transistor. The growing threat of gate leakage current is illustrated in Figure 5 (Mistry et al., 2007).

Figure 5 Evolution of threshold control (see online version for colours)

[Figure 5 annotation: introduction of high-k dielectrics]

Reducing the gate oxide thickness increases the electric field across the oxide. The high electric field, coupled with the low oxide thickness, causes electrons to tunnel from the substrate to the gate and from the gate to the substrate through the gate oxide, giving rise to the gate oxide tunnelling current. A detailed explanation of two different tunnelling mechanisms can be found in Chandrakasan et al. (2001). A first approach to this challenge is to stop or slow down the scaling of the oxide thickness while continuing to scale the other critical device dimensions. This reduces the obtainable current density and the performance benefit that typically comes with technology scaling. Yet, even considering these negative implications, this is exactly what most semiconductor companies did when moving to the 65 nm node (as is apparent in Figure 5). Another solution, which allows the effective gate oxide thickness to keep shrinking, is the introduction of gate dielectrics with a higher permittivity ε, the so-called high-k dielectrics. Replacing SiO2 with a 'high-k' material yields the same gate capacitance as a thinner SiO2 layer while keeping gate leakage under control (Rabaey, 2009).
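The benefit of a high-k gate dielectric is usually expressed through the equivalent oxide thickness (EOT): a layer of physical thickness t and relative permittivity k gives the same capacitance per unit area as SiO2 (k ≈ 3.9) of thickness t·3.9/k. The sketch below ignores any interfacial oxide layer, and k = 20 is only an illustrative value in the range usually quoted for hafnium-based oxides.

```python
K_SIO2 = 3.9  # relative permittivity of SiO2


def equivalent_oxide_thickness(t_physical_nm, k_dielectric):
    """SiO2 thickness (nm) giving the same capacitance per unit area."""
    return t_physical_nm * K_SIO2 / k_dielectric


def physical_thickness_for_eot(eot_nm, k_dielectric):
    """Physical high-k thickness (nm) needed to reach a target EOT."""
    return eot_nm * k_dielectric / K_SIO2


t_hk = physical_thickness_for_eot(1.0, 20.0)   # illustrative k = 20
print(f"{t_hk:.2f} nm of high-k gives an EOT of 1 nm")  # 5.13 nm
```

Reaching an EOT of 1 nm with k = 20 requires a physical layer roughly five times thicker than the equivalent SiO2, and that extra physical thickness is what suppresses direct tunnelling.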

It is also worth mentioning that scaled device dimensions increase the impact of transistor parameter variations (geometric dimensions and threshold voltage). These variations can be due to device physics, the manufacturing process, or temporal and environmental effects. They affect design performance, power (mostly leakage) and manufacturing yield.

3.3 Device and technology innovations

Finally, to address some of the concerns raised so far, researchers all over the world are developing new device structures. These typically offer one or more of the following features: higher mobility, better threshold control or a faster subthreshold current roll-off.

The concept of strained silicon is based on the idea of creating a layer of silicon (typically in the transistor channel) in which the silicon atoms are stretched (or strained) beyond their normal inter-atomic distance. Moving the atoms further apart reduces the atomic forces that interfere with the movement of electrons through the transistor, resulting in higher mobility (Nainani et al., 2009).

Replacing the conventional MOS transistor channel with carbon nanotubes or graphene can also significantly improve carrier mobility. Carbon nanotubes (CNTs) are hollow cylinders composed of one or more concentric layers of carbon atoms in a honeycomb lattice arrangement (Avouris et al., 2003), while graphene is a one-atom-thick planar sheet of carbon atoms densely packed in a honeycomb crystal lattice (Novoselov et al., 2004).

Inserting an insulating layer (typically silicon dioxide) between the transistor substrate and the channel yields a Silicon on Insulator (SOI) MOS transistor. The presence of such a layer substantially reduces the junction capacitance (source and drain to bulk), leading to dynamic power savings or faster transistor operation. Another advantage is a reduction in leakage current. The main drawback is the loss of body biasing as a threshold control technique (Sakurai et al., 2006).

The distinguishing characteristic of the three-dimensional FinFET transistor is that the controlling gate is wrapped around a thin silicon 'fin', which forms the body of the device. The dimensions of the fin determine the effective channel length of the device. The structure has shown the potential to scale the channel length to values that are hard, if not impossible, to achieve in traditional planar devices. In addition, crucial advantages of the device are a higher on-current with reduced leakage and improved control, as the gate wraps almost completely around the channel. By removing the top part of the gate, a back-gated transistor can be obtained. In this structure, one of the gates acts as a standard control gate, whereas the other is used to control the threshold voltage (Avci and Tiwari, 2004).

The devices described here by no means represent the complete spectrum of new transistor and switching devices currently being explored. IPG (In-Plane-Gate), SET (Single Electron Transistor), MoS2, MEMS and MTJ (Magnetic Tunnel Junction) devices are also worth mentioning. However, even the most promising ones are decades away from true applicability. One forecast predicts that conventional CMOS devices will remain on the market until 2030 or even longer (Nassif, 2011).

4 Fixed topology optimisation

Optimisation techniques at this level do not alter the circuit topology, so the principal variables they affect are transistor sizes, supply voltage and threshold voltages. Some authors investigate the impact of a single variable on circuit power consumption and delay, while others perform a thorough analysis of the mutual influence of two or more variables. Commonly used power minimisation techniques include gate sizing, variable supply voltage, variable threshold voltage, multi-voltage design, power gating, clock gating, stack forcing, on-chip optical interconnect and nano-devices.

4.1 Gate sizing

Gate sizing algorithms have changed little over the past decades. Fishburn and Dunlop (1985) proposed a fast iterative method for meeting delay constraints, called TILOS. In each iteration, this method picks the transistor on the critical path whose upsizing yields the largest delay reduction per unit increase in area, and increases its width. Variants of this method have been widely used. EinsTuner (Conn et al., 1999) is another static-timing-based tool for optimally sizing the widths of CMOS transistors in digital circuits in order to minimise circuit delay or area. Its authors state that keeping the area in check generally (but not always) keeps power consumption at reasonable levels. Analyses restricted to simple logic gates and inverter chains (Rogenmoser and Kaeslin, 1997; Rabaey et al., 2003) show that the parasitic capacitances and velocity saturation of submicron technologies favour wider than minimum transistor sizes. Increasing the transistor size allows an additional supply voltage reduction, resulting in more substantial power savings. This holds only up to an optimal sizing factor: beyond it, further increases in device size only deteriorate performance and consequently require an increase in supply voltage.
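The TILOS idea can be sketched as a greedy loop. This is a toy version under stated assumptions, not the original TILOS implementation: the circuit is a single inverter chain behind a fixed unit-size driver (so the whole chain is the critical path), widths are bumped by 10% per step, area is the sum of widths, and each stage contributes a normalised delay p + C_next/w, where p is a size-independent parasitic delay and C_next the input capacitance of the next stage or of the load, all in units of a unit inverter. The function names are hypothetical.

```python
def chain_delay(widths, c_load, w_in=1.0, p=1.0):
    """Normalised delay of a unit driver followed by an inverter chain.

    Each stage contributes p + C_next / w, where C_next is the input
    capacitance (= width) of the next stage or the final load c_load.
    """
    delay = p + widths[0] / w_in            # fixed-size driver sees stage 0
    loads = list(widths[1:]) + [c_load]
    for w, c_next in zip(widths, loads):
        delay += p + c_next / w
    return delay


def tilos_like_sizing(n_stages, c_load, target, bump=1.1, max_iter=10_000):
    """Greedy upsizing until the chain delay meets `target` (toy TILOS variant)."""
    widths = [1.0] * n_stages
    for _ in range(max_iter):
        delay = chain_delay(widths, c_load)
        if delay <= target:
            break
        best_i, best_gain = None, 0.0
        for i in range(n_stages):
            trial = widths.copy()
            trial[i] *= bump
            # sensitivity: delay reduction per unit of added area (width)
            gain = (delay - chain_delay(trial, c_load)) / (trial[i] - widths[i])
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:      # no single upsizing step helps any more
            break
        widths[best_i] *= bump
    return widths, chain_delay(widths, c_load)


widths, delay = tilos_like_sizing(3, c_load=64.0, target=18.0)
print([round(w, 2) for w in widths], round(delay, 2))
```

For three sizable stages and a load of 64 unit inverters, the best achievable delay in this model is 4·(1 + 64^(1/4)) ≈ 15.3; the loop stops as soon as the 18-unit target is met, leaving the remaining slack unused, which is the intended behaviour when the goal is minimum area (and, roughly, power) at a given delay.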

In 90 nm technologies and beyond, gate oxide leakage current has become comparable to subthreshold leakage. It is therefore necessary to develop methods for reducing oxide leakage, which, unlike subthreshold leakage, occurs only in transistors that are ON, as shown in Figure 6.

Figure 6 Gate oxide leakage in CMOS inverter

Increasing the oxide thickness decreases the gate oxide leakage current, but at the cost of a substantial increase in transistor delay. Therefore, using thick-oxide transistors on non-critical paths does not slow down the circuit but still reduces static power consumption. Lee et al. (2004) propose a method that uses a dual oxide-thickness process to minimise gate oxide leakage, reducing the total leakage current by more than a factor of five with just a 5% delay penalty.
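A minimal sketch of the dual oxide-thickness idea: gates are visited in order of decreasing leakage, and each is switched to thick oxide only if every timing path through it still meets the delay bound. The 5% delay penalty, the tenfold leakage reduction for thick-oxide gates, the path-based timing check and all names are hypothetical simplifications, not the algorithm of Lee et al. (2004).

```python
def assign_dual_tox(gates, paths, t_max, delay_penalty=0.05, leak_ratio=0.1):
    """Greedy thick-oxide assignment under a path-delay bound.

    gates: {name: (delay, gate_leakage)} for thin-oxide gates (hypothetical units)
    paths: list of timing paths, each a list of gate names
    Thick oxide multiplies a gate's delay by (1 + delay_penalty) and its
    leakage by leak_ratio (both values are assumptions for illustration).
    """
    thick = set()

    def path_delay(path):
        return sum(gates[g][0] * ((1 + delay_penalty) if g in thick else 1.0)
                   for g in path)

    # Try the leakiest gates first: they offer the largest savings.
    for g in sorted(gates, key=lambda name: gates[name][1], reverse=True):
        thick.add(g)
        if any(path_delay(p) > t_max for p in paths if g in p):
            thick.remove(g)    # converting g would violate timing: keep it thin
    total_leak = sum(leak * (leak_ratio if g in thick else 1.0)
                     for g, (_, leak) in gates.items())
    return thick, total_leak


gates = {"a": (1.0, 5.0), "b": (1.0, 3.0), "c": (1.0, 1.0), "d": (1.0, 2.0)}
paths = [["a", "b", "c"], ["d"]]
thick, leak = assign_dual_tox(gates, paths, t_max=3.07)
print(sorted(thick), round(leak, 2))   # ['a', 'd'] 4.7
```

In the example, gate a (the leakiest on the critical path) gets thick oxide, b and c must stay thin because the path a-b-c has room for only one 5% penalty, and d, which lies alone on a short path, is converted freely.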

4.2 Supply and threshold voltages in power minimisation

Decreasing transistor sizes enables higher transistor densities on a chip. To keep the power of the circuit under control, the supply voltage is also reduced with each technology generation. Owing to the quadratic relationship between dynamic power consumption and supply voltage, reducing the supply voltage is the most effective way to lower dynamic power. For CMOS circuits, however, a lower supply voltage means lower performance. This problem is addressed by reducing the threshold voltage (Vth) of the transistor. Vth is defined as the gate-source voltage of a MOSFET above which the transistor turns on. Ideally, if the gate voltage is below the threshold voltage, the transistor does not conduct any current. In practice, however, some current still flows from the drain to the source: the subthreshold current. Its most important feature is that it increases exponentially as Vth decreases, as shown in equation (2). This is why this leakage current is one of the main factors limiting the scaling process. The SIA Roadmap (http://www.itrs.net/) forecasts supply voltages as low as 0.5–0.8 V in 2018, with predicted threshold voltages of up to 0.1 V.

Figure 7 depicts equi-speed and equi-power lines on the Vdd–Vth plane, calculated from the alpha-power MOSFET model (Sakurai and Newton, 1990).

If, for example, the technology imposes the constraints Vdd = 3.3 V ± 10% and Vth = 0.55 V ± 0.1 V, the bigger rectangle shown in Figure 7 defines the design window. For yield-conscious designs, all circuit specifications should be satisfied within this rectangle. In the design window, the circuit is slowest at corner A, while power dissipation is highest at corner B. Therefore, better trade-offs between speed and power can be found by reducing the fluctuations of Vdd and Vth, especially at low Vdd. The equi-speed and equi-power lines are normalised at corners A and B by the normalisation factors ks and kp, respectively. By sliding and resizing the design window on the Vdd–Vth plane, one can determine how much speed and power dissipation improve or degrade compared with the typical condition. For example, at Vdd = 2.1 V ± 5% and Vth = 0.18 V ± 0.05 V, power dissipation can be reduced to about 40% while maintaining the circuit speed.
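The corner analysis can be roughly reproduced with the alpha-power law alone. The sketch assumes α = 1.3, speed ∝ (Vdd − Vth)^α/Vdd and dynamic power ∝ Vdd² at a fixed clock frequency; leakage at corner B is ignored. It compares the worst-case speed at corner A and the highest supply at corner B for the two design windows quoted above; the function names are hypothetical.

```python
ALPHA = 1.3  # assumed alpha-power exponent


def speed(vdd, vth):
    """Relative speed (1 / gate delay) from the alpha-power law."""
    return (vdd - vth) ** ALPHA / vdd


def window_corners(vdd, vdd_tol, vth, vth_tol):
    """Slowest speed (corner A) and highest supply (corner B) of a design window."""
    slowest = speed(vdd * (1 - vdd_tol), vth + vth_tol)   # A: low Vdd, high Vth
    vdd_max = vdd * (1 + vdd_tol)                         # B: high Vdd, low Vth
    return slowest, vdd_max


s_old, v_old = window_corners(3.3, 0.10, 0.55, 0.10)
s_new, v_new = window_corners(2.1, 0.05, 0.18, 0.05)
power_ratio = (v_new / v_old) ** 2   # dynamic power at equal f; leakage ignored
print(f"speed ratio {s_new / s_old:.3f}, power ratio {power_ratio:.2f}")
```

Under these assumptions the second window is about 4% faster at its slow corner, while its worst-case dynamic power is about 37% of the first window's, in the same range as the roughly 40% read from Figure 7. Because leakage is ignored, this is only a plausibility check.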

Figure 7 Equi-power and equi-speed lines in Vdd–Vth design space (see online version for colours)

[Figure 7 axes: Vdd (V) from 0.5 to 4 versus Vth (V) from 0 to 0.8; equi-speed lines labelled ks = 0.5 to 1.2 and equi-power lines labelled kp = 0.01 to 1.2; annotations P = 120 mW, P = 300 mW and f = 40 MHz; design-window corners A and B]

Piguet et al. (2006) conclude that, among all Vdd/Vth combinations guaranteeing the desired speed, only one pair results in the lowest power consumption. The location of this optimal working point and its associated total power consumption are tightly related to architectural and technology parameters. Shuster et al. (2005) give an equation for the total power consumption of circuits working at their optimal supply and threshold voltages [equation (3)]. They use the alpha-power law and consider the total power consumption as the sum of dynamic and subthreshold leakage power.

$P_{tot}^{opt} \cong$ [closed-form expression in $N$, $a$, $C$, $f$, $I_0$, $nV_T$, $A$, $B$, $\alpha$ and $\chi$, containing a logarithmic term; see Shuster et al. (2005)]     (3)

where N is the number of cells in the circuit; a is the average cell activity (i.e. the number of switching cells in a clock cycle divided by the total number of cells); C is the equivalent cell capacitance; f is the operating frequency; I0 is the average off-current per cell at VGS = Vth; n is the weak-inversion slope factor; and VT is the thermal voltage. A and B are two fitting variables that depend on the exponent α of the alpha-power law. The variable χ is given by:

$\chi =$ [expression in $\zeta$, $LD$, $f$, $I_0$, $nV_T$ and $\alpha$; see Shuster et al. (2005)]     (4)

where ζ (in farads) is a fitting parameter that also includes the switching gate capacitance, and LD is the logical depth, i.e. the length of the critical path.

Equation (3) is very important because it allows the optimal total power to be estimated analytically, directly from architectural parameters such as activity (a), number of cells (N), frequency (f) and logical depth (LD, included in χ), and from technology parameters such as the average off-current (I0), weak-inversion slope (n), alpha-power law coefficient (α, included in A and B) and delay coefficient (ζ, included in χ). Thus, starting from equation (3), it is possible to understand the impact of common architectural transformations and to compare the performance of different technologies for a given architecture.
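Instead of evaluating the closed-form expression, the existence of a single optimal (Vdd, Vth) pair at fixed speed can be illustrated numerically. This is a minimal sketch with hypothetical parameter values, not the Shuster et al. (2005) formula: it uses the same ingredients (alpha-power delay, dynamic plus subthreshold leakage power, I0 taken at VGS = Vth so that the off-current at VGS = 0 is I0·exp(−Vth/(nVT))). For each Vth it finds the lowest Vdd that still meets a fixed delay target, then keeps the pair with the lowest total power.

```python
import math

# Hypothetical technology and architecture parameters (illustrative only).
ALPHA = 1.3        # alpha-power law exponent
N_CELLS = 1e5      # number of cells N
ACTIVITY = 0.1     # average cell activity a
C_CELL = 5e-15     # equivalent cell capacitance C [F]
FREQ = 200e6       # operating frequency f [Hz]
I0 = 1e-5          # off-current per cell at VGS = Vth [A]
N_SLOPE = 1.5      # weak-inversion slope factor n
V_T = 0.0259       # thermal voltage kT/q at room temperature [V]


def rel_delay(vdd, vth):
    """Critical-path delay up to a constant factor (alpha-power law)."""
    return vdd / (vdd - vth) ** ALPHA


TARGET = rel_delay(1.2, 0.4)  # speed of a hypothetical reference point (1.2 V, 0.4 V)


def min_vdd(vth, hi=5.0, iters=60):
    """Lowest Vdd meeting TARGET at this Vth (delay falls as Vdd rises when alpha > 1)."""
    lo = vth + 1e-3
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rel_delay(mid, vth) <= TARGET:
            hi = mid
        else:
            lo = mid
    return hi


def total_power(vdd, vth):
    """Dynamic power plus subthreshold leakage of all cells."""
    p_dyn = N_CELLS * ACTIVITY * C_CELL * FREQ * vdd ** 2
    i_off = I0 * math.exp(-vth / (N_SLOPE * V_T))  # off-current at VGS = 0
    return p_dyn + N_CELLS * i_off * vdd


candidates = []
for k in range(5, 61):  # Vth from 0.05 V to 0.60 V in 10 mV steps
    vth = k / 100
    vdd = min_vdd(vth)
    candidates.append((total_power(vdd, vth), vdd, vth))

p_opt, vdd_opt, vth_opt = min(candidates)
print(f"optimum: Vdd = {vdd_opt:.2f} V, Vth = {vth_opt:.2f} V, P = {p_opt * 1e3:.2f} mW")
```

Lowering Vth lets Vdd (and hence dynamic power) drop while keeping the same speed, but leakage grows exponentially, so total power has an interior minimum; with these hypothetical numbers it falls between Vth = 0.2 V and 0.3 V.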

Nose and Sakurai (2000a, 2000b) present closed-form formulas for the optimum supply and threshold voltages that minimise power dissipation for given technology parameters and required speed. These formulas take temperature into account.

Kuroda et al. (1998) minimise the supply voltage by applying a variable supply-voltage (VS) technique to a 32-bit RISC core processor developed in a 0.4 µm CMOS technology. From an external supply, the VS scheme automatically generates the minimum internal supply voltage through feedback control by a buck converter, a speed detector and a timing controller. The minimum internal supply voltage is determined so that the critical-path delay remains unchanged. Performance in MIPS/W is improved by more than a factor of two, while the area penalty of the VS scheme is smaller than 1%. In 1996, the same authors introduced a circuit technique for dynamically varying the threshold voltage (VT) to reduce the power dissipation of a processor for portable multimedia equipment with HDTV resolution (Kuroda et al., 1996). The VT scheme consists of Leakage Current Monitors (LCMs), Self-Substrate Bias (SSB) circuits and a Substrate Charge Injector (SCI) circuit. In active mode, the SSB controls the substrate bias VBB to compensate for Vth fluctuations. In standby mode, the SSB applies a deeper VBB to increase Vth and cut off leakage.
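The feedback idea behind the VS scheme can be illustrated with a toy bang-bang loop. This is a hypothetical sketch, not the circuit of Kuroda et al. (1998): a critical-path "replica" modelled with the alpha-power law stands in for the speed detector, and a fixed 50 mV step stands in for the buck converter. `VTH`, `K`, the step size and the 2.0 V operating point are all assumed values.

```python
ALPHA = 1.3          # hypothetical alpha-power law exponent
VTH = 0.5            # hypothetical threshold voltage [V]
CLOCK_PERIOD = 1.0   # normalised delay budget
# Scale factor chosen (hypothetically) so the replica path just meets timing at 2.0 V.
K = CLOCK_PERIOD * (2.0 - VTH) ** ALPHA / 2.0


def replica_delay(vdd_mv: int) -> float:
    """Delay of a critical-path replica at supply vdd_mv, from the alpha-power law."""
    vdd = vdd_mv / 1000.0
    return K * vdd / (vdd - VTH) ** ALPHA


def regulate(start_mv=3300, step_mv=50, vmin_mv=800, vmax_mv=3300, cycles=100):
    """Bang-bang supply control: step Vdd down while the replica meets timing, else step up."""
    vdd_mv = start_mv
    history = []
    for _ in range(cycles):
        meets_timing = replica_delay(vdd_mv) <= CLOCK_PERIOD  # role of the speed detector
        vdd_mv += -step_mv if meets_timing else step_mv       # role of the converter
        vdd_mv = max(vmin_mv, min(vmax_mv, vdd_mv))
        history.append(vdd_mv)
    return history


trace = regulate()
print(trace[:3], "...", trace[-4:])
```

Starting from the 3.3 V external level, the supply walks down and then oscillates within one step of 2.0 V, the lowest value at which the replica still meets the clock period; a practical regulator would presumably add a timing margin and filtering.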

Instead of using variable supply or threshold voltages, many authors propose power minimisation techniques based on dual or multiple Vdd and/or Vth values (Hung et al., 2004; Gao and Hayes, 2005). Gates on critical paths operate at the higher Vdd or lower Vth, while those on non-critical paths operate at the lower Vdd or higher Vth, thereby reducing overall power consumption without performance degradation. Hamada et al. (2001) derived a set of practical expressions for the optimal number and values of discrete supplies and thresholds, and concluded that no more than three discrete values are needed for each tuning variable.
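The rule "critical gates stay at the high supply, non-critical gates move to the low one" can be sketched as a greedy assignment. This is a minimal illustration on a hypothetical six-gate netlist, not the algorithm of any of the cited works; the 1.5× delay penalty at the low supply and the equal switched capacitance per gate are assumptions.

```python
# Hypothetical netlist: nominal gate delays [ns] at the high supply.
GATE_DELAY = {"g1": 1.0, "g2": 1.0, "g3": 1.0, "g4": 0.5, "g5": 0.5, "g6": 0.8}
# Hypothetical timing paths (gate sequences from inputs to outputs).
PATHS = [["g1", "g2", "g3"], ["g1", "g4", "g5"], ["g6", "g5"]]
CLOCK_PERIOD = 3.0        # ns; g1-g2-g3 is the critical path (exactly 3.0 ns)
VDD_H, VDD_L = 1.2, 0.9   # hypothetical supply levels [V]
SLOWDOWN_L = 1.5          # assumed delay penalty for a gate moved to VDD_L


def path_delay(path, low):
    return sum(GATE_DELAY[g] * (SLOWDOWN_L if g in low else 1.0) for g in path)


def meets_timing(low):
    return all(path_delay(p, low) <= CLOCK_PERIOD + 1e-9 for p in PATHS)


def assign_dual_vdd():
    """Greedy: move each gate to VDD_L if every path still meets the clock period."""
    low = set()
    for g in GATE_DELAY:  # fixed order; a real flow would rank gates by slack or power gain
        if meets_timing(low | {g}):
            low.add(g)
    return low


def relative_dynamic_power(low):
    """Dynamic power relative to an all-VDD_H design, assuming equal switched capacitance per gate."""
    scale = (VDD_L / VDD_H) ** 2
    return sum(scale if g in low else 1.0 for g in GATE_DELAY) / len(GATE_DELAY)


low = assign_dual_vdd()
print(sorted(low), f"{relative_dynamic_power(low):.3f}")
```

The three gates of the critical path g1-g2-g3 stay at VDD_H because any slowdown would violate the period, while g4, g5 and g6 move to VDD_L and the design still meets timing.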

Among the leakage power reduction techniques, power gating is commonly used to disconnect idle logic blocks from the power network and thus curtail subthreshold leakage (Chen et al., 2010). This is done with special header and footer transistor switches. Power gating can be fine-grained or coarse-grained. Its main difficulties are preserving logic levels and retaining the state of sequential circuits. The related clock gating technique prevents the clock signal from toggling inactive gates, which minimises dynamic power consumption. Clock gating typically saves about 30% of power compared with a design without clock gating (Hosny and Wu, 2008). Power and clock gating techniques are both illustrated in Figure 8.


Figure 8 Power and clock gating minimisation techniques (see online version for colours)

[Figure labels: header switch, virtual power; footer switch, virtual ground; logic block (two instances).]

Stack forcing is another technique for tackling ever-increasing leakage power. It has been shown (see Figure 9) that a stack of two (or more) off transistors leaks significantly less than a single off transistor (Naranda et al., 2001). Stacking P-channel MOS transistors is preferred over N-channel ones because PMOS leakage current is lower than NMOS leakage current, as a result of the lower mobility of holes compared with electrons.

Figure 9 Transistor stacking for leakage current minimisation
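The stack effect can be reproduced with a simple subthreshold model. This is a minimal sketch with hypothetical device parameters; it includes drain-induced barrier lowering (DIBL, coefficient `ETA`) but ignores the body effect, which would reduce the stacked leakage further. The intermediate node voltage vx is found by bisection so that the top and bottom transistors carry the same current; the top device then sees a negative VGS = −vx and a smaller VDS, which is what suppresses the leakage.

```python
import math

# Hypothetical device parameters for a simple subthreshold model.
I0 = 1e-6      # current scale [A]
VTH = 0.3      # threshold voltage [V]
N = 1.5        # subthreshold slope factor
VT = 0.0259    # thermal voltage [V]
ETA = 0.1      # DIBL coefficient: effective Vth is lowered by ETA * VDS
VDD = 1.0      # supply [V]


def i_sub(vgs, vds):
    """Subthreshold drain current with a DIBL term (body effect ignored)."""
    return I0 * math.exp((vgs - VTH + ETA * vds) / (N * VT)) * (1 - math.exp(-vds / VT))


def single_leakage():
    """One off transistor (VGS = 0) with the full supply across it."""
    return i_sub(0.0, VDD)


def stack_leakage(iters=100):
    """Two series off transistors: solve for the intermediate node voltage vx."""
    lo, hi = 0.0, VDD
    for _ in range(iters):
        vx = 0.5 * (lo + hi)
        i_top = i_sub(-vx, VDD - vx)   # gate at 0 V, source at vx
        i_bottom = i_sub(0.0, vx)      # gate and source at 0 V
        if i_top > i_bottom:
            lo = vx                    # excess current charges the node up
        else:
            hi = vx
    vx = 0.5 * (lo + hi)
    return vx, i_sub(0.0, vx)


vx, i_stack = stack_leakage()
print(f"intermediate node: {vx * 1000:.0f} mV")
print(f"leakage reduction (single / stack): {single_leakage() / i_stack:.1f}x")
```

With these numbers the intermediate node settles near 85 mV and the two-transistor stack leaks roughly an order of magnitude less than the single device.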

More recently, researchers have looked at performing multiple optimisations at once. For example, Brodersen et al. (2002) search for the tuning variable (among Vdd, Vth and gate sizing) with the largest potential for energy reduction and conclude that, to achieve the most energy-efficient design, the energy reduction potentials of all tuning variables must be balanced.

4.3 On-chip optical interconnections

Finally, some promising new techniques for extremely low-power logic are emerging. The global interconnect performance required for future generations of ICs cannot be achieved with metal wires. Neither low-resistance interconnect metals nor low-k dielectrics are sufficient to bring the interconnect time constant down to the desired level. It is therefore necessary to innovate in interconnect materials or to adopt unconventional interconnection techniques. One such technique is an on-chip optical interconnection layer. Using optical instead of electrical interconnections decreases power consumption, greatly increases bandwidth, provides immunity to electromagnetic noise and reduces sensitivity to temperature variations. However, it is difficult to obtain a sufficiently high optical-electrical conversion efficiency. Piguet et al. (2004) apply optical interconnection technology to clock distribution by replacing the electrical clock distribution tree with an optical one. The power dissipated by the electrical system depends strongly on the operating frequency, whereas in the optical system it remains almost constant. At a frequency of 5 GHz, the optical network consumes five times less power than the electrical one.

Another power minimisation technique is on-chip wavelength division multiplexing. For example, a single waveguide could be used to replace a 64-bit bus, where each individual signal makes use of a distinct wavelength.

A possible route to power reduction is to move towards nanoscale devices, in which a smaller amount of charge is needed to encode a bit. For this purpose, the single-electron transistor has been developed and used together with MOSFETs to build low-power gates (Piguet et al., 2004).

4.3.1 Technology influence on total power consumption

To demonstrate in practice the influence of technology parameters on total power consumption, a 12-bit binary divider is described in VHDL and implemented in Xilinx FPGA devices from different families (Virtex-4, Virtex-5, Virtex-6 and Virtex-6 Lower Power). The divider uses the radix-2 non-restoring algorithm with a non-fractional remainder (Jovanovic and Jevtic, 2010). The division algorithm is presented in Figure 10.

Figure 10 Radix-2 non-restoring division algorithm

D := |Y|; R_M := X;
for j := M − 1 downto 0 do
    if R_{j+1} = 0 then
        Q := [q_{M−1} q_{M−2} · · · q_{j+1} 0 · · · 0]; Rem := 0; go to label;
    endif;
    if R_{j+1} < 0 then q_j := −1 else q_j := 1 endif;
    R_j := R_{j+1} − q_j · 2^j · D;
endfor;
Q := [q_{M−1} q_{M−2} · · · q_0];
if X > 0 and R_0 < 0 then Rem := R_0 + D; Q := Q − 1;
elseif X < 0 and R_0 > 0 then Rem := R_0 − D; Q := Q + 1;
else Rem := R_0;
endif;
label: if Y < 0 then Qt := −Q else Qt := Q endif;
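The pseudocode of Figure 10 can be turned into a runnable reference model, for instance to check an HDL description against. This is a sketch in Python, not the authors' VHDL: it keeps the algorithm's variable names in comments, assumes |X| < 2^M (M = 12 here) so the partial remainders stay bounded, and accumulates the signed digits q_j directly as an integer value instead of a digit vector.

```python
def nonrestoring_divide(x: int, y: int, m: int = 12) -> tuple[int, int]:
    """Radix-2 non-restoring division following the steps of Figure 10.

    Returns (quotient, remainder) with x == quotient * y + remainder,
    |remainder| < |y|, and the remainder taking the sign of the dividend.
    Assumes |x| < 2**m so that every partial remainder satisfies |R_j| < 2**j * D.
    """
    if y == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if abs(x) >= 1 << m:
        raise ValueError("dividend magnitude must fit in m bits")

    d = abs(y)       # D := |Y|
    r = x            # R_M := X
    q = 0            # value of the signed-digit quotient, sum(q_j * 2**j)
    exact = False    # set when a partial remainder becomes exactly zero
    for j in range(m - 1, -1, -1):
        if r == 0:   # remaining digits are 0 and Rem := 0 ("go to label")
            exact = True
            break
        qj = -1 if r < 0 else 1
        r -= qj * (d << j)   # R_j := R_{j+1} - q_j * 2**j * D
        q += qj << j

    if exact:
        rem = 0
    elif x > 0 and r < 0:    # correction: the remainder must follow the dividend's sign
        rem, q = r + d, q - 1
    elif x < 0 and r > 0:
        rem, q = r - d, q + 1
    else:
        rem = r

    return (-q if y < 0 else q), rem   # label: Qt := -Q if Y < 0


print(nonrestoring_divide(100, 7))    # (14, 2)
print(nonrestoring_divide(-100, 7))   # (-14, -2)
```

The result matches truncating integer division (quotient rounded towards zero), which is a convenient property for exhaustive testing over all 12-bit dividends.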

The XPower tool (part of the Xilinx ISE 12.4 software) was used to estimate power consumption. The implementation results are presented in Figure 11.

Differences in process technology (from 90 nm to 40 nm), supply voltage (from 1.2 V to 0.9 V), threshold voltage and transistor sizes clearly influence both static and dynamic power consumption. Static power consumption increases markedly when moving towards newer FPGA families, because leakage currents grow as transistor sizes shrink. The shorter channel lengths and thinner gate oxides generally used at a new process node make it easier for current to leak, either across the channel region or through the gate oxide of the transistor. As for dynamic power, the core FPGA supply voltage and node capacitance generally decrease with each new process node, providing substantial dynamic power savings over previous FPGA generations. Xilinx claims that a silicon manufacturing process tailored for FPGAs and an innovative unified architecture give its new 28 nm 7 series FPGAs a 50% reduction in overall power compared with the previous 40 nm generation (http://www.xilinx.com/publications/archives/xcell/Xcell76.pdf).
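The quadratic dependence of dynamic power on the supply voltage can be made concrete with the first-order switching power formula P = a·C·Vdd²·f. This is a minimal sketch using the supply levels quoted above; it holds a, C and f fixed, which real node-to-node comparisons do not, so it isolates only the voltage term.

```python
def dynamic_power(activity: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = a * C * Vdd**2 * f."""
    return activity * cap_f * vdd ** 2 * freq_hz


def voltage_only_saving(v_old: float, v_new: float) -> float:
    """Fractional dynamic power saving when only Vdd changes (a, C and f held fixed)."""
    return 1.0 - dynamic_power(1.0, 1.0, v_new, 1.0) / dynamic_power(1.0, 1.0, v_old, 1.0)


print(f"1.2 V -> 0.9 V: {voltage_only_saving(1.2, 0.9):.1%} less dynamic power")
print(f"1.0 V -> 0.9 V: {voltage_only_saving(1.0, 0.9):.1%} less dynamic power")
```

Scaling from 1.0 V to 0.9 V alone removes 19% of the switching power for unchanged a, C and f; the static part, which grows with leakage, is not covered by this formula.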

Figure 11 Power consumption of binary dividers implemented in FPGAs

[Figure: dynamic power [W] and static power [W] of the divider in Virtex-4 (90 nm), Virtex-5 (65 nm), Virtex-6 (40 nm) and Virtex-6 LP (40 nm).]

In recent years, the Power-Delay Product (PDP) has become a measure of how effective or efficient a digital design technology is in terms of delay and power. A lower PDP means that power is better "translated" into speed of operation. Figure 12 shows that the Virtex-6 FPGA family (not Virtex-6 Lower Power, LP, as one might expect) has the best PDP. The explanation is that the only difference between the Virtex-6 and Virtex-6 LP families is the scaled supply voltage (from 1 V to 0.9 V). Scaling the supply voltage improves dynamic power consumption (by 12.5%, see Figure 11). However, this improvement is "overpaid" by a 26.6% increase in propagation delay, as shown in Table 1.
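Table 1 is enough to quantify this trade-off. Because total power values are only shown graphically, this minimal sketch computes the break-even power reduction (the cut Virtex-6 LP would need for its PDP to equal that of Virtex-6) rather than the PDPs themselves. The helper names are illustrative.

```python
# Propagation delays of the 12-bit divider from Table 1 [ns].
DELAY_NS = {"Virtex-4": 29.83, "Virtex-5": 23.98, "Virtex-6": 19.04, "Virtex-6 LP": 24.11}


def pdp(power_w: float, delay: float) -> float:
    """Power-delay product (energy, in W times the unit of `delay`)."""
    return power_w * delay


def break_even_power_cut(d_ref: float, d_new: float) -> float:
    """Fractional power reduction a slower design needs to match the reference PDP."""
    return 1.0 - d_ref / d_new


d6, d6lp = DELAY_NS["Virtex-6"], DELAY_NS["Virtex-6 LP"]
cut = break_even_power_cut(d6, d6lp)
print(f"delay increase Virtex-6 -> Virtex-6 LP: {d6lp / d6 - 1:.1%}")
print(f"power cut needed for equal PDP:        {cut:.1%}")

# Check with an arbitrary reference power: after that cut the two PDPs coincide.
assert abs(pdp(1.0 * (1 - cut), d6lp) - pdp(1.0, d6)) < 1e-9
```

The script prints a 26.6% delay increase and a break-even power cut of 21.0%; a 12.5% reduction in dynamic power alone falls short of this, which is consistent with Virtex-6 showing the better PDP.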

Figure 12 Power delay product and total power consumption of a suite of FPGA families

[Figure: power-delay product [J] and total power [W] for Virtex-4 (90 nm), Virtex-5 (65 nm), Virtex-6 (40 nm) and Virtex-6 LP (40 nm).]

Table 1 Propagation delay of binary dividers

FPGA family    Virtex-4 (90 nm)   Virtex-5 (65 nm)   Virtex-6 (40 nm)   Virtex-6 LP (40 nm)
Delay [ns]     29.83              23.98              19.04              24.11

5 Block level optimisation

Physical implementation is a mandatory part of every integrated circuit (ASIC, FPGA, custom IC, etc.) design flow. This process is usually iterative and divided into several stages. In each iteration, the design must first be partitioned into groups or blocks small enough to fit into a single unit (ASIC standard cells, FPGA configurable logic blocks, etc.). Second, these units are assigned specific locations on the chip; this stage is usually called placement. Finally, the placed blocks are interconnected by wires. Assigning paths, or routes, to the wires is usually done in two stages: after rough or global wiring, detailed wiring (also called exact embedding) gives each wire a unique, complete path. In ASIC or custom IC design, masks can then be generated from the detailed wiring results and chips fabricated. At each stage of the physical implementation process, the aim is to optimise the eventual performance of the system (minimising chip area, power consumption and delay) without compromising the feasibility of subsequent design stages. The major focus of Placement and Routing (PAR) is on minimising interconnection length, since this translates into signal propagation time and thus into the speed of the entire design. Shorter interconnections also have lower capacitance and consequently reduce dynamic power consumption. However, regions in which the wiring is too congested for the packaging technology should be anticipated and minimised during the PAR process. In addition, noise sources such as crosstalk between adjacent wires should be eliminated.
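The link between interconnect length and dynamic power can be sketched with the half-perimeter wirelength (HPWL), a common wirelength estimate used in placement. This is a minimal illustration, assuming wire capacitance proportional to HPWL with a hypothetical 0.2 fF/µm; the `Net` class and the two placements are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Net:
    pins: list       # (x, y) pin coordinates [um]
    activity: float  # switching activity (transitions per clock cycle)


def hpwl(net: Net) -> float:
    """Half-perimeter wirelength: width plus height of the pins' bounding box."""
    xs = [x for x, _ in net.pins]
    ys = [y for _, y in net.pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def wire_power(nets, c_per_um=0.2e-15, vdd=1.0, freq_hz=100e6):
    """Interconnect switching power: sum(a_i * c * HPWL_i) * Vdd**2 * f."""
    switched_cap = sum(n.activity * c_per_um * hpwl(n) for n in nets)
    return switched_cap * vdd ** 2 * freq_hz


# Two hypothetical placements of the same two nets.
compact = [Net([(0, 0), (10, 0)], 0.3), Net([(0, 0), (5, 5), (10, 10)], 0.1)]
spread = [Net([(0, 0), (60, 0)], 0.3), Net([(0, 0), (40, 30), (80, 60)], 0.1)]
print(f"compact: {wire_power(compact) * 1e6:.3f} uW, spread: {wire_power(spread) * 1e6:.3f} uW")
```

Weighting each net by its activity is what turns a pure wirelength objective into a first-order power objective.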

It follows that placement and routing are very important phases of system design that influence overall system performance. Furthermore, the placement and routing phases have equal influence on the system features. Consequently, for the best results both phases need to be considered together, since a high-quality placement might still lead to a low-quality routing that degrades overall system performance (Mulpuri and Hauck, 2001).

The PAR problem is NP-complete and therefore requires iterative algorithms capable of efficiently searching a large solution space for a near-optimal solution. Widely used examples are genetic algorithms, tabu search and simulated annealing (Minhas, 2001). Many authors investigate power-aware placement and routing. Vorwerk et al. (2010) introduce two techniques for minimising power during FPGA placement. The first is a power-aware objective function for placement; in particular, they describe a capacitance model for global nets that allows net power to be reduced. The second technique reduces area and power by optimising the number of combinational and sequential cells. The results are quantified across a suite of 119 industrial benchmarks targeting the Actel Igloo FPGA architecture: power is reduced by 13% on average, with a 6.7% average improvement in timing performance. Cheon et al. (2005) present a power-aware placement method that simultaneously performs activity-based register clustering for clock power reduction and activity-based net weighting for net switching power reduction. Gupta et al. (2007b), engineers at Xilinx, consider dynamic power dissipation and present CAD techniques for dynamic power reduction in Xilinx Virtex-5 FPGAs. The proposed techniques, comprising power-aware placement, power-aware routing and a novel post-routing transformation, are applied to optimise the power consumption of industrial designs; board-level measurements show that they reduce power by 10% on average. There is also research on leakage-aware PAR. Gupta et al. (2007a) introduce LEAF, a novel tool for leakage-aware PAR for SoCs, and observe up to a 190% difference in leakage power between leakage-unaware and leakage-aware PAR. A similar tool for FPGAs, called TPAP, is presented by Salami (2010). It is also known that temperature variations across a chip (thermal non-uniformities) threaten chip performance and reliability. The correlation between total power consumption and temperature variations across a chip is investigated by Haghdad et al. (2011), who propose PAR guidelines that use this correlation to optimise the chip's total power efficiently. They demonstrate that optimising a floorplan to minimise either the leakage or the peak temperature can lead to a significant increase in total power consumption.
The experimental results show that lowering the temperature variations across a chip not only addresses performance degradation and reliability concerns, but also significantly contributes to chip power reduction.
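Simulated annealing, one of the search methods listed above, can be combined with an activity-weighted wirelength objective in the spirit of the power-aware placement work cited above. This is a toy sketch (a hypothetical 16-cell netlist on a 4 × 4 grid, swap moves only, geometric cooling) and not the algorithm of Vorwerk et al. (2010) or Cheon et al. (2005); all parameter values are assumptions.

```python
import math
import random


def weighted_hpwl(nets, pos):
    """Sum of activity * half-perimeter wirelength over all nets."""
    total = 0.0
    for cells, activity in nets:
        xs = [pos[c][0] for c in cells]
        ys = [pos[c][1] for c in cells]
        total += activity * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total


def anneal(cells, nets, grid=4, t0=5.0, cooling=0.95, moves_per_t=200, t_min=0.01, seed=1):
    """Swap-based simulated annealing on a grid x grid array with one cell per slot."""
    rng = random.Random(seed)
    slots = [(x, y) for x in range(grid) for y in range(grid)]
    rng.shuffle(slots)
    pos = dict(zip(cells, slots))                 # random initial placement
    initial = cost = weighted_hpwl(nets, pos)
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]       # propose: swap two cells
            new_cost = weighted_hpwl(nets, pos)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost                   # accept downhill, or uphill with prob. e^(-delta/t)
            else:
                pos[a], pos[b] = pos[b], pos[a]   # reject: undo the swap
        t *= cooling                              # geometric cooling schedule
    return pos, initial, cost


cells = [f"c{i}" for i in range(16)]
# Hypothetical netlist: a chain of two-pin nets; even-numbered nets switch more often.
nets = [((f"c{i}", f"c{i + 1}"), 0.5 if i % 2 == 0 else 0.1) for i in range(15)]
pos, initial, final = anneal(cells, nets)
print(f"activity-weighted HPWL: {initial:.1f} -> {final:.1f}")
```

Occasionally accepting uphill moves at high temperature lets the search escape local minima; as the temperature falls the placement settles, here towards a chain in which highly active nets connect neighbouring slots.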

5.1 Demonstration of PAR influence on system performance

To demonstrate the influence of PAR on overall system performance, the 12-bit binary divider described above is implemented in a Xilinx Virtex-5 FPGA device. Across implementations of the same logic design, the only parameter changed was the PAR design goal. The three design goals were area reduction, power optimisation and timing performance. The Xilinx PAR tool runs for ten iterations at different effort levels (1–5); these effort levels indicate the amount of time the tool spends searching for a better-quality solution. In the fourth case, the Xilinx IP core divider (using the same division algorithm, see http://www.xilinx.com/) was implemented. Figure 13 shows the implementation results.

Implementing the Xilinx IP core divider gives the best design performance. This is expected, given the use of hard IP blocks (circuitry dedicated to commonly used functions), which contain only the transistors necessary to implement the required function. Furthermore, there are no programmable interconnects, so routing capacitance is as small as possible. The three remaining designs, on the other hand, are implemented in general-purpose FPGA logic. Changing the PAR objective function leads to clear variations in design performance: speed and dynamic power vary by about 55%, while design size varies by about 65% (relative to the worst case). Static power consumption changes only insignificantly.

Figure 13 Delay, power consumption and size of the same design with the different PAR goals

[Figure, three panels comparing the Area Reduction, Power Optimization, Timing Performance and IP core implementations: Power [mW] (static and dynamic), Delay [ns] and Slices [#].]

6 System architecture optimisation

The same logical function (arithmetic, filter banks, microprocessors, data paths, etc.) can be implemented using different architectures that vary in their degree of parallelisation and pipelining, critical path length, number of transistors, etc. Since different architectures do not consume equal power, there is presumably one optimal architecture with the smallest total power consumption. Given the continuous increase in static power consumption, new power reduction methodologies should not neglect it. In active mode and with highly leaky devices, gates that are largely inactive and switch very rarely should clearly be avoided. Such devices are idle most of the time and thus do not contribute actively to the logical function, but they nevertheless increase static power considerably. For a given logic function, an architecture with a small number of very active gates might be preferred to one with a large number of less active gates. Indeed, a reduced number of gates performing the same number of transitions necessarily results in increased activity (defined as the ratio of switching devices to the total number of devices). This contradicts design methodologies that aim to reduce activity in order to reduce the dynamic part of power. Shuster et al. (2005) introduce a methodology for comparing several architectures that perform the same function and selecting the one with the smallest total power consumption under fixed supply voltage (Vdd), threshold voltage (Vth) and frequency (f) constraints. They apply the methodology to a set of 11 different 16-bit multiplier architectures with different levels of pipelining and parallelisation, and conclude the following. When performing parallelisation, the number of cells more than doubles (due to multiplexer overhead) and the activity is reduced by a factor of slightly less than two. For this reason, parallelised versions always consume more or approximately the same power as the original design under the same working conditions. However, when the original architecture does not meet the speed requirements, parallelisation can relax the timing constraints so that the required performance is achieved. Pipelining leads to a slight increase in the total number of cells, while activity is largely reduced, mainly by glitch suppression. Depending on working conditions and technology parameters, a pipelined architecture may consume less power than the reference one. Pipelining also reduces the logical depth of the design and hence improves its speed.
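The argument about parallelisation and pipelining can be checked with a first-order model, P = N·(a·C·Vdd²·f + I_leak·Vdd), at fixed Vdd, Vth and f. This is a minimal sketch in normalised units: the factors 2.1× cells and activity/1.9 for the parallel version follow the description above, while the pipelined factors (+10% cells, 40% less activity) and the leakage sweep are hypothetical.

```python
def total_power(n_cells, activity, c_cell=1.0, vdd=1.0, freq=1.0, i_leak=0.0):
    """First-order model in normalised units: N * (a*C*Vdd^2*f + I_leak*Vdd)."""
    return n_cells * (activity * c_cell * vdd ** 2 * freq + i_leak * vdd)


REF = dict(n_cells=1000, activity=0.2)                    # hypothetical reference design
PARALLEL = dict(n_cells=2.1 * 1000, activity=0.2 / 1.9)   # cells x2.1, activity /1.9
PIPELINED = dict(n_cells=1.1 * 1000, activity=0.6 * 0.2)  # +10 % cells, 40 % less activity

results = {}
for i_leak in (0.0, 0.02, 0.2, 1.0):  # leakage per cell, same normalised units
    p_ref = total_power(**REF, i_leak=i_leak)
    p_par = total_power(**PARALLEL, i_leak=i_leak)
    p_pipe = total_power(**PIPELINED, i_leak=i_leak)
    results[i_leak] = (p_ref, p_par, p_pipe)
    print(f"I_leak={i_leak:<4}  parallel/ref={p_par / p_ref:.2f}  pipelined/ref={p_pipe / p_ref:.2f}")
```

The parallel version stays above the reference at every leakage level, since both N·a and N grow, whereas the pipelined version wins at low leakage and loses only when per-cell leakage dominates, mirroring the dependence on technology parameters described above. The model does not capture the benefit of parallelisation when the reference cannot meet timing.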

Apart from pipelining and parallelisation, other architectural variations such as the Residue Number System (RNS) and the Logarithmic Number System (LNS) might minimise total power consumption (Kouretes and Paliouras, 2011).
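The key property of RNS, that arithmetic splits into short, independent, carry-free channels, can be shown with a multiply-add example. This is a functional sketch only, with hypothetical moduli (7, 8, 9); it does not model power and is not the multi-voltage unit of Kouretes and Paliouras (2011). Conversion back to binary uses the Chinese remainder theorem.

```python
from math import prod

MODULI = (7, 8, 9)  # hypothetical pairwise-coprime moduli; dynamic range 7 * 8 * 9 = 504


def to_rns(x: int):
    """Residue representation of x."""
    return tuple(x % m for m in MODULI)


def rns_mac(acc, a, b):
    """acc + a*b computed independently in each residue channel (no carries between channels)."""
    return tuple((r + x * y) % m for r, x, y, m in zip(acc, a, b, MODULI))


def from_rns(res) -> int:
    """Chinese remainder theorem reconstruction."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(res, MODULI):
        mi = big_m // m
        total += r * mi * pow(mi, -1, m)  # pow(mi, -1, m): modular inverse (Python 3.8+)
    return total % big_m


acc = to_rns(0)
for a, b in [(3, 5), (10, 12), (7, 7)]:
    acc = rns_mac(acc, to_rns(a), to_rns(b))
print(acc, "->", from_rns(acc))   # residues of 3*5 + 10*12 + 7*7 = 184
```

Each channel works on 3- or 4-bit words, which is where the potential for lower-power (and, in multi-voltage designs, separately supplied) datapaths comes from; the conversions into and out of RNS are an overhead that this sketch does not weigh.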

7 Conclusion and future work

This paper describes the sub-100 nm transistor effects that are beginning to emerge as transistor sizes shrink, and presents several power-aware system design methods. These methods take into account static power consumption, an increasingly critical issue in very deep submicron technologies. The influence of technology parameters on design power consumption is demonstrated by implementing a radix-2 non-restoring binary divider in FPGAs with different technological properties. As technology parameters improve, there is a clear trend towards decreasing dynamic and increasing static power consumption. The physical implementation phase influences overall design performance: with different placement and routing design goals, the design's speed, size and power consumption fluctuate. Architectural variations such as parallelism, pipelining, RNS and LNS could also minimise power consumption. Parallelism has been shown to relax the design's timing constraints but not to decrease its power consumption. Pipelining reduces the length of the design's critical path and, depending on the technology parameters, can also reduce its power consumption.

Our future work may explore techniques for static and dynamic power estimation. Combining power minimisation and estimation techniques, we could find the optimal design architecture, i.e. the one with minimal power consumption and maximal performance, quickly and easily in a few iterations.

Acknowledgements

This paper is supported by Project Grant III44004 (2011–2014) financed by the Ministry of Education and Science, Republic of Serbia.

References

Avci, U. and Tiwari, S. (2004) ‘Back-gated MOSFETs with controlled silicon thickness for adaptive threshold-voltage control’, Electronics Letters, Vol. 40, No. 1, pp.74–75.

Avouris, P., Appenzeller, J., Martel, R. and Wind, S. (2003) ‘Carbon nanotube electronics’, Proceedings of IEEE, Vol. 91, No. 1, pp.1772–1784.

Bisdounis, L. (2010, December) ‘Short-circuit energy dissipation model for sub-100 nm CMOS buffers’, IEEE International Conference on Electronics Circuits and Systems, pp.615–618.

Brodersen, R., Horowitz, M., Markovic, D., Nikolic, B. and Stojanovic, V. (2002) ‘Methods for true power minimization’, ICCAD, pp.35–42.

Chandrakasan, A., Bowhill, W. and Fox, F. (2001) Design of High Performance Microprocessor Circuits, IEEE Press, Piscataway, NJ.

Chen, J., Wong, S. and Wang, Y. (2001) ‘An analytical three-terminal band-to-band tunneling model on GIDL in MOSFET’, IEEE Transaction on Electron Devices, Vol. 48, No. 7, pp.1400–1405.

Chen, S., Lin, R., Tung, H. and Lin, K. (2010) ‘Power gating design for standard-cell-like structured ASICs’, DATE, pp.514–519.

Cheon, Y., Ho, P., Kahng, A., Reda, S. and Wang, Q. (2005) ‘Power aware placement’, ASP-DAC, pp.795–800.

Conn, A., Elfadel, M. and Molzen, W. (1999) ‘Gradient based optimization of custom circuits using a static-timing formulation’, Proceedings of DAC, pp.452–459.

Fishburn, J. and Dunlop, A. (1985) ‘TILOS: a posynomial programming approach to transistor sizing’, ICCAD, pp.326–328.

Gao, F. and Hayes, J. (2005) ‘Total power reduction in CMOS circuits via gate sizing and multiple threshold voltages’, ASP-DAC, pp.31–36.

Gildenblat, G. (2010) Compact Modeling – Principles, Techniques, Applications, Springer, Berlin.

Gupta, A., Dutt, D., Kurdahi, F., Khouri, K. and Abadir, M. (2007a) ‘LEAF: a system level leakage-aware floorplanner for SoCs’, ASP-DAC, pp.274–279.

Gupta, S., Anderson, J., Farragher, L. and Wang, Q. (2007b) ‘CAD techniques for power optimization in Virtex-5 FPGAs’, CICC, pp.85–88.

Haghdad, K., Anis, M. and Ismail, Y. (2011) ‘Floorplanning for low power IC design considering temperature variations’, Microelectronics Journal, Vol. 42, No. 1, pp.85–89.

Hamada, M., Otaguro, Y. and Kuroda, T. (2001) ‘Utilizing surplus timing for power reduction’, CICC, pp.89–92.

Hosny, M. and Wu, Y. (2008) ‘Low power clocking strategies in deep submicron technologies’, ICICDT, pp.143–146.

Hung, W., Xie, Y., Vijaykrishnan, N., Kandemir, M., Irwin, M.J. and Tsai, Y. (2004) ‘Total power optimization through simultaneously multiple-Vdd multiple-Vth assignment and device sizing with stack forcing’, ISLPED, pp.144–149.

Inagaki, R., Sadachika, N., Navarro, D., Mattasch, M. and Inoue, Y. (2007) ‘A gate-current model for advanced MOSFET technologies implemented in HiSIM2’, ICCCAS, Kokura, Japan, pp.157–160.

Jovanovic, B. and Jevtic, M. (2010) ‘FPGA implementation of throughput increasing techniques of the binary dividers’, UNITECH, pp.397–401.

Kouretes, I. and Paliouras, V. (2011) ‘RNS multi-voltage low-power multiply-add unit’, ICECS, pp.9–12.


Kuroda, T., Fujita, T., Mita, S., Nagamatu, T., Yoshioka, S., Sano, F., Norishima, M., Murota, M., Kako, M., Kinugawa, M., Kakumu, M. and Sakurai, T. (1996) ‘A 0.9V 150MHz 10mW 4 mm2 2-D discrete cosine transform core processor with variable threshold voltage scheme’, IEEE Journal of Solid-State Circuits, Vol. 31, No. 3, pp.1770–1779.

Kuroda, T., Suzuki, K., Mita, S., Fujita, T., Yamane, F., Sano, F., Chiba, A., Watanabe, Y., Matsuda, K., Maeda, T., Sakurai, T. and Furuyama, T. (1998) ‘Variable supply-voltage scheme for low power high-speed CMOS digital design’, IEEE Journal of Solid-State Circuits, Vol. 33, No. 3, pp.454–462.

Lee, D., Deogun, H., Blaauw, D. and Sylvester, D. (2004) ‘Simultaneous state Vt and Tox assignments for total standby power minimization’, DATE, 16–20 February, pp.494–499.

Minhas, M. (2001) Iterative Algorithms for Timing and Low-Power Driven VLSI Standard Cell Placement, Master’s Thesis, King Fahd University of Petroleum and Minerals, Saudi Arabia.

Mistry, K., Allen, C. and Auth, C. (2007) ‘A 45 nm logic technology with high-k+ metal gate transistors, strained silicon, 9 Cu interconnect layers, 193 nm dry patterning, and 100% Pb-free packaging’, Proceedings of IEDM, 10–12 December, pp.247–250.

Mulpuri, C. and Hauck, S. (2001) ‘Runtime and quality tradeoffs in FPGA placement and routing’, ACM/SIGDA, pp.29–36.

Nainani, N., Raghunathan, S. and Witte, S. (2009) ‘Engineering of strained III-V heterostructures for high hole mobility’, IEEE Conference on IEDM, pp.1–4.

Naranda, S., Borkar, S., De, V., Antoniadis, D. and Chandrakasan, A. (2001) ‘Scaling of stack effect and its application for leakage reduction’, ISLPED, pp.195–200.

Nassif, S. (2011) ‘Waiting for the post-CMOS Godot’, Keynote on International Great Lakes Symposium on VLSI, 3 May.

Nose, K. and Sakurai, T. (2000a) ‘Analysis and future trends on short-circuit power’, IEEE Transaction on CAD of ICs and Systems, Vol. 19, No. 9, pp.1023–1030.

Nose, K. and Sakurai, T. (2000b) ‘Optimization of Vdd and Vth for low-power and high-speed applications’, ASP-DAC, pp.469–474.

Novoselov, K., Geim, A. and Morozov, S. (2004) ‘Electric field effect in atomically thin carbon films’, Science, Vol. 306, No. 5696, pp.666–669.

Piguet, C., O’Connor, I., Gautier, J., Heer, C. and Schlichtmann, U. (2004) ‘Extremely low-power logic’, DATE, pp.656–661.

Piguet, C., Schuster, C. and Nagel, J. (2006) ‘Static and dynamic power reduction by architecture selection’, PATMOS, pp.659–668.

Rabaey, J. (2009) Low Power Design Essentials, Springer, Heidelberg.

Rabaey, J., Chandrakasan, A. and Nikolic, B. (2003) Digital Integrated Circuits – A Design Perspective, Prentice Hall, New Jersey.

Rogenmoser, R. and Kaeslin, H. (1997) ‘The impact of transistor sizing on power efficiency in submicron CMOS circuits’, IEEE Journal of Solid-State Circuits, Vol. 32, No. 7, pp.1142–1145.

Roy, K., Mukhopadhyay, S. and Mahmoodi, H. (2003) ‘Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits’, Proceedings of the IEEE, Vol. 91, No. 4, pp.305–327.

Sakurai, T. and Newton, R. (1990) ‘Alpha-power law MOSFET model and applications to CMOS inverter delay and other formulas’, IEEE Journal of Solid-State Circuits, Vol. 25, No. 2, pp.584–594.

Sakurai, T., Matsuzawa, A. and Douseki, T. (2006) Fully-Depleted SOI CMOS Circuits and Technology for Ultralow-Power Applications, Springer, New York.

Salami, B. (2010) ‘Chip-dependent leakage power-aware placement algorithm for FPGAs’, ELECO, pp.350–360.

Shockley, W. (1952) ‘A unipolar field effect transistor’, Proceedings of IRE, Vol. 40, pp.1365–1376.

Shuster, C., Nagel, J., Piguet, C. and Farine, P. (2005) ‘An architecture design methodology for minimal total power consumption at fixed Vdd and Vth’, Journal of Low Power Electronics, Vol. 1, No. 1, pp.3–10.

Taur, Y. and Ning, T. (1998) Fundamentals of Modern VLSI Devices, Cambridge University Press, London.

Veendrick, H. (1984) ‘Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits’, IEEE Journal of Solid-State Circuits, Vol. 19, No. 4, pp.468–473.

Vorwerk, K., Kennings, A., Pevzner, V., Kundu, A., Raman, M., Dunoyer, J. and Hsu, Y.-C. (2010) ‘Power minimization during field programmable gate array placement’, IET Journal on Computers & Digital Techniques, Vol. 4, No. 3, pp.170–183.

Websites

Compact Model Council, http://www.eigroup.org/cmc/
International Technology Roadmap for Semiconductors, http://www.itrs.net/
The Open64 compiler, http://open64.sourceforge.net/
Xcell Journal Third Quarter 2011, Issue 76, pp.8–15, http://www.xilinx.com/publications/archives/xcell/Xcell76.pdf
Xilinx IP Core Divider Datasheet, http://www.xilinx.com/
