A Survey of Clock Distribution Techniques Including Optical and RF Networks by Sachin Chandran A report submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Electrical Engineering Auburn, Alabama December 14, 2013 Keywords: Clock distribution networks, clock trees, clock skew, clock jitter, H-trees, optical clock, RF clock, wireless clock Copyright 2013 by Sachin Chandran Approved by Vishwani Agrawal, Chair, James J. Danaher Professor of Electrical and Computer Engineering Victor Nelson, Professor of Electrical and Computer Engineering Adit Singh, James B. Davis Professor of Electrical and Computer Engineering
obviates the need for a balanced-to-unbalanced conversion between the antenna and the LNA,
and provides dual-phase clock signals to the frequency divider.
3.3 Comparison of Power Consumption with Conventional Methods
Kenneth et al. [17] discuss the power consumption of a wireless clock distribution
network and compare it with conventional clock distribution networks. To compare
the power requirements between different types of global clock distribution systems, the
system voltage and frequencies are assumed to be equal. Also, an equal capacitive load
representing the local clock generators or distribution system is assumed for each type of
global distribution system. Under these assumptions, the power dissipation can be converted
to capacitances and these can be used to compare the power dissipation of different global
distribution systems, similar to an approach taken in [42].
The total global capacitance can be allocated among three components: CG, CW, and
CL. CG is the equivalent capacitance of the highest level network which delivers the clock
from its source to spine locations distributed throughout the chip. CG includes the total
capacitance of the final driver stage, herein termed the sector buffer, plus any buffers leading
up to the sector buffer. The sector buffers are assumed to be exponentially tapered for
minimum delay [29]. CW is the capacitance of the interconnecting wires for delivering the
clock from the spine locations to the local distribution system. CL is the load capacitance
representing the input capacitance of the local clock generators.
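As an illustration of how the buffer portion of CG grows with the load, the following minimal sketch sizes an exponentially tapered chain. It assumes the textbook model in which each stage is larger than the previous one by a roughly constant factor; the function name, the default taper of e, and all numerical values are hypothetical and are not taken from [17] or [29].

```python
import math

def tapered_chain(c_in_pf, c_load_pf, taper=math.e):
    """Size a buffer chain whose stages grow by a roughly constant factor.

    Returns (stages, caps), where caps lists the input capacitance (pF)
    of each stage, starting from the smallest buffer.
    """
    if c_load_pf <= c_in_pf:
        return 1, [c_in_pf]
    # Ideal stage count for the requested taper, rounded to a whole number.
    stages = max(1, round(math.log(c_load_pf / c_in_pf) / math.log(taper)))
    # Recompute the per-stage ratio so the last stage drives exactly c_load_pf.
    ratio = (c_load_pf / c_in_pf) ** (1.0 / stages)
    caps = [c_in_pf * ratio ** i for i in range(stages)]
    return stages, caps

# Hypothetical numbers: a 0.01 pF first stage driving CW + CL = 100 pF.
stages, caps = tapered_chain(0.01, 100.0)
c_g_buffers = sum(caps)  # buffer input capacitance counted toward CG
print(stages, round(c_g_buffers, 2))
```

Under this model a larger CW + CL increases both the number of stages and the summed buffer capacitance, which is the trend the comparison relies on.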
Two cases are used in comparing these clock distribution systems for 0.1-um generation
microprocessors. Case 1 is for aluminum interconnects and conventional dielectrics; case 2
is for copper interconnects and low-k dielectrics.
3.3.1 Grid Based System
The grid-based system, based on DEC 21264 [19], consists of a global tree supplying the
clock to different spine/buffer locations, and the capacitance of this network is CG. These
buffers drive the clock grid, which has capacitance CW. The local clock generators tap off
of the grid. Due to the amount of wiring used to form the grid, CW is large. Consequently,
large sector buffers are needed, increasing CG.
3.3.2 H-Tree Based Systems
The H-tree system, based on IBM S/390 [52], consists of a global tree supplying the
clock to different spine/buffer locations. Each buffer drives a balanced H-tree, which drives
the local clock generators. CW is smaller for the H-tree, so a smaller sector buffer is
required.
Table 3.1: Global capacitive loading.
3.3.3 Wireless System
The wireless system consists of a clock transmitter broadcasting a microwave signal to
a grid of distributed receivers. A receiver corresponds to a spine location. Due to their low
capacitance, balanced H-trees are used to distribute the signal from the receivers to the local
clock generators. Thus, CW is the same for the H-tree and wireless schemes. In wireless
clock distribution systems, long interconnects for delivering the clock from its source to
spine locations are not present and the associated component of CG is zero. However, since
the wireless system contains components with static power dissipation, an equivalent global
capacitance, representing this power dissipation, is needed. This capacitance is then used to
make a comparison to the grid and H-tree systems. To obtain this equivalent capacitance, the
total power dissipation of the clock transmitter and receivers is divided by the factor (V^2 f),
where V is the system voltage and f is the clock frequency.
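The conversion between power and equivalent capacitance follows from the dynamic power relation P = C V^2 f. A minimal sketch of the conversion is given below; the function names and the example values (0.1 W, 1.0 V, 2 GHz) are hypothetical and are chosen only to show the arithmetic.

```python
def equivalent_capacitance_pf(power_w, vdd_v, freq_hz):
    """Equivalent capacitance C = P / (V^2 f), returned in pF."""
    return power_w / (vdd_v ** 2 * freq_hz) * 1e12

def dynamic_power_w(cap_pf, vdd_v, freq_hz):
    """Dynamic power P = C V^2 f for a capacitance given in pF."""
    return cap_pf * 1e-12 * vdd_v ** 2 * freq_hz

# Hypothetical example: 0.1 W of static transmitter and receiver power
# at a 1.0 V supply and a 2 GHz clock.
c_eq = equivalent_capacitance_pf(0.1, 1.0, 2e9)
print(c_eq, dynamic_power_w(c_eq, 1.0, 2e9))
```

Expressing the static power this way lets it be added directly to the switched capacitances of the grid and H-tree systems.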
Table 3.1 shows a breakdown of the global capacitive loading for the three distribution
systems for cases 1 and 2, and includes the initial 0.25-um data. All capacitance units are
in pF. The final row gives the power consumed by the global clock distribution systems as a
percentage of total microprocessor power. This percentage represents the relative amount of
power dissipated in the global system to drive a load of CL. The results show that for both
cases, the wireless system is comparable in performance to the H-tree system and better in
performance than the grid-based system in terms of power dissipation. The results also show
that technology developments such as Cu or low-k will have the greatest positive impact on
systems whose total equivalent capacitance is dominated by CW, such as the grid. Finally,
the results show that the power dissipated in the clock receivers, given by CG, should be a
small fraction (2.3%) of the total power dissipated in the microprocessor. These results show
that power dissipation does not impose limitations for wireless clock distribution systems.
However, additional work focused on the overall feasibility and implementation of this system
is ongoing.
3.4 Potential Benefits
The wireless clock distribution system would address the interconnect needs of the
semiconductor industry in providing high-frequency clock signals with short propagation
delays. These needs would be met while providing multiple benefits. First, signal propagation
occurs at the speed of light, shortening the global interconnect delay without requiring
integrated optical components. Second, the global interconnect wires used in conventional
clock distribution systems are eliminated, freeing up these metal layers for other applications.
Third, referring to Figure 3.2, the inter-chip clock distribution system can provide global
clock signals with a small skew to an area much greater than the projected IC size. This is
an additional benefit, possibly allowing synchronization of an entire PC board or a multi-
chip module (MCM). Fourth, in the wireless system, dispersive effects are minimized since
a monotone global clock signal is transmitted. Fifth, the power load is more uniformly
distributed, which evens out temperature gradients across the chip. Sixth, by adjusting
the division ratio in the receiver, higher frequency local clock signals [1] can be obtained, while
maintaining synchronization with a lower frequency system clock. Seventh, an intangible
benefit of wireless interconnect systems is the effect they could have on microprocessor or
system implementations, potentially allowing paradigm shifts such as drastically increased
chip size. Finally, compared to other potential breakthrough interconnect techniques, such
as optical, superconducting, or organic, a wireless approach based on silicon seems to be
a potential solution which is compatible with the technology trends of the semiconductor
industry.
3.5 Areas of Research
The main areas of research for wireless clock distribution are as follows: integrating com-
pact power-efficient antenna structures, identifying noise-coupling mechanisms for the wire-
less clock distribution system and estimating the signal-to-noise ratio that can be achieved
on a working microprocessor, implementing the required 20-GHz circuits in a CMOS process
consistent with the ITRS [1], and characterizing a wireless clock distribution system in terms
of skew and power consumption and estimating the overall feasibility of the system.
Chapter 4
Optical Clock Distribution
Designing clock distribution networks is a big challenge for future microprocessors due
to increasing frequency, power, transistor counts and process variations. As technology
scales, implementing conventional clock distribution networks that meet low power and skew
requirements is becoming more difficult. On the other hand, optical interconnects are being
proposed as an alternative to electrical interconnects due to their speed-of-light transmission,
high bandwidth and low power dissipation. In the future, interconnects between chips,
between cores on a chip and within components on a processor core could be made optical
to achieve lower power and higher performance.
Optical interconnects for clock distribution were first studied by Goodman et al. [20].
They postulate that interconnect delays will be the limiting factor for performance in future
MOS circuits and suggest moving to optical and electro-optic technologies. With near-
infrared optical sources, modulators and detectors with media such as free space, optical
fibers and integrated optical waveguides, they state the interconnect scaling problem could
be solved.
They list five advantages in moving to photonics: freedom from capacitive loading
effects which allows greater fan-in and fan-out, immunity to mutual interference effects,
lack of planar constraints resulting in reduced cross-coupling for criss-crossing waveguides,
reconfigurability of free space focused interconnects and possibility of direct injection of
optical signals into electronic devices without the need for optical to electrical conversion.
Four ways of optical clocking are described next:
• Index-based with waveguides - light is carried from a single source generating the optical
clock signal to the other parts of the chip using waveguides which are integrated on a
suitable substrate.
• Index-based with fiber optics - light is transmitted similar to the above case except
that fibers are used instead of waveguides in a separate core.
• Unfocused free space interconnect - the optical clock signal is broadcast to the entire chip by
directing light (through a lens or diffusers) perpendicular to the chip from above.
• Focused free space interconnect - an optical element like a hologram (which acts as a
grating) sends the optical signals onto a multitude of detection sites simultaneously.
Miller et al. [31, 32] discuss various opportunities for optics in on-chip interconnects
and note that one of the main reasons to move away from long electrical on-chip wires
is to avoid growing transmission line and inductance effects. Power requirements of long
point-to-point optical interconnects are much lower (with no need for repeaters for optical
clock transmission), and optical signals, whether long or short, perform equally well. Other
benefits include the ability to send signals in the third dimension and the possibility
of integration with electronic devices. Debaes et al. [13] and Bhatnagar [7] discuss the
advantages of optical clocking and propose a receiver-less optical clocking scheme which
reduces the latency of transmitting the clock signal to the local network. They conclude
that such a latency-reducing scheme reduces skew and jitter but does not reduce the power
consumed compared to an electrical clocking scheme.
Recent studies comparing electrical and optical clocking schemes have analysed the
power, skew and area usage for different technologies and estimated the potential of photonics
in clock distribution [11, 25]. Based on the 2001 ITRS roadmap [2], using analytical models,
they estimate that most of the power dissipation is associated with local clock distribution.
Since alternatives to electrical clock distribution have been proposed for replacing the global
clock distribution only, they conclude that there are no significant power benefits to replacing
electrical distribution with optical distribution. The paper also shows that low skew can be
obtained with optical as well as non-scaled electrical interconnects (130nm) and concludes
that skew benefits of optical transmission are also not significant if non-scaled electrical
interconnects are used. They consider a global balanced and buffered H-Tree driving local
grids as representative of current clock distribution techniques. For optical distribution, they
replace the electrical H-tree by an H-tree structure built using waveguides with detectors at
the end points to convert the light pulses to a clock signal. The local grids and buffers
driving the local grids remain the same in both implementations.
Mule et al. [34] discuss the pros and cons of electrical and optical clock distribution
systems. Among the different schemes used to transmit the optical clock onto different
parts of the chip, they find the waveguide based approach most feasible since free-space
approaches work in 3 dimensions (and hence are not compact) and this would complicate
power distribution and cooling. To accomplish local clock routing with optical technology,
the fanout should correspond to the total number of latches on the chip. Since optical
signals do not use repeaters and rely on the original source strength to drive all the loads,
it is extremely difficult to make the fanout more than a few tens or hundreds of loads at
the most. Hence for the regions which require short wires like the local distribution regions,
current systems can only use electrical routing.
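The fanout limit can be illustrated with a simple power budget: an ideal 1-to-N split delivers 1/N of the source power to each load. The sketch below assumes an ideal equal split plus a lumped excess loss; the function names and the example powers are hypothetical and do not come from [34].

```python
import math

def splitting_loss_db(n_loads):
    """Loss seen by each load of an ideal 1-to-N equal power split, in dB."""
    return 10.0 * math.log10(n_loads)

def max_optical_fanout(source_mw, detector_min_uw, excess_loss_db=0.0):
    """Largest N for which every detector still receives its minimum power."""
    # Source power in microwatts after coupling and propagation losses.
    usable_uw = source_mw * 1000.0 * 10 ** (-excess_loss_db / 10.0)
    return int(usable_uw // detector_min_uw)

# Hypothetical example: a 1 mW source and detectors needing 10 uW each.
print(max_optical_fanout(1.0, 10.0), splitting_loss_db(100))
print(max_optical_fanout(1.0, 10.0, excess_loss_db=3.0))
```

Because there are no repeaters to restore the signal, the fanout can only be raised by increasing source power or detector sensitivity.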
A possible implementation of optical clocking is to replace the global transmission net-
work with optical waveguides and use electrical local distribution. At the end of the global
transmission network, when the global clock signal is converted to electrical signals using a
detector, it has to be buffered and amplified and sent to the local network. However, given
expected optical technologies, the number of fanouts on the global optical transmission must
be relatively small (less than 100). This means that the electrical fanout driven by each op-
tical receiver will be quite large (thousands of latches). In order to make the optical receiver
fast and to support large optical fanouts, each optical receiver must be physically small and
have a very small capacitance. However, taking a signal on a very small capacitance and
buffering it to drive a very large capacitance is a classical logical effort problem [45]. This
will require many stages of buffering, which introduces additional skew due to different pro-
cess, thermal, and voltage conditions between different local network buffers. In contrast,
a similar sized local distribution network driven from an electrical input will be connected
to a global electrical transmission spine with very large capacitance due to wire parasitics.
Thus the additional delay caused by connecting large electrical buffers to the global spine
is modest. Hence a smaller number of buffer stages are needed to drive a local clock dis-
tribution network given an electrical input in comparison to an optical input. This could
give local distribution buffers with electrical inputs lower skew than local distribution buffers
with optical inputs.
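The buffering argument can be made concrete with the usual logical effort estimate of stage count, N = log(C_load / C_in) / log(stage effort). The following sketch assumes a per-stage effort of 4 and uses hypothetical capacitance values; neither the function name nor the numbers come from [45].

```python
import math

def buffer_stages(c_in_ff, c_load_ff, stage_effort=4.0):
    """Estimated number of buffer stages needed to drive c_load_ff from c_in_ff."""
    if c_load_ff <= c_in_ff:
        return 1
    return max(1, math.ceil(math.log(c_load_ff / c_in_ff) / math.log(stage_effort)))

local_load_ff = 100_000.0  # hypothetical 100 pF local clock network

# Hypothetical optical receiver node with 10 fF of capacitance.
optical_stages = buffer_stages(10.0, local_load_ff)
# Hypothetical tap on an electrical spine that tolerates a 1000 fF buffer input.
electrical_stages = buffer_stages(1000.0, local_load_ff)
print(optical_stages, electrical_stages)
```

With these assumed values the optical input needs several more stages than the electrical one, and each extra stage is an additional source of process, voltage and temperature induced skew.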
Because of the extremely large fanout of the clock, high capacitive loads, and extremely
low effective load impedance, the clock distribution problem is primarily a problem in power
amplification. Electrical power amplification technologies are more power and skew efficient
than expected optical technologies. For example, since the CMOS buffers act as non-linear
amplifiers, their power added efficiency can easily be over 60%.
In this section we consider several possible approaches to using optics for distribution
of the clock, with the aim of minimizing or eliminating clock skew. Attention is focused on
the problem of distributing the clock within a single chip.
4.1 Intra-Chip Clock Distribution
The interconnections responsible for clock distribution are characterized by the facts
that they must convey signals to all parts of the chip and to many different devices. These
requirements imply long interconnect paths and high capacitive loading. Hence the propa-
gation delays are large and depend on the particular configuration of devices on the chip.
Here we consider methods for using optics to send the clock to various parts of the chip.
It is assumed that optics is used in conjunction with electronic interconnects, in the sense
that optical signals might be used to carry the clock to various major sites on the chip, from
which the signals would be further distributed, on a local basis, by a conventional electronic
interconnection system.
The clocks used in MOS technology are generally two phase [29]. Presumably only one
of these phases will be distributed optically, the other being generated on the chip after the
detection of the optical timing signal.
A variety of optical techniques can be envisioned for accomplishing the task at hand.
The main distinction between these approaches occurs in the method used to convey light
to the desired locations on the chip.
4.1.1 Index-Guided Optical Interconnects
The first major category of optical interconnect techniques is referred to as “index guided”.
Light is assumed to be carried from some single source generating an optical signal modulated
by the clock to many other sites by means of waveguides. The waveguides could be of either
of two types. One type could use optical fibers for carrying the optical signals. The second
type could use optical waveguides integrated on a suitable substrate.
If fibers are chosen as the interconnect technology, then the following approach, illus-
trated in Figure 4.1, might be used. A bundle of fibers is fused together at one end, yielding
a core into which light from the modulated optical source (probably a lasing diode) must be
coupled. Light coupled in at the fused end is split as the cores separate, and transmitted to
the ends of each of the fibers in the bundle. Each fiber end must now be carefully located
over an optical detector that will convert the optical clock to an electrical one. Alignment
of the fiber and the detector might be accomplished with the help of micropositioners (anal-
ogous to a wire bonding machine), and UV-hardened epoxy could be used to hold the fiber
in its proper place permanently. The difficulties associated with the fiber-optic approach
stem from the alignment requirements for the fibers and detectors, and from the uniformity
requirements for the fused-fiber splitter. It should also be noted that the fibers cannot be
allowed to bend too much, for bends will cause radiation losses that may become severe.
Figure 4.1: Distribution of the clock by means of fiber.
Lastly, we should mention that the use of fibers, and the requirements regarding allowable
degrees of bending, imply that this interconnect technology will occupy a three-dimensional
volume, rather than being purely planar, and this property could be a disadvantage in some
applications.
If integrated optical waveguides are chosen as the interconnect technology, then the
geometry might be that shown in Figure 4.2. The waveguides might be formed by sputtering
of glass onto a silicon dioxide film on the Si substrate. These guides are shown as straight
in the figure, a configuration chosen again because of the large losses anticipated if this
type of light guide is bent at a large angle. Optical signals must be coupled into each of
the separate guides. Such signals might be generated by a single laser diode and carried
to the waveguides by fibers, or separate sources might drive each of the guides, with the
clock distributed to the different sources electrically. Presumably light must be coupled
out of each of the straight waveguides at several sites along its length, with a detector
converting the optical signal to electronic form at each such site. The difficulties associated
with the waveguide approach to the problem, neglecting the bending problem which has been
intentionally avoided, stem primarily from the requirement to efficiently couple into and out
of the guides. Careful alignment of the sources or fibers with the integrated waveguides is
required, and couplers with short lengths are desired to remove the light from the guides and
place it onto the appropriate detectors. Present waveguide technology requires distributed
Figure 4.2: Distribution of the clock by means of integrated optical waveguides.
couplers with rather large dimensions (5 um x 1 mm) compared with the feature sizes
normally thought of in electronic IC technology. A major advantage of the integrated optics
distribution system lies in its planar character and the small excess volume it requires. A
disadvantage is the comparatively inflexible geometry dictated by the necessity to avoid large
bends of the waveguides.
4.1.2 Free-Space Optical Interconnects
A second major category of optical interconnects can be referred to as “free-space”
techniques. For such interconnects, the light is not guided to its destination by refractive
index discontinuities, but rather by the laws that govern the propagation of light in free
space. It is helpful to distinguish between two types of free-space interconnect techniques,
“unfocused” and “focused”.
Unfocused interconnections are established simply by broadcasting the optical signals
carrying the clock to the entire electronic chip. One such approach is shown in Figure 4.3. A
modulated optical source is situated at a focal point of a lens that resides above the chip. The
signal transmitted by that source is collimated by the lens, and illuminates the entire chip at
normal incidence. Detectors integrated in the chip receive the optical signals with identical
delays, due to the particular location of the source at the focal point of the lens. Hence in
principle there is no clock skew whatever associated with such a broadcast system. However,
the system is very inefficient, for only a small fraction of the optical energy falls on the
Figure 4.3: Unfocused broadcast of the clock to the chip.
photosensitive areas of the detectors, and the rest is wasted. Inefficient use of optical energy
may result in requirements for the provision of extra amplification of the detected clock
signals on the chip, and a concomitant loss of area for realizing the other electronic circuitry
required for the functioning of the chip. Moreover, the optical energy falling on areas of the
chip where it is not wanted may induce stray electronic signals that interfere with the proper
operation of the chip. Therefore, it is likely that an opaque dielectric blocking layer would
be needed on the chip to prevent coupling of optical signals at places where they are not
wanted. Openings in this blocking layer would be provided to allow the optical signals to
reach the detectors. Alternate unfocused interconnection techniques could be imagined that
use diffusers rather than a lens. Note that all such techniques require a three-dimensional
volume in order to transport the signals to the desired locations.
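The inefficiency of the unfocused broadcast can be estimated from geometry alone. The sketch below assumes uniform illumination of a square chip and square detectors, so that the useful fraction of the optical power is simply the ratio of total detector area to chip area; the function name and the example dimensions are hypothetical.

```python
def broadcast_efficiency(n_detectors, detector_side_um, chip_side_mm):
    """Fraction of uniformly broadcast optical power that lands on detectors."""
    detector_area_mm2 = n_detectors * (detector_side_um * 1e-3) ** 2
    chip_area_mm2 = chip_side_mm ** 2
    return detector_area_mm2 / chip_area_mm2

# Hypothetical example: 16 detectors of 50 um x 50 um on a 10 mm x 10 mm chip.
print(broadcast_efficiency(16, 50.0, 10.0))
```

Under these assumptions only a very small fraction of the light is used, and the remainder falls on areas that must be shielded by the blocking layer.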
The last category of optical interconnections is free-space “focused” interconnections,
which can also be called “imaging” interconnections. For such interconnections, the optical
source is actually imaged by an optical element onto a multitude of detection sites simulta-
neously. As indicated in Figure 4.4, the required optical element can be realized by means
of a hologram, which acts as a complex grating and lens to generate focused grating com-
ponents at the desired locations. The efficiency of such a scheme can obviously exceed that
Figure 4.4: Focused optical distribution of the clock using a holographic optical element.
of the unfocused case, provided the holographic optical elements have suitable efficiency.
Using dichromated gelatin as a recording material, efficiencies in excess of 99 percent can
be achieved for a simple sine wave grating. When a multitude of focused spots are to be
produced, the efficiency will presumably be lower, but should be well in excess of 50 percent.
The flexibility of the method is great, for nearly any desired configuration of connections
can be realized.
The chief disadvantage of the focused interconnect technique is the very high degree of
alignment precision that must be established and maintained to assure that the focused spots
are striking the appropriate places on the chip. Of course, the spots might be intentionally
defocused, decreasing the efficiency of the system, but easing the alignment requirements.
Thus there exists a continuum of compromises between efficiency and alignment difficulty.
Figure 4.5 illustrates a possible configuration that retains high efficiency but minimizes
alignment problems. The imaging operation is provided by two two-element lenses, in the
form of a block with a gap between the elements. A Fourier hologram can be inserted
between the lenses, and it establishes the desired set of focused spots. The hologram itself
consists of a series of simple sinusoidal gratings, and as such the position of the diffracted
spots is invariant under simple translations of the hologram. The source is permanently fixed
Figure 4.5: Configuration for focused clock distribution that minimizes alignment problems.
on the top of the upper lens block after it has been aligned with a detector at the edge of the
chip, thereby establishing a fixed optical axis. The only alignment required for the hologram
is rotation. The position of the image spots is determined by the spatial frequencies of the
gratings in the hologram, which could be established very precisely if the hologram were
written, for example, by electron-beam lithography.
Focused interconnect systems, like the unfocused ones, require a three-dimensional vol-
ume above the chip. If holographic elements are used, thought must be given to the effects of
using a comparatively non-monochromatic source such as an LED. A spread of the spectrum
of the source results in a spread of the focused energy, so the primary effect is to reduce the
efficiency with which light can be delivered to the desired detector locations.
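The dependence of spot position on grating spatial frequency and on source wavelength can be sketched with the paraxial relation x = (wavelength) x (focal length) x (spatial frequency) for a sinusoidal grating followed by a lens. This small-angle model, the function names, and the example values are assumptions made for illustration and are not taken from [20].

```python
def spot_offset_um(wavelength_nm, focal_mm, grating_cycles_per_mm):
    """Paraxial offset of the first-order spot from the optical axis, in um."""
    wavelength_mm = wavelength_nm * 1e-6
    offset_mm = wavelength_mm * focal_mm * grating_cycles_per_mm
    return offset_mm * 1e3

def chromatic_spread_um(linewidth_nm, focal_mm, grating_cycles_per_mm):
    """Spread of the focused spot caused by the source linewidth, in um."""
    # The offset is linear in wavelength, so the spread scales the same way.
    return spot_offset_um(linewidth_nm, focal_mm, grating_cycles_per_mm)

# Hypothetical example: 850 nm source, 10 mm focal length, 100 cycles/mm grating.
print(spot_offset_um(850.0, 10.0, 100.0))
# Hypothetical linewidths: about 40 nm for an LED, 0.1 nm for a laser diode.
print(chromatic_spread_um(40.0, 10.0, 100.0), chromatic_spread_um(0.1, 10.0, 100.0))
```

In this model a broadband source smears each spot over a distance proportional to its linewidth, so a detector of fixed size collects a smaller share of the light.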
4.2 Essential Components of an On-Chip Optical Signaling System
Electrical clock distribution networks utilize geometrical or electrical matching or grid
routing to minimize clock skew [18, 39]. To minimize clock delay, large, power-hungry
clock buffers are inserted in the clock distribution network. These buffers can consume up
to 30% - 40% of the total chip power [27]. Most of the existing optical clocking solutions
involve using non-CMOS compatible exotic materials or processing steps which are expensive
to integrate into existing manufacturing flows, preventing their widespread adoption [34].
Figure 4.6: An optical clock distribution system.
A truly CMOS compatible solution will reduce manufacturing cost and facilitate easier
integration into mainstream manufacturing. Such an approach was introduced in [38, 40, 53].
Figure 4.6 shows the various stages in an optical clocking system which is compatible with
the proposed approach in [40] and [38]. The optical clock source is optically coupled to the
distribution network, which is optoelectronically coupled to an optical detector that converts
incident optical energy into current pulses. The recovery and signal conditioning stage then
amplifies the current pulses to generate a corresponding rail-to-rail electrical clock signal for
local distribution. The clock signal is distributed to the entire chip by dividing the chip
into clock domains and placing a clock recovery resource or transimpedance amplifier (TIA)
station in each domain.
4.2.1 Optical Clock Source
The optical source providing the optical clock is a laser diode that is external to the
chip. The laser diode is attached to a single mode optical fiber. The optical energy at the
end of this fiber is the optical input to the clock distribution network. The optical coupling
into the clock distribution network is achieved by positioning and gluing the fiber to the
polished edge of the die.
Figure 4.7: Line diagram of an optical H-tree.
4.2.2 Optical Clock Distribution Tree
On-chip waveguides can be used to construct a planar optical H-tree. A 16 leaf-node,
balanced optical H-tree (as shown in Figure 4.7) is used as the optical clock distribution
network in the on-chip optical clock distribution and recovery system.
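The balance property of such a tree can be checked with a short recursive construction. The sketch below assumes a square H-tree in which every level runs one horizontal and one vertical arm of equal length and halves the arm length at the next level; the function name, coordinates and dimensions are hypothetical.

```python
def h_tree_leaves(cx, cy, arm, levels):
    """Return (x, y, path_length) for every leaf of a balanced square H-tree.

    Each level routes from the current centre along a horizontal arm and then
    a vertical arm of length `arm`, giving 4**levels leaves in total.
    """
    def recurse(x, y, span, level, length):
        if level == 0:
            return [(x, y, length)]
        leaves = []
        for dx in (-span, span):
            for dy in (-span, span):
                # One horizontal and one vertical segment are added per level.
                leaves += recurse(x + dx, y + dy, span / 2.0,
                                  level - 1, length + 2 * span)
        return leaves
    return recurse(cx, cy, arm, levels, 0.0)

# Hypothetical 8 mm x 8 mm die: two levels give the 16 leaf nodes.
leaves = h_tree_leaves(4.0, 4.0, 2.0, 2)
print(len(leaves), {length for _, _, length in leaves})
```

Every leaf has the same path length from the root, which is why a balanced H-tree contributes no nominal skew between leaf nodes.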
Seamless integration of the optical distribution network into standard CMOS processing
is of paramount importance, especially since the distribution network is spatially the largest
component of the system and is where the advantages of optical signaling are most pronounced.
The planar waveguide core is constructed from silicon nitride (SiN), which is normally used
for copper encapsulation in a standard CMOS process. The cladding layers are made of
silicon dioxide (SiO2) and low-k oxides such as phosphosilicate glass (PSG) and tetraethyl
orthosilicate (TEOS), normally used as inter-metal dielectric material in standard CMOS
processes. The silicon nitride is deposited using a plasma-enhanced chemical vapor
deposition (PECVD) technique onto a PSG/TEOS/SiO2 sandwich, which serves as the bottom
cladding. The top cladding is a TEOS/SiO2 layer. The high index difference between the
waveguide core and cladding provides excellent optical confinement in the waveguide. To
improve optical coupling between the H-tree and the external optical source, i.e., the optic
fiber, the waveguide core is wider at the die edge and slowly tapered down to the detectors.
Figure 4.8: Traveling wave photo detector.
4.2.3 Optical Receiver: Photo Detector
The electrical clock is recovered from the optical clock by using a photo detector which
is placed at the end of each H-tree leaf node. In addition to being truly CMOS compatible,
the photo detector must have high bandwidth and photo sensitivity. Conventional photo
detectors such as P-I-N photodiodes have current “tails” (i.e., long temporal response) in
their time domain impulse response, effectively limiting incident optical energy switching
speed [22]. This behavior is attributed to the thickness of the photo detector and the greater
distance between the electrodes and where the carriers are generated within the detector.
Conventional photo detector designs therefore are insufficient for optical clock distribution
purposes where clock frequencies are high and constantly increasing. A more suitable photo
detector should have high speed and fast impulse response with minimal photo current tail.
A thin polysilicon layer, when used as the detector material, generates carriers in the
high field region with minimal photo current tail. However, a thin polysilicon layer results
in low detector responsivity. Thus, a traveling wave or lateral incidence detector (as shown
in Figure 4.8 [53]), which greatly enhances photo detector responsivity without the need
for a thick polysilicon layer, is used. The thin polysilicon layer ensures that the carriers are
generated close to the contacts (electrodes), greatly improving the response time of the photo
detector. Furthermore, for a lateral incidence photo detector with a thin polysilicon layer,
Figure 4.9: General approach to optical global clock distribution network.
the photocurrent has lower dependence on the bias voltage as the carriers are generated close
to the contacts and are less likely to undergo recombination.
The lateral incidence photo detector, which is placed at each leaf node of the optical
H-tree, when properly biased generates a current pulse for every incident laser pulse. The
photo detector initiates the clock recovery process by converting the incident optical clock
to current pulses (photo current) which are then processed to generate electrical clock signals
for local distribution. Due to the low magnitude of the photo current, it cannot be used
directly to drive clock sink nodes. A TIA is used to convert, amplify and condition the photo
current to generate rail-to-rail electrical clock signal.
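The conversion chain described above (optical pulse to photo current to voltage) can be sketched numerically. This is an idealized illustration under stated assumptions: the optical power, responsivity, transimpedance, and supply voltage below are hypothetical and are not taken from the report or its references.

```python
def photocurrent_a(optical_power_w: float, responsivity_a_per_w: float) -> float:
    """Photo current produced by the detector for a given incident optical power."""
    return optical_power_w * responsivity_a_per_w


def tia_output_v(current_a: float, transimpedance_ohm: float) -> float:
    """Ideal transimpedance stage: output voltage = input current x gain (ohms)."""
    return current_a * transimpedance_ohm


# Hypothetical operating point, for illustration only.
OPTICAL_POWER_W = 10e-6       # assumed optical power reaching one leaf detector
RESPONSIVITY_A_PER_W = 0.3    # assumed detector responsivity
TRANSIMPEDANCE_OHM = 10e3     # assumed TIA gain
SUPPLY_V = 1.0                # assumed supply voltage (rail-to-rail swing)

i_ph = photocurrent_a(OPTICAL_POWER_W, RESPONSIVITY_A_PER_W)
v_tia = tia_output_v(i_ph, TRANSIMPEDANCE_OHM)
# Voltage gain still needed after the TIA to reach a full-swing clock.
extra_gain = SUPPLY_V / v_tia

print(f"photo current : {i_ph * 1e6:.2f} uA")
print(f"TIA output    : {v_tia * 1e3:.1f} mV")
print(f"extra gain    : {extra_gain:.1f}x")
```

With these assumed values the photo current is only a few microamperes, which is why amplification and conditioning stages are needed before the signal can serve as a local clock.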
4.3 Comparison of Power Consumption with Electrical Distribution System
Tosik et al. [49] compare the power consumption of optical and conventional electrical
clock distribution systems. Figure 4.9 shows a low-power vertical cavity surface emitting
laser (VCSEL) used as an off-chip photonic source. The VCSEL is coupled to the H-tree
symmetrical passive waveguide structure and provides the clock signal to N optical receivers.
The number and placement of the receivers in an optical clock system is equivalent to the
Figure 4.10: Power consumption of optical and electrical clock distribution networks versus the number of H-tree nodes.
number and placement of the output nodes in the electrical H-tree. At the receivers, the
high speed optical signal is converted to an electrical signal and subsequently distributed by
the local electrical networks. The number of optical-to-electrical converters is a particularly
crucial parameter in the overall system, since the optoelectronic interface circuits required
at these points consume power. The methodology and the assumptions used to
properly design the optical H-tree are presented in detail in [47, 48].
First, the power consumption of the two systems is compared. The comparison is based
on the ITRS technology roadmap [3] in the case of electrical clock systems and on
state-of-the-art device parameters in the case of optical clocks. Because the optical device
parameters are not projected forward, this choice may yield a pessimistic estimate of the
performance of the optical clock distribution network for future
technology nodes. Figure 4.10 compares the power
consumption of electrical and optical clock systems, both designed for the 70 nm technology
node. For a small number of H-tree nodes, the power consumed by the optical H-tree is
more than an order of magnitude lower than that of the electrical one. However, as
circuit complexity grows, the advantage of the optical system diminishes. Finally,
with 8172 output nodes in the considered case, the power consumed by the optical system
becomes higher than that consumed by the electrical one. This can be explained
by considering the optical power budget [47, 48]. Along with doubling the number
Figure 4.11: Chip structure with temperature gradient.
of H-tree output nodes, the optical power that the VCSEL must emit to meet
the overall system quality increases by at least 3.2 dB, which in turn increases the electrical
power consumed by the VCSEL by more than 100%. Additionally, since the number of receivers
equals the number of H-tree output nodes, the power consumed by the receivers also doubles. In
an electrical system, the power consumption increases much less rapidly. These results
show that the advantage of optics decreases sharply as the number of output
nodes grows.
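The scaling argument above can be made concrete with a short calculation. The sketch below converts the 3.2 dB figure from the text into a linear ratio and applies it per doubling of output nodes. It is an illustration only: the reference node count, VCSEL power, and per-receiver power are hypothetical, and the VCSEL electrical power is assumed to scale by the same ratio as its optical output, which is a simplification rather than the model of [47, 48].

```python
import math

DB_PER_DOUBLING = 3.2  # from the text: minimum rise in emitted optical power per doubling


def db_to_ratio(db: float) -> float:
    """Convert a power change in decibels to a linear power ratio."""
    return 10.0 ** (db / 10.0)


def optical_clock_power_w(n_nodes: int, n_ref: int,
                          vcsel_ref_w: float, receiver_w: float) -> float:
    """Total power of VCSEL plus receivers for an H-tree with n_nodes outputs.

    Assumption: VCSEL power grows by DB_PER_DOUBLING for each doubling of
    n_nodes relative to n_ref; receiver power is proportional to n_nodes.
    """
    doublings = math.log2(n_nodes / n_ref)
    vcsel_w = vcsel_ref_w * db_to_ratio(DB_PER_DOUBLING * doublings)
    return vcsel_w + n_nodes * receiver_w


# Hypothetical reference point, for illustration only.
N_REF = 16
VCSEL_REF_W = 1e-3
RECEIVER_W = 0.5e-3

print(f"3.2 dB corresponds to a ratio of {db_to_ratio(DB_PER_DOUBLING):.3f}")
n = N_REF
while n <= 8192:
    total = optical_clock_power_w(n, N_REF, VCSEL_REF_W, RECEIVER_W)
    print(f"{n:5d} nodes: {total * 1e3:10.2f} mW")
    n *= 2
```

Because 3.2 dB is slightly more than a factor of two, each doubling of the node count more than doubles the total optical-system power in this sketch, consistent with the trend described for Figure 4.10.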
In the next step, the clock skew of the optical system is calculated. There are several
sources of clock skew in an optical system. Apart from process parameter variations, which are
mainly the tolerances of device and waveguide physical parameters, system-level fluctuations
such as temperature variations have to be considered. In their analysis, only the impact of
temperature variation on the optical signal speed has been taken into account. As the
chip temperature rises, the refractive index of the waveguide core increases, thus reducing
the speed of the clock signal. Typical temperature gradients over the entire chip reported in the
literature are less than 50 K [37]. The calculation has been performed for a chip structure
in which one part is at a lower temperature (350 K) and the other part is at a higher temperature
(400 K), as presented in Figure 4.11. This represents the worst-case scenario.
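The mechanism described above, in which a temperature difference changes the refractive index and therefore the propagation delay, can be sketched as follows. This is a first-order estimate under stated assumptions, not the calculation of [49]: the thermo-optic coefficient is a value commonly quoted for silicon near 1550 nm and the waveguide material is an assumption, the 2 cm path length is hypothetical, and the refractive index is used in place of the group index.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def thermal_skew_s(path_length_m: float, dn_dt_per_k: float, delta_t_k: float) -> float:
    """Arrival-time difference between two equal-length waveguide paths
    held at temperatures that differ by delta_t_k."""
    delta_n = dn_dt_per_k * delta_t_k          # index difference between the paths
    return path_length_m * delta_n / C_M_PER_S  # extra delay on the hotter path


# Assumed values, for illustration only.
PATH_LENGTH_M = 0.02     # hypothetical root-to-leaf waveguide length
DN_DT_PER_K = 1.86e-4    # assumed thermo-optic coefficient (silicon-like core)
DELTA_T_K = 400.0 - 350.0  # temperature difference used in the text

skew_s = thermal_skew_s(PATH_LENGTH_M, DN_DT_PER_K, DELTA_T_K)
print(f"estimated thermal skew: {skew_s * 1e12:.2f} ps")
```

The estimate scales linearly with path length, temperature difference, and thermo-optic coefficient, so a different waveguide material or chip size changes the result proportionally.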
Figure 4.12: Clock skew of a 64-output-node optical H-tree compared to the clock period as a function of technology.
Figure 4.12 shows the clock skew of a 64-output-node optical H-tree compared to the
clock period as a function of technology. It is clear from this figure that at the 32 nm
technology node (33 GHz) the clock skew exceeds 10% of the clock period, which would
result in serious system failure.
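The 10% criterion used above translates directly into an absolute skew budget. The sketch below computes the clock period and the budget for the 33 GHz clock mentioned in the text; the example skew value passed to the check is hypothetical and is not read from Figure 4.12.

```python
def clock_period_s(frequency_hz: float) -> float:
    """Clock period for a given clock frequency."""
    return 1.0 / frequency_hz


def skew_budget_s(frequency_hz: float, fraction: float = 0.10) -> float:
    """Maximum tolerable skew, expressed as a fraction of the clock period."""
    return fraction * clock_period_s(frequency_hz)


def exceeds_budget(skew_s: float, frequency_hz: float, fraction: float = 0.10) -> bool:
    """True if the given skew is larger than the allowed fraction of the period."""
    return skew_s > skew_budget_s(frequency_hz, fraction)


F_CLK_HZ = 33e9            # clock frequency quoted in the text for the 32 nm node
EXAMPLE_SKEW_S = 4e-12     # hypothetical skew value, for illustration only

print(f"period : {clock_period_s(F_CLK_HZ) * 1e12:.2f} ps")
print(f"budget : {skew_budget_s(F_CLK_HZ) * 1e12:.2f} ps")
print(f"exceeds: {exceeds_budget(EXAMPLE_SKEW_S, F_CLK_HZ)}")
```

At 33 GHz the period is about 30.3 ps, so a 10% budget leaves only about 3 ps of allowable skew.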
4.4 Conclusion
The use of optics to make connections within and between chips could solve many of
the problems experienced in current electrical systems. Many of the physical reasons for the
use of optics are well understood and indicate many potential quantitative and qualitative
benefits. Though there are, and will continue to be, electrical solutions that stretch the
capabilities of electrical interconnects, optics is arguably the only physical solution available
to solve the underlying problems of interconnects, and has the potential to continue to scale
with future generations of silicon integrated circuits.
Chapter 5
Future Work
5.1 Wireless Clock Distribution
• External power amplifiers (PAs) are used to increase the power level of the transmitted
clock signal because the on-chip PA is mis-tuned. External PAs increase the system
complexity and increase the cost. They should be replaced by an on-chip PA. In this
system, the transmitted global clock signal is an amplitude-modulated sine wave with
periodic no-signal-transmission, which is generated by switches at output nodes. A PA
with high output power and power efficiency should be designed and fabricated.
• The local clock signal frequency is limited by the operating frequency of the clock
transmitter and receiver, and there is much room for increasing the clock frequency using this
technology. In an intra-chip wireless clock distribution system, microwaves propagate
at the speed of light to distribute the clock signals across a chip, so the RC delay and
dispersion associated with conventional metal-line interconnections are almost
eliminated. The maximum local clock signal frequency is then set by the
maximum operating frequency of the clock transmitter and clock receiver, and by the clock
skew and jitter at very high frequencies. According to recent publications, CMOS ICs
using a UMC 0.13 µm process can reach an operating frequency of ∼100 GHz [10, 24].
Therefore, the maximum clock frequency will be limited only by the clock skew and
jitter. As discussed earlier, the skew and jitter performance has the potential to be
excellent, which could give more margin for increasing the clock frequency. To
further reduce the clock skew and jitter, the free-running VCO in the clock
transmitter should be replaced by a PLL, and a clock receiver with lower noise should be
developed. With the increase of clock frequency, the on-chip receiving antenna size
can be reduced, which will reduce the on-chip receiving antenna area, a major problem
in the system. As a result of the smaller receiving antenna size, the width and height
of rectangular apertures in heat sinks can be reduced, which should improve the heat
sink performance.
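The relation between clock frequency and antenna size noted in the list above follows from the wavelength. The sketch below is only a scaling illustration: the two frequencies are hypothetical, an effective relative permittivity of 1 (free space) is assumed although the on-chip value would be higher, and a half wavelength is used merely as a convenient reference length, not as the antenna geometry of the referenced designs.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def wavelength_m(frequency_hz: float, eps_eff: float = 1.0) -> float:
    """Wavelength in a medium with effective relative permittivity eps_eff."""
    return C_M_PER_S / (frequency_hz * math.sqrt(eps_eff))


def half_wave_length_m(frequency_hz: float, eps_eff: float = 1.0) -> float:
    """Half-wavelength reference length at the given frequency."""
    return wavelength_m(frequency_hz, eps_eff) / 2.0


# Hypothetical clock carrier frequencies, for illustration only.
for f_hz in (15e9, 30e9):
    length_mm = half_wave_length_m(f_hz) * 1e3
    print(f"{f_hz / 1e9:.0f} GHz: half wavelength = {length_mm:.2f} mm")
```

Doubling the frequency halves the reference length, which is the basis for the expected reduction in receiving antenna area and heat-sink aperture size.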
5.2 Optical Clock Distribution
• Optoelectronic devices require continued development to meet the yield, tolerance,
and drive voltage requirements for practical systems with future generations of silicon
CMOS.
• Work will be required in the interface circuits between optics and electronics. Though
there appears to be no current fundamental difficulty in making such circuits in CMOS,
research is needed on circuits that (i) avoid issues such as crosstalk and susceptibility
to digital noise, (ii) have appropriately low power dissipation and latency, and (iii) are
tolerant to process variations.
• The technology for integrating optoelectronics with silicon integrated circuits is still
at an early stage, though there have been key demonstrations of substantial working
integrations. The first introductions of optical interconnects to chips will likely use hybrid
approaches, such as solder bonding; such approaches require no modifications
to the current process for fabricating silicon integrated circuits, other than adding process
steps applied to already fabricated silicon integrated circuit wafers.
• It will be important to research the systems and architectural benefits of optics for
interconnects. Optics can likely enable kinds of architectures that are not well suited
to electrical interconnect systems (e.g., architectures with many long connections,
architectures with large “aspect ratios,” architectures requiring synchronous operation
over large domains), and can likely also allow continued use of current architectures
that otherwise would have to be abandoned in the future because of the limitations of
wired interconnects.
Bibliography
[1] “The International Technology Roadmap for Semiconductors,” 1999. Semiconductor Industries Association, San Jose, California.
[2] “The International Technology Roadmap for Semiconductors,” 2001.
[3] “The International Technology Roadmap for Semiconductors,” 2005.
[4] H. B. Bakoglu, Circuits, Interconnections, and Packaging for VLSI. New York: Addison-Wesley, 1990.
[5] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, “3-D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration,” Proc. IEEE, vol. 89, pp. 602–633, 2001.
[6] L. Benini and G. De Micheli, “Transformation and Synthesis of FSMs for Low-Power Gated Implementation,” in Proceedings International Symposium on Low Power Design, Apr. 1995, pp. 21–26.
[7] A. Bhatnagar, Low Jitter Clocking of CMOS Electronics Using Mode-Locked Lasers. PhD thesis, Stanford University, Palo Alto, California, 2005.
[8] W. J. Bowhill, R. L. Allmon, S. L. Bell, E. M. Cooper, D. R. Donchin, J. H. Edmondson, T. C. Fischer, P. E. Gronowski, A. K. Jain, P. L. Kroesen, B. J. Loughlin, R. P. Preston, P. I. Rubinfeld, M. J. Smith, S. C. Thierauf, and G. M. Wolrich, “A 300-MHz 64-b Quad-Issue CMOS RISC Microprocessor,” IEEE Jour. Solid-State Circuits, vol. 30, no. 11, pp. 1203–1211, Nov. 1995.
[9] D. Bravo, Investigation of Background Noise in Integrated Circuits Relating to the Design of an On-Chip Wireless Interconnect System. Master’s thesis, University of Florida, Gainesville, Florida, 2000.
[10] C. Cao and K. K. O, “A 90-GHz Voltage-Controlled Oscillator with a 2.2-GHz Tuning Range in a 130-nm CMOS Technology,” in Symp. VLSI Circuits Dig. Tech. Papers, (Kyoto, Japan), 2005.
[11] K.-N. Chen, M. J. Kobrinsky, B. C. Barnett, and R. Reif, “Comparisons of Conventional, 3-D, Optical, and RF Interconnects for On-Chip Clock Distribution,” IEEE Transactions on Electron Devices, vol. 51, no. 2, Feb. 2004.
[12] T.-Y. Chiang, K. Banerjee, and K. C. Saraswat, “Effect of Via Separation and Low-k Dielectric Materials on the Thermal Characteristics of Cu Interconnects,” in Proc. IEEE Electron Device Meeting, 2000, pp. 261–264.
[13] C. Debaes, A. Bhatnagar, D. Agarwal, R. Chen, G. A. Keeler, N. C. Helman, H. Thienpont, and D. A. B. Miller, “Receiver-Less Optical Clock Injection for Clock Distribution Networks,” IEEE Journal of Selected Topics in Quantum Electronics, vol. 9, no. 2, March/April 2003.
[14] A. Deutsch, H. Harrer, C. W. Surovic, G. Hellner, D. C. Edelstein, R. D. Goldblatt, G. A. Biery, N. A. Greco, D. M. Foster, E. Crabbe, L. T. Su, and P. W. Coteus, “Functional High-Speed Characterization and Modeling of a Six-Layer Copper Wiring Structure and Performance Comparison With Aluminum On-Chip Interconnections,” in IEEE Electron Device Meeting Tech. Digest, Dec. 1998, pp. 295–298.
[15] D. W. Dobberpuhl, R. T. Witek, R. Allmon, R. Anglin, D. Bertucci, S. Britton, L. Chao, R. A. Conrad, D. E. Dever, B. Gieseke, S. M. N. Hassoun, W. Hoeppner, K. Kuchler, M. Ladd, B. M. Leary, L. Madden, E. J. McLellan, D. R. Meyer, J. Montanaro, D. A. Priore, V. Rajagopalan, S. Samudrala, and S. Santhanam, “A 200-MHz 64-b Dual Issue CMOS Microprocessor,” IEEE Jour. Solid-State Circuits, vol. SC-27, no. 11, pp. 1555–1565, Nov. 1992.
[16] B. Floyd, C.-M. Hung, and K. O. Kenneth, “Intra-Chip Wireless Interconnect for Clock Distribution Implemented With Integrated Antennas, Receivers, and Transmitters,” IEEE Jour. Solid-State Circuits, vol. 37, pp. 543–552, 2002.
[17] B. A. Floyd and K. O. Kenneth, “The Projected Power Consumption of a Wireless Clock Distribution System and Comparison to Conventional Distribution Systems,” in Proc. IITC, 1999.
[18] E. Friedman, Clock Distribution Networks in VLSI Circuits and Systems. New York: IEEE Press, 1995.
[19] B. A. Gieseke, R. L. Allmon, D. W. Bailey, B. J. Benschneider, S. M. Britton, J. D. Clouser, H. R. F. III, J. A. Farrell, M. K. Gowan, C. L. Houghton, J. B. Keller, T. H. Lee, D. L. Leibholz, S. C. Lowell, M. D. Matson, R. J. Matthew, V. Peng, M. D. Quinn, D. A. Priore, M. J. Smith, and K. E. Wilcox, “A 600-MHz Superscalar RISC Microprocessor with Out-Of-Order Execution,” in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 176–177.
[20] J. W. Goodman, F. Leonberger, S.-Y. Kung, and R. A. Athale, “Optical Interconnections for VLSI Systems,” Proceedings of the IEEE, vol. 72, no. 7, pp. 850–866, July 1984.
[21] X. Guo, D. Yang, R. Li, and K. K. O, “A Receiver with Start-up Initialization and Programmable Delays for Wireless Clock Distribution,” in IEEE Int. Solid-State Circuits Conf. Digest of Tech. Papers, 2006, pp. 386–387.
[22] U. Hilleringmann and K. Goser, “Optoelectronic System Integration on Silicon: Waveguides, Photodetectors, and VLSI CMOS Circuits on One Chip,” IEEE Trans. Electron Devices, vol. 42, no. 5, pp. 841–846, May 1995.
[23] M. Horowitz, “Clocking Strategies in High Performance Processors,” in Proceedings of the IEEE Symposium on VLSI Circuits, June 1992, pp. 50–53.
[24] P. Huang, M. Tsai, H. Wang, C. Chen, and C. Chang, “A 114GHz VCO in 0.13um CMOS Technology,” in Proc. ISSCC, 2005. Paper 21.8.
[25] M. J. Kobrinsky, B. A. Block, J. F. Zheng, B. C. Barnett, E. Mohammed, M. Reshotko, F. Robertson, S. List, I. Young, and K. Cadien, “On-Chip Optical Interconnects,” Intel Technology Journal, vol. 8, no. 2, May 2004.
[26] H. Kojima, S. Tanaka, and K. Sasaki, “Half-Swing Clocking Scheme for 75% Power Saving in Clocking Circuitry,” in Proceedings of the IEEE Symposium on VLSI Circuits, June 1994, pp. 23–24.
[27] J. Lillis, C.-K. Cheng, and T. Lin, “Optimal and Efficient Buffer Insertion and Wire Sizing,” in Proc. IEEE Custom Integr. Circuits Conf., May 1995, pp. 259–262.
[28] S. List, C. Webb, and S. Kim, “3D Wafer Stacking Technology,” in Proc. Advanced Metallization Conf., Oct. 2002, pp. 29–36.
[29] C. Mead and L. Conway, Introduction to VLSI Systems. Reading, Massachusetts: Addison-Wesley, 1980.
[30] J. Mehta, An Investigation of Background Noise in ICs and Its Impact on Wireless Clock Distribution. Master’s thesis, University of Florida, Gainesville, Florida, 1998.
[31] D. Miller, A. Bhatnagar, S. Palermo, A. Emami-Neyestanak, and M. Horowitz, “Opportunities for Optics in Integrated Circuits Applications,” in Proceedings of the International Solid State Circuits Conference, 2005.
[32] D. A. B. Miller, “Rationale and Challenges for Optical Interconnects to Electronic Chip,” Proc. IEEE, vol. 88, pp. 728–749, 2000.
[33] J. Montanaro, R. T. Witek, K. Anne, A. J. Black, E. M. Cooper, D. W. Dobberpuhl, P. M. Donahue, J. Eno, W. Hoeppner, D. Kruckemyer, T. H. Lee, P. C. M. Lin, L. Madden, D. Murray, M. H. Pearce, S. Santhanam, K. J. Snyder, R. Stephany, and S. C. Thierauf, “A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor,” IEEE Jour. Solid-State Circuits, vol. 31, no. 11, pp. 1703–1714, Nov. 1996.
[34] A. Mule, E. Glytsis, T. Gaylord, and J. Meindl, “Electrical and Optical Clock Distribution Networks for Gigascale Microprocessors,” IEEE Trans. Very Large Scale Integration (VLSI) System, vol. 10, no. 5, pp. 582–594, Oct. 2002.
[35] C. Nagendra and M. J. Irwin, “Design Trade-Offs in CMOS FIR Filters,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing Conference, volume 6, May 1996.
[36] J. Neves and E. Friedman, “Circuit Synthesis of Clock Distribution Networks Based on Non-Zero Clock Skew,” in Proceedings of IEEE International Symposium on Circuits and Systems, May/June 1994, pp. 4.175–4.178.
[37] F. J. Pollack, “New microarchitecture challenges in the coming generations of CMOS process technologies,” in Proc. 32nd Ann. ACM/IEEE Int. Symp. Microarchitecture, (Haifa, Israel), 1999, pp. 2–4.
[38] R. Pownall, G. Yuan, T. Chen, P. Nikkel, and K. Lear, “Geometry dependence of CMOS-compatible, polysilicon, leaky-mode photodetectors,” IEEE Photon. Technol. Lett., vol. 19, no. 7, pp. 513–515, Apr. 2007.
[39] P. Ramanathan, A. Dupont, and K. Shin, “Clock Distribution in General VLSI Circuits,”IEEE Trans. Circuits Syst. I: Fundam. Theory Appl., vol. 41, no. 5, pp. 395–404.
[40] A. Raza, G. W. Yuan, C. Thangaraj, T. Chen, and K. Lear, “Waveguide Coupled CMOS Photodetector for On-Chip Optical Interconnects,” Lasers Electro-Optics Soc., vol. 1, pp. 152–153, Nov. 2004.
[41] R. Reif, A. Fan, K.-N. Chen, and S. Das, “Fabrication Technologies for Three-Dimensional Integrated Circuits,” in Proc. IEEE International Symp. Quality Electronic Design, 2002, pp. 33–37.
[42] P. J. Restle and A. Deutsch, “Designing the Best Clock Distribution Network,” in Proc. Symp. VLSI Circuits, 1998, pp. 2–5.
[43] W. Ryu, J. Lee, H. Kim, S. Ahn, N. Kim, B. Choi, D. Kam, and J. Kim, “RF Interconnect for Multi-Gbit/s Board-Level Clock Distribution,” IEEE Trans. on Adv. Packaging, vol. 23, pp. 398–407, 2000.
[44] A. Z. Shang and F. A. P. Tooley, “Digital Optical Interconnects for Networks and Computing Systems,” Lightwave Technology, vol. 18, pp. 2086–2094, 2000.
[45] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits. Morgan-Kaufmann, 1999.
[46] S. Tam, S. Rusu, U. N. Desai, R. Kim, J. Zhang, and I. Young, “Clock Generation and Distribution for the First IA-64 Microprocessor,” IEEE Jour. Solid-State Circuits, vol. 35, pp. 1545–1552, 2000.
[47] G. Tosik, F. Gaffiot, Z. Lisik, I. O’Connor, and F. Tissafi-Drissi, “Optical Versus Electrical Interconnections for Clock Distribution Networks in New VLSI Technologies,” in Proc. Integrated Circuit and System Design, Power and Timing Modeling, Optimization and Simulation, volume 2799, Sept. 2003, pp. 461–470.
[48] G. Tosik, F. Gaffiot, Z. Lisik, I. O’Connor, and F. Tissafi-Drissi, “Metallic Clock Distribution Networks in New VLSI Technologies,” IEE Electron. Lett., vol. 40, no. 3, Feb. 2004.
[49] G. Tosik, Z. Lisik, and F. Gaffiot, “Optical Interconnections in future VLSI systems,” Journal of Telecommunications and Information Technology, pp. 105–108, 2007.
[50] S. H. Unger and C.-J. Tan, “Clocking Schemes for High-Speed Digital Systems,” IEEE Trans. Computers, vol. C-35, no. 10, pp. 880–895, Oct. 1986.
[51] D. Wann and M. Franklin, “Asynchronous and Clocked Control Structures for VLSI Based Interconnection Networks,” IEEE Trans. Computers, vol. C-32, no. 3, pp. 284–293, Mar. 1983.
[52] C. Webb, C. J. Anderson, L. Sigal, K. L. Shepard, J. S. Liptay, J. D. Warnock, B. Curran, B. W. Krumm, M. D. Mayo, P. J. Camporese, E. M. Schwarz, M. S. Farrell, P. J. Restle, R. M. A. III, T. J. Slegel, W. V. Huott, Y. H. Chan, B. Wile, T. N. Nguyen, P. G. Emma, D. K. Beece, C.-T. Chuang, and C. Price, “A 400-MHz S/390 Microprocessor,” IEEE J. Solid State Circuits, vol. 32, no. 11, pp. 1665–1675, Nov. 1997.
[53] G. Yuan, R. Pownall, P. Nikkel, C. Thangaraj, T. Chen, and K. Lear, “Characterization of CMOS Compatible Waveguide-Coupled Leaky-Mode Photodetectors,” IEEE Photon. Technol. Lett., vol. 18, no. 15, pp. 1657–1659, Aug. 2006.
[54] Q. Zhu and W. Dai, “Planar Clock Routing for High Performance Chip and Package Co-Design,” IEEE Transactions on VLSI Systems, vol. 4, no. 2, pp. 210–226, June 1996.