Design and Implementation of a Sub-threshold BFSK Transmitter
By: Suganth Paul (Intel Corporation, Austin, TX); Rajesh Garg and Sunil P. Khatri (Department of ECE, Texas A&M University, College Station, TX); Sheila Vaidya (Lawrence Livermore National Lab., Livermore, CA)
2
Outline
Sub-threshold circuits: the opportunity
Challenges
  Process/temperature/voltage variations
Solution: dynamic body bias
Validation via test chip
  Design methodology
  Silicon results
Conclusions
3
The Opportunity
Process   Delay(ps)   Power(W)   P-D-P(J)   Delay    Power     P-D-P    Delay   Power     P-D-P
bsim70    14.157      4.08E-05   5.82E-07   17.01X   308.82X   18.50X   9.93X   141.10X   14.43X
Compared traditional circuit with sub-threshold (obtained by simply setting VDD < VT).
Performed simulations for 2 different processes on a 21 stage ring oscillator.
Impressive power reduction (100X to 500X).
Power-Delay-Product (P-D-P) improves by as much as 20X.
  P-D-P is an important metric to compare circuit design styles.
Power consumption has become a major issue for recent ICs.
There is a large and growing class of applications where power reduction is paramount, not speed.
  Such applications are ideal candidates for sub-threshold circuit design.
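As a rough consistency check on the ratio columns, the sketch below treats P-D-P as power times delay, so the P-D-P improvement is the power reduction divided by the delay penalty. The function name is illustrative, and the inputs are the bsim70 ratios from the table. The results land close to the tabulated P-D-P ratios but not exactly on them, presumably because of rounding or because the slide's values were obtained separately.

```python
def pdp_improvement(power_reduction, delay_increase):
    """P-D-P = power * delay, so its improvement is the ratio of the two ratios."""
    return power_reduction / delay_increase


# bsim70 ratios from the two groups of ratio columns in the table.
first_group = pdp_improvement(power_reduction=308.82, delay_increase=17.01)
second_group = pdp_improvement(power_reduction=141.10, delay_increase=9.93)

print(f"P-D-P improvement: {first_group:.2f}X and {second_group:.2f}X")
```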
4
Sub-threshold Logic
Ids has an exponential dependence on process, voltage and temperature (PVT):
\[ I_{ds}^{sub} = I_0 \frac{W}{L} \, e^{(V_{gs} - V_T - V_{off})/(n v_t)} \left(1 - e^{-V_{ds}/v_t}\right) \]
Need to stabilize the circuit performance by compensating for PVT variations
  No existing approach compensates sub-threshold delay
  Existing approaches compensate sub-threshold currents
To compensate delay, need a representative circuit
  Not easy to come up with a representative circuit for standard cells
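To get a feel for the exponential sensitivity, the following sketch evaluates the sub-threshold current expression Ids = I0 (W/L) exp((Vgs - VT - Voff)/(n vt)) (1 - exp(-Vds/vt)) at two threshold voltages. The values of I0, n, Voff, VT and the bias point are hypothetical placeholders, not numbers from the slides, and the temperature dependence of VT itself is ignored.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_voltage(temp_k):
    """Thermal voltage v_t = kT/q, in volts."""
    return BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C


def subthreshold_current(v_gs, v_ds, v_threshold, temp_k,
                         i0=1e-7, w_over_l=1.0, n=1.5, v_off=0.0):
    """Sub-threshold drain current; i0, n and v_off are assumed values."""
    v_t = thermal_voltage(temp_k)
    gate_term = math.exp((v_gs - v_threshold - v_off) / (n * v_t))
    drain_term = 1.0 - math.exp(-v_ds / v_t)
    return i0 * w_over_l * gate_term * drain_term


# Hypothetical bias point: VDD = 0.2 V, nominal VT = 0.5 V, then VT lowered by 10%.
nominal = subthreshold_current(v_gs=0.2, v_ds=0.2, v_threshold=0.50, temp_k=300.0)
low_vt = subthreshold_current(v_gs=0.2, v_ds=0.2, v_threshold=0.45, temp_k=300.0)
vt_ratio = low_vt / nominal

print(f"A 10% lower VT scales Ids by about {vt_ratio:.2f}x")
```

Under these assumed parameters a 10% shift in VT changes the current several-fold, which is why delay needs active compensation.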
5
Our Solution
We propose a technique that uses self-adjusting body-bias to phase-lock the circuit delay to a beat clock.
Use a network of PLAs to implement circuits.
Several PLAs in a cluster share a common nbulk node.
A representative PLA in each cluster is chosen to phase lock the delay of the PLAs to the beat clock.
  If the delay is too high, a forward body bias is applied to speed up the representative PLA.
  If the delay is low, body bias is brought back down to zero to slow down the representative PLA.
All other PLAs exhibit the same delay as the representative PLA, since they all share a common nbulk terminal.
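The feedback idea can be sketched as a simple bang-bang loop. This is a behavioral toy, not the authors' circuit: the exponential delay model, its sensitivity, the step size and the bias ceiling are all hypothetical.

```python
import math


def pla_delay_ns(base_delay_ns, body_bias_v, sensitivity_per_v=4.0):
    """Hypothetical model: forward body bias shortens the PLA delay exponentially."""
    return base_delay_ns * math.exp(-sensitivity_per_v * body_bias_v)


def lock_to_beat_clock(base_delay_ns, beat_period_ns,
                       cycles=200, step_v=0.005, max_bias_v=0.45):
    """Nudge the shared nbulk bias once per beat-clock cycle."""
    bias_v = 0.0
    for _ in range(cycles):
        if pla_delay_ns(base_delay_ns, bias_v) > beat_period_ns:
            # Completion lags the beat clock: raise forward body bias to speed up.
            bias_v = min(max_bias_v, bias_v + step_v)
        else:
            # Completion leads the beat clock: relax the bias back toward zero.
            bias_v = max(0.0, bias_v - step_v)
    return bias_v, pla_delay_ns(base_delay_ns, bias_v)


final_bias_v, final_delay_ns = lock_to_beat_clock(base_delay_ns=1200.0,
                                                  beat_period_ns=1000.0)
print(f"bias = {final_bias_v:.3f} V, delay = {final_delay_ns:.1f} ns")
```

In this model the bias settles into a small oscillation around the value at which the representative PLA's delay matches the beat period.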
6
Objective
Validate and verify flow by designing a sub-threshold circuit for the application
Choose a test application
  Low power, low speed
Develop a sub-threshold circuit design flow
  Implement our delay compensation scheme to negate PVT variations
Implement the same application using a standard cell based flow on the same die
Fabricate and test the chip (TSMC 0.25 um process)
Compare the sub-threshold circuit with the standard cell circuit in terms of power consumption
7
Test Application - Binary Frequency Shift Keying (BFSK) Transmitter
[Block diagram: Binary Input Data, Digital BFSK Modulator (produces two tones: f1 if input is LOW, f2 if input is HIGH), DAC, Amplifier, Antenna]
Digital block implemented using sub-threshold circuits
Specifications
  Input bit rate: RB = 32kbps
  Broadcast distance: D = 1000m
  FSK tones: f1 = 150kHz, f2 = 450kHz
  Channel bandwidth: B = 300kHz
Digital part of the circuit implemented as NPLA (Network of Programmable Logic Arrays)
  NPLAs have low delay
  Critical path delay easy to find
  PLAs have common nbulk node
Circuit level PVT compensation
  An external Beat Clock (BCLK) signal is phase locked with the critical path delay
  Delay controlled by a charge pump that modulates the bulk voltage of transistors in the circuit
  Compensates for both inter- and intra-die variations
9
Dynamic NOR-NOR PLA
We use precharged NOR-NOR PLAs as the structure of choice
Wordlines run horizontally
Inputs / their complements and outputs run vertically
Each PLA has a “completion” signal that switches low after all the outputs switch
Several PLAs in a cluster share a common nbulk node.
[Figure: PLA schematic with Inputs, Outputs, completion and clk; clk phases labeled Precharge and Evaluate]
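A purely logical model of a NOR-NOR PLA may help here. The sketch below ignores precharge, timing and the completion signal and only shows how two NOR planes realize a function. The data layout (`and_plane`, `or_plane`) and the XOR programming are illustrative choices, not taken from the slides.

```python
from itertools import product


def nor(bits):
    """A NOR line stays high only if none of the signals tied to it is high."""
    return int(not any(bits))


def evaluate_pla(inputs, and_plane, or_plane):
    """
    inputs:    list of 0/1 input values.
    and_plane: one entry per wordline, a list of (input_index, complemented)
               literals connected to that wordline.
    or_plane:  one entry per output, a list of wordline indices connected
               to that output line.
    """
    def literal(index, complemented):
        return 1 - inputs[index] if complemented else inputs[index]

    wordlines = [nor(literal(i, c) for i, c in row) for row in and_plane]
    return [nor(wordlines[w] for w in column) for column in or_plane]


# XOR of inputs a and b: the wordlines detect a = b, the output NORs them.
XOR_AND_PLANE = [[(0, True), (1, True)],     # high only when a = b = 1
                 [(0, False), (1, False)]]   # high only when a = b = 0
XOR_OR_PLANE = [[0, 1]]

for a, b in product((0, 1), repeat=2):
    print(a, b, evaluate_pla([a, b], XOR_AND_PLANE, XOR_OR_PLANE))
```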
10
Network of PLAs (NPLA)
[Figure: combinational logic between Inputs and Outputs implemented as an NPLA, with PLAs at levels L1 to L4 driven by clk, and a timing diagram for the L1 to L4 PLAs]
Throughput = Tpchg + n·Teval
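The timing relation Tpchg + n·Teval can be turned into a clock-rate estimate. The sketch assumes that n is the number of PLA levels on the longest path and that the expression gives the minimum clock period; the timing values are made up for illustration and are not from the slides.

```python
def npla_clock_period_s(t_precharge_s, t_eval_s, levels):
    """Minimum clock period: one precharge phase plus one evaluation per level."""
    return t_precharge_s + levels * t_eval_s


# Hypothetical timing numbers for a 4-level NPLA.
period_s = npla_clock_period_s(t_precharge_s=100e-9, t_eval_s=150e-9, levels=4)
max_fclk_hz = 1.0 / period_s

print(f"period = {period_s * 1e9:.0f} ns, max fclk = {max_fclk_hz / 1e6:.2f} MHz")
```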
11
The Charge Pump
PLA “completion” signal lags beat clock: nbulk node gets forward biased
PLA “completion” signal leads beat clock: nbulk goes back to zero bias
[Figure: charge pump schematic with pullup and pulldown signals]
12
Effectiveness of the Approach
We simulated a single PLA from 0°C to 100°C. Also applied VT variations (10%) and VDD variations (10%).
The light region shows the variations in delay over all the corners without delay compensation.
The red region shows the delays with the self-adjusting body-bias circuit.
13
Design Flow
[Flow chart boxes: BFSK Design; HDL Synthesis; Map to NPLA; Logic Verification; Integrated Spice Netlist; Layout; LVS; RC Extraction; Full Chip Spice Verification; Design of Analog Components; Spice Verification: functional, timing, charge pump]
14
BFSK Design
[Figure: block diagram with Binary Input, Mux, Phase Increment, Phase Accumulator, two DFFs (bus widths 9 and 8), Sine Lookup Table (depth 2^9 = 512), Clk]
fout < fclk/2 (Nyquist criterion) implies phase increment < 256.
Phase increments are chosen based on fclk, or left programmable in real time to get Software Defined Radio (SDR) operation.
We fix the phase increments to avoid the extra input pins required for SDR.
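The phase-accumulator scheme can be sketched in a few lines. The 9-bit accumulator, the 512-entry sine table, the 150kHz/450kHz tones and the 1.2MHz clock estimate come from the deck. The relation fout = increment × fclk / 2^9 is the usual one for this kind of oscillator and is assumed here, as are the 8-bit sample format and `clocks_per_bit`. The slides do not state the increments actually used on the chip.

```python
import math

ACC_BITS = 9                  # phase accumulator width (lookup table depth 2**9)
LUT_DEPTH = 1 << ACC_BITS     # 512 entries
OUT_BITS = 8                  # assumed sample width

# Unsigned sine table covering one full period.
SINE_LUT = [round((math.sin(2 * math.pi * k / LUT_DEPTH) + 1) / 2
                  * ((1 << OUT_BITS) - 1))
            for k in range(LUT_DEPTH)]


def phase_increment(f_out_hz, f_clk_hz):
    """Increment that yields f_out, assuming f_out = inc * f_clk / LUT_DEPTH."""
    return round(f_out_hz * LUT_DEPTH / f_clk_hz)


def tone_hz(increment, f_clk_hz):
    return increment * f_clk_hz / LUT_DEPTH


def bfsk_samples(bits, inc_low, inc_high, clocks_per_bit):
    """The mux picks the increment per input bit; phase stays continuous."""
    phase = 0
    samples = []
    for bit in bits:
        increment = inc_high if bit else inc_low
        for _ in range(clocks_per_bit):
            samples.append(SINE_LUT[phase])
            phase = (phase + increment) % LUT_DEPTH
    return samples


F_CLK_HZ = 1.2e6
INC_F1 = phase_increment(150e3, F_CLK_HZ)
INC_F2 = phase_increment(450e3, F_CLK_HZ)
samples = bfsk_samples([0, 1], INC_F1, INC_F2, clocks_per_bit=8)
print(INC_F1, INC_F2, samples[:4])
```

Both increments stay below 256, consistent with the Nyquist bound.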
Reference PLA
Common nbulk node of a cluster of PLAs, modulated by charge pump
[Figure: NPLA with L1 and L2 PLAs driven by Clk]
19
HDL to Schematic of Digital BFSK
Digital BFSK transmitter described using VHDL
VHDL synthesized using FPGA synthesis tool, to get a gate level netlist
This is imported into SIS in “blif” format
The “blif” file is logically optimized and mapped into NPLA
  Technology independent optimization done on circuit
  Circuit converted to a multi-level network of nodes with 5 or fewer inputs per node
  Circuit traversed from inputs to outputs, and nodes are implemented using PLAs of size (8/6/12)
Using NPLA throughput equation, fclk estimated as 1.2MHz
Currents flow through mirror legs based on input value
Output current / voltage modulated by the sum of weighted currents through Rout
Thermometer codes prevent glitches at output
DAC supply is 0.7V to handle 0.6V digital signals
Rout, Rcm are off-chip resistances
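The glitch argument for thermometer codes can be illustrated with a small encoder. This is a generic sketch: a 3-bit width is used for readability, and whether the chip's DAC is fully or only partly thermometer-coded is not stated in the slides.

```python
def binary_to_thermometer(value, bits):
    """Return 2**bits - 1 lines; the lowest `value` lines are 1, the rest 0."""
    if not 0 <= value < (1 << bits):
        raise ValueError("value out of range for the given width")
    return [1 if line < value else 0 for line in range((1 << bits) - 1)]


def lines_toggled(code_a, code_b):
    """Number of lines that differ between two equally long codes."""
    return sum(x != y for x, y in zip(code_a, code_b))


# Stepping from 3 to 4 flips three binary bits (011 -> 100) ...
binary_flips = bin(3 ^ 4).count("1")
# ... but only one thermometer line, so no large transient current error.
thermo_flips = lines_toggled(binary_to_thermometer(3, 3),
                             binary_to_thermometer(4, 3))

print(binary_to_thermometer(3, 3), binary_flips, thermo_flips)
```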
24
Amplifier Schematic
Common Source Amplifier
Supply of 0.7V
Rd, Rs are off-chip resistances
M1 biased by DAC Rout resistor
CL: on-chip antenna load, 80pF
25
Testability Features added before Integration
[Figure: chip block diagram. Blocks: Phase Accum, NCO, Binary to Thermometer Encoder, DFFs, DAC, Amplifier, Antenna, Charge Pump, Phase Detector, Ref. PLA completion, Common Bulkn. Pins and signals: Input, CLK, BEAT CLK, 8-bit BFSK Output or 8-bit DAC Input, Bulkn, Charge Pump Supply, DAC Output, Amp Output. Bus widths labeled 9, 8 and 19.]
26
Layout
Manual PLA layout for every PLA in design
NPLA routed using SEDSM
I/O pad cells, ESD diodes layout done manually
DAC, amplifier layout done manually
Antenna coil layout done manually
27
PLA Layout
[Figure labels: word lines; input, bit line; output lines; transistors, modified based on logic to be implemented]
28
I/O PAD CELL Layout
[Figure labels: I/O pad; primary ESD diodes; secondary ESD diodes; I/O drivers]
Fully compliant with TSMC design rules
ESD diodes have guard rings to prevent latchup
29
Die Photo
[Die photo labels: Digital BFSK output domain, 2V; Digital BFSK inputs domain, 0.7V; Digital BFSK domain, 0.6V; Std Cell domain, 2.5V]
30
Experimental Results from Silicon
Output of the BFSK transmitter is shown
As input changes from 0 to 1, the output frequency changes, showing the modulation
Fclk = 1MHz, F1 = 117kHz, F2 = 347kHz
The adjacent peaks are around 10dB below the fundamental peaks
We found from Matlab simulations that signals from the extracted Spice netlist could be demodulated at the receiver side
31
Results from Silicon
Nbulk kept at 0V, 0.45V
Maximum frequency shows a quadratic dependence on supply voltage
Operating Range
32
Power Comparison
Design Style    Operating Voltage   Frequency of Operation   Avg Current   Power Dissipated
Sub-threshold   0.6V                1.05MHz                                26.8µW
Std Cell        2.5V                1.05MHz                  208µA         520µW
Sub-threshold power calculated only for the Phase Accumulator and NCO blocks on the 0.6V power supply
Std Cell implements only this portion of the BFSK circuit
Sub-threshold gives 19.4X lower power
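The headline ratio can be re-derived from the table. The sketch takes the table's current and power as microamps and microwatts. The sub-threshold current printed at the end is only what P/V would imply, not a measured value from the slides.

```python
def power_w(voltage_v, current_a):
    """Average power drawn from a DC supply."""
    return voltage_v * current_a


std_cell_power_w = power_w(voltage_v=2.5, current_a=208e-6)
sub_threshold_power_w = 26.8e-6
power_ratio = std_cell_power_w / sub_threshold_power_w

# Implied, not measured: average current at 0.6 V that would give 26.8 uW.
implied_sub_current_a = sub_threshold_power_w / 0.6

print(f"Std Cell power  = {std_cell_power_w * 1e6:.0f} uW")
print(f"Power ratio     = {power_ratio:.1f}X")
print(f"Implied current = {implied_sub_current_a * 1e6:.1f} uA")
```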
33
Bulkn Node Modulation
Bulk node modulates when beat clock demands speedup or slow-down
Bulk node modulates as supply voltage is changed, so that circuit delay is maintained constant.
34
Conclusion
Validated a sub-threshold circuit design methodology based on dynamic body bias (first-of-kind)
  Validated design tools and techniques
  First-of-kind design automation flow will help bring sub-threshold design to the mainstream.
We implemented an ultra low power, low data rate wireless BFSK transmitter
The fabricated chip works as expected, validating our design flow.
We compared the sub-threshold design with a Std Cell based design and showed a 19.4X reduction in power.
35
Thank you!!
Backup Slides
36
37
Introduction
Power consumption has become a significant hurdle for recent ICs
Higher power consumption leads to
  Shorter battery life
  Higher on-chip temperatures: reduced operating life of the chip
There is a large and growing class of applications where power reduction is paramount, not speed.
  Such applications are ideal candidates for sub-threshold circuit design
For sub-threshold circuits, VDD ≤ VT
38
TX/RX System Testing
[Photo labels: TX PCB with subthreshold IC; TX antennas; RX board; RX setup]
39
Solving the Problem of Delay Sensitivity to Process, Voltage and Temperature Variations
"A Variation-tolerant Sub-threshold Design Approach", Jayakumar, Khatri. Design Automation Conference (DAC) 2005 Anaheim, CA , June 13-17.
40
An Example Showing Phase Locking
This figure shows how the body bias (and hence the delay of the PLA) changes with changes in VDD.
The adjustment is very quick (within a few clock cycles).
VDD change: 0.2V to 0.22V
VDD change: 0.22V to 0.18V
41
Energy and Speed
We may be interested in the minimum energy operating point for the design
  Minimizing VDD reduces power but minimum VDD does not mean minimum energy
  The optimum VDD value increases with increased logical depth, and with temperature
  "Minimum Energy Near-threshold Network of PLA based Design", Jayakumar, Khatri. International Conference on Computer Design (ICCD) 2005, Oct 2-5, San Jose, CA.
Reclaiming the speed penalty
  Can be done for datapath circuits, using asynchronous micropipelining
  Showed that speedup of 7X is possible, with an area overhead of 44%
  "A PLA based Asynchronous Micropipelining Approach for Subthreshold Circuit Design", Jayakumar, Garg, Gamache, Khatri. IEEE/ACM Design Automation Conference (DAC) 2006, July 24-28, San Francisco, CA.
42
On-chip Antenna
Antenna size needs to be at least a 10th of the transmit wavelength to radiate effectively
  Transmit wavelength around 600m
Due to on-chip space constraints, antenna coil length is only 0.2m
  We have the option of using an external antenna
  And we had a 60dB safety margin in the link budget analysis. This could compensate for a lossy antenna
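The size mismatch can be put in numbers with a back-of-the-envelope sketch. It applies the slide's one-tenth-wavelength rule to the slide's 600m figure, and separately computes the free-space wavelength c/f for the specified 150kHz and 450kHz tones. The tone wavelengths come out between roughly 670m and 2000m, the same order as the slide's figure. All variable names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_m(frequency_hz):
    """Free-space wavelength for a given frequency."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


# Rule of thumb from the slide: antenna length >= wavelength / 10.
slide_wavelength_m = 600.0
required_length_m = slide_wavelength_m / 10.0
coil_length_m = 0.2
shortfall_factor = required_length_m / coil_length_m

f1_wavelength_m = wavelength_m(150e3)
f2_wavelength_m = wavelength_m(450e3)

print(f"required = {required_length_m:.0f} m, coil = {coil_length_m} m, "
      f"shortfall = {shortfall_factor:.0f}x")
print(f"f1: {f1_wavelength_m:.0f} m, f2: {f2_wavelength_m:.0f} m")
```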
43
Spectrum of Amplifier Tones
Fclk = 1MHz, F1 = 117kHz, F2 = 347kHz
The adjacent peaks are around 10dB below the fundamental peaks
We found from Matlab simulations that signals from the extracted Spice netlist could be demodulated at the receiver side