A Replica Technique for Wordline and Sense Control in Low-Power SRAM’s
Bharadwaj S. Amrutur and Mark A. Horowitz, Senior Member, IEEE
Abstract—With the migration toward low supply voltages in low-power SRAM designs, threshold and supply voltage fluctuations will begin to have larger impacts on the speed and power specifications of SRAM’s. We present techniques based on replica circuits which minimize the effect of operating conditions’ variability on the speed and power. Replica memory cells and bitlines are used to create a reference signal whose delay tracks that of the bitlines. This signal is used to generate the sense clock with minimal slack time and control wordline pulsewidths to limit bitline swings. We implemented the circuits for two variants of the technique, one using bitline capacitance ratioing in a 1.2-µm 8-kbyte SRAM, and the other using cell current ratioing in a 0.35-µm 2-kbyte SRAM. Both the RAM’s were measured to operate over a wide range of supply voltages, with the latter dissipating 3.6 mW at 150 MHz at 1 V and 5.2 µW at 980 kHz at 0.4 V.
Index Terms—Low power, low swing bus, low voltage, pulsed decoder, replica technique, self-timing, sense clock control, SRAM’s, threshold variation, wordline pulsing.
I. INTRODUCTION
LOW-POWER circuit designers have been continually pushing down supply voltages to minimize the energy consumption of chips for portable applications [1]–[3]. The same trend has also applied to low-power SRAM’s in the past few years [4]–[6]. While supply voltages are scaling down at a rapid rate, threshold voltages have not scaled down as fast, in order to control subthreshold leakage, which has resulted in a corresponding reduction of the gate overdrive for the transistors. With the fluctuations in the threshold voltages also not expected to decrease in future submicron devices [7], [8], the delay variability of low-power circuits across process corners will increase in the future [9]. The large delay spreads across process corners will necessitate bigger margins in the design of the bitline path in an SRAM, and also will result in larger bitline power dissipation and loss of speed. This problem can be mitigated by using a self-timed approach to designing the bitline path, based on delay generators which track the bitline delays across operating conditions.
Manuscript received August 30, 1997; revised March 4, 1998. This work was supported by the Advanced Research Projects Agency under Contract J-FBI-92-194 and by Fujitsu Ltd.
The authors are with the Center for Integrated Systems, Stanford University, Stanford, CA 94305-4070 USA (e-mail: amrutur@chroma.stanford.edu).
Publisher Item Identifier S0018-9200(98)05522-X.
Traditionally, the bitline swings during a read access have been limited by using active loads of either diode-connected nMOS or resistive pMOS [10], [11], but these clamp the bitline swing at the expense of a steady bitline current. A more power-efficient way of limiting the bitline swings is to use high-impedance bitline loads and pulse the wordlines [12]–[15]. Bitline power can be further minimized by controlling the wordline pulsewidth to be just wide enough to guarantee the minimum bitline swing development. This type of bitline swing control can be achieved by a precise pulse generator that can match the bitline delay.
Low-power SRAM’s also use clocked sense amplifiers to limit the sense power. These are either the current mirror type [16], [17] or cross-coupled latch type [18], [19] designs. In the former, the sense clock turns on the amplifier sometime before the sensing, to set up the amplifier in the high-gain region. To reduce power, the amount of time the amplifier is ON should be minimized. In the latch-type amplifiers, the sense clock starts the amplification, and hence the sense clock needs to track the bitline delay to ensure correct and fast operation.
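The power benefit of limiting the bitline swing can be illustrated with a minimal sketch. It assumes the usual first-order model in which restoring a bitline of capacitance C after it has swung by ΔV draws C·Vdd·ΔV from the supply. The function name and all numeric values (1 pF, 1 V, a swing of one tenth of the supply) are illustrative assumptions, not measurements from the prototype chips.

```python
def bitline_restore_energy(c_bitline, vdd, swing):
    """Energy (J) drawn from the supply to restore one bitline after a read.

    First-order model (an assumption of this sketch): E = C * Vdd * dV.
    """
    return c_bitline * vdd * swing


C_BL = 1e-12   # assumed bitline capacitance, 1 pF
VDD = 1.0      # assumed supply voltage, volts

# Full-swing read versus a read whose wordline pulse stops at Vdd/10.
full_swing = bitline_restore_energy(C_BL, VDD, VDD)
limited_swing = bitline_restore_energy(C_BL, VDD, 0.1 * VDD)
saving = full_swing / limited_swing

print(f"full swing: {full_swing:.2e} J, limited swing: {limited_swing:.2e} J")
```

Under this model the energy scales linearly with the swing, which is why a wordline pulse that is only as wide as needed matters for bitline power.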
Fundamentally, the clock path needs to match the data path to ensure fast and low-power operation. The data path starts from the local block select and/or global wordline, and goes through the wordline driver, memory cell, and bitline to the input of the sense amps. The clock path often starts from the local block select or some clock phase, and goes through a buffer chain to generate the sense clock. The delay variations in the former are dominated by the bitline delay since the memory cells are made out of minimum sized devices and are more vulnerable to process variations. Therefore, the delays of the two paths do not track each other very well over all process and environment conditions. Enough delay margin has to be provided to the sense clock path for worst case conditions, which reduces the average case performance. The rest of this paper describes methods of using replica circuits, which mimic the delay of the bitline path over all conditions to create the clocks, and gives experimental results from using these techniques. The next section presents simulation data comparing the matching of bitline delay with inverter chain delay and replica circuit delay under different operating conditions. The following two sections describe different methods of building replica circuits. Section III presents a clock circuit which uses a dummy memory cell that drives bitlines with reduced capacitance, and Section IV describes a circuit which uses a full bitline load. Results from two prototype chips which implement the two different replica techniques are presented in Section V.
II. CLOCK MATCHING
0018–9200/98$10.00 © 1998 IEEE
Fig. 1. Common sense clock generation techniques.
The prevalent technique to generate the timing signals within the array core essentially uses an inverter chain. This can take one of two forms: the first kind relies on a clock phase to do the timing [Fig. 1(a)] [20], and the second kind uses a delay chain within the accessed block, and is triggered by the block select signal [Fig. 1(b)] or a local wordline [21]. The main problem in these approaches is that the inverter delay does not track the delay of the memory cell over all process and environment conditions. The tracking issue becomes more severe for low-power SRAM’s operating at low voltages due to the enhanced impact of threshold and supply voltage fluctuations on delays as described
by [...]
[...] represents the nMOS type ([...]), [...] represents the pMOS type (one of [...]), and [...] for 115 [...] for 25 [...] and [...] µm CMOS process, and simulations
are done for a bitline spanning 64 rows. We can observe that the bitline delay to inverter delay ratio can vary by a factor of two over these conditions, the primary reason being that, while the memory cell delay is mainly affected by the nMOS thresholds, the inverter chain delay is affected by both nMOS and pMOS thresholds. The worst case matching for the inverter delay chain occurs for process corners where the nMOS and pMOS thresholds move in opposite directions. In the above simulations, it is assumed that they move independently, while in reality, there will be some correlation between them which would make the mismatch for the inverter delay chain less pronounced, but still worse than that of the replica element.
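The reason the inverter chain mistracks can be illustrated with a toy model. The sketch below assumes an alpha-power-law drive current, which is a stand-in chosen for illustration and not the simulation setup behind Fig. 2; the exponent, supply, nominal threshold, and threshold shift are all hypothetical values. The bitline and the replica element depend only on the nMOS threshold, while the inverter chain averages nMOS and pMOS stages.

```python
ALPHA = 1.5  # assumed velocity-saturation index (hypothetical value)


def drive(vdd, vt, alpha=ALPHA):
    """Alpha-power-law drive current in arbitrary units (model assumption)."""
    return (vdd - vt) ** alpha


def bitline_delay(vdd, vtn):
    """The memory cell discharges the bitline through nMOS devices only."""
    return vdd / drive(vdd, vtn)


def replica_delay(vdd, vtn):
    """A replica cell and bitline share the structure of the real path."""
    return bitline_delay(vdd, vtn)


def inverter_chain_delay(vdd, vtn, vtp):
    """Half of the chain's stages pull down (nMOS), half pull up (pMOS)."""
    return 0.5 * (vdd / drive(vdd, vtn) + vdd / drive(vdd, vtp))


VDD, VT_NOM, DVT = 1.2, 0.4, 0.1  # assumed supply, nominal |Vt|, corner shift


def inverter_mismatch(vtn, vtp):
    """Bitline/inverter delay ratio, normalised to 1 at the nominal corner."""
    nominal = bitline_delay(VDD, VT_NOM) / inverter_chain_delay(VDD, VT_NOM, VT_NOM)
    return bitline_delay(VDD, vtn) / inverter_chain_delay(VDD, vtn, vtp) / nominal


# Corners where the nMOS and pMOS thresholds move in opposite directions.
slow_n_fast_p = inverter_mismatch(VT_NOM + DVT, VT_NOM - DVT)
fast_n_slow_p = inverter_mismatch(VT_NOM - DVT, VT_NOM + DVT)
# Corner where both thresholds move together.
slow_n_slow_p = inverter_mismatch(VT_NOM + DVT, VT_NOM + DVT)

print(slow_n_fast_p, fast_n_slow_p, slow_n_slow_p)
```

In this toy model the inverter chain tracks the bitline when both thresholds shift together and drifts in either direction when they shift oppositely, whereas the replica ratio stays at one by construction. Real replica cells still see cell-to-cell offsets, which the model ignores.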
The delay element is designed to match the delay of a nominal memory cell in a block. But in an actual block of cells, there will be variations in the cell currents across the cells in the block. Fig. 3 displays the ratio of delays for the bitline and the delay elements for varying amounts of threshold mismatch in the access device of the memory cell compared to the nominal cell. The graph is shown only for the case of the accessed cell being weaker than the nominal cell as this would result in a lower bitline swing. The curves for the inverter chain delay element (hatched) and the replica delay element (solid) are shown with error bars for the worst case fluctuations across process corners. The variation of the delay ratio across process corners in the case of the inverter chain delay element is large even with zero offset in the accessed cell, and grows further as the offsets increase. In the case of the replica delay element, the variation across the process corners is negligible at zero offsets, and starts growing with increasing offsets in the accessed cell. This is mainly due to the adverse impact of the higher nMOS thresholds in the accessed cell under slow nMOS conditions. It can be noted that the tracking of the replica delay element is better than that of the inverter chain delay element across process corners, even with offsets in the accessed memory cell.
There are two more sources of variations that are not included in the graphs above and make the inverter matching even worse. The minimum sized transistors used in memory cells are more vulnerable to delta– [...]
Fig. 2. Delay matching between the bitline delay to generate 120 mV and two delay elements, one based on an inverter chain and the other on a replica cell–bitline combination.
III. FEEDBACK BASED ON CAPACITANCE RATIOING
The replica delay stage is made up of a memory cell connected to a dummy bitline whose capacitance is set to be a fraction of the main bitline capacitance. The value of the fraction is determined by the required bitline swing for proper sensing. For the clocked voltage sense amplifiers we use (Fig. 4), the minimum bitline swing for correct sensing is around a tenth of the supply. An extra column in each memory block is converted into the dummy column by cutting its bitline pair to obtain a segment whose capacitance is the desired fraction of the main bitline (Fig. 5). The replica bitline has a similar structure to the main bitlines in terms of the wire and diode parasitic capacitances. Hence, its capacitance ratio to the main bitlines is set purely by the ratio of the geometric lengths ([...]). The output of the replica delay cell is fed to a buffer chain to start the local sensing, and is also fed back to the block decoder to reset the block select signal. Since the block select pulse is ANDed with the global wordline signal to generate the local wordline pulse, the latter’s pulsewidth is set by the width of the block select signal. It is assumed that the block select signal does not arrive earlier than the global wordline. The delay of the buffer chain to drive the sense clock is compensated by activating the replica delay cell with the unbuffered block select signal.
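The choice of the capacitance fraction can be sketched numerically. The example assumes that the same cell current discharges both bitlines linearly, and that the replica output is detected once the replica bitline has dropped by a fixed trip swing; the half-supply trip point, the 1.2-V supply, the cell current, and the bitline capacitance are illustrative assumptions, and the function names are hypothetical. The tenth-of-supply target swing comes from the text, and the 256-row height is borrowed from the Table I title.

```python
def replica_capacitance_fraction(target_swing, trip_swing):
    """Fraction of the main bitline capacitance to give the replica segment.

    With equal cell currents, the replica swings 1/fraction times faster,
    so it reaches trip_swing when the main bitline reaches target_swing.
    """
    if not 0.0 < target_swing <= trip_swing:
        raise ValueError("target swing must be positive and no larger than the trip swing")
    return target_swing / trip_swing


def main_swing_at_trip(cell_current, c_main, fraction, trip_swing):
    """Main bitline swing at the moment the replica bitline trips."""
    t_trip = fraction * c_main * trip_swing / cell_current  # replica: I*t = C*dV
    return cell_current * t_trip / c_main


VDD = 1.2            # assumed supply, volts
ROWS = 256           # bitline height in cells, from the Table I title
TARGET = 0.1 * VDD   # minimum swing for correct sensing (a tenth of the supply)
TRIP = 0.5 * VDD     # assumed trip point of the gate sensing the replica bitline

fraction = replica_capacitance_fraction(TARGET, TRIP)
replica_rows = round(fraction * ROWS)  # where to cut the dummy column
swing = main_swing_at_trip(50e-6, 1e-12, fraction, TRIP)

print(fraction, replica_rows, swing)
```

Because the ratio is set by geometric length, the cut position in the dummy column is the only design knob in this sketch. A different trip point would simply move the cut.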
Fig. 3. Matching of the bitline delay with the inverter chain delay and the replica cell–bitline delay across process fluctuations over varying threshold offsets for the accessed memory cell.
Fig. 4. Latch-type sense amplifier.
The delay of the five inverters in the buffer chain, [...] chain has three rising delays and two falling delays, while the [...] (see Fig. 6) [...] to minimum bitline swing is [...] and the delay to the sense clock is [...] equals [...] and [...] by merely trimming the delay of the [...] chain to ensure that the sense clock turns on a fixed number of gate delays before the bitlines differentiate.
Fig. 7. Delay matching of two buffer chains.
TABLE I
BLOCK PARAMETERS: 256 ROWS, 64 COLUMNS
Fig. 8. Current-ratio-based replica structure.
IV. FEEDBACK BASED ON CELL-CURRENT RATIOING
While the above technique works well, it can be modified further to improve the access time. If the reset timing signal for the wordline can be generated locally, then the wordline driver can be skewed to speed up the propagation of the rising block select transition, reducing the access time, with the falling wordline transition being triggered off the local reset signal, similar to the postcharge gates discussed in [24].
An extra row and column containing replica memory cells can be used to provide local resetting timing information for the wordline drivers. The extra row contains memory cells whose pMOS devices are eliminated to act as current sources, with currents equal to that of an accessed memory cell (Fig. 8). All of their outputs are tied together, and they simultaneously discharge the replica bitline. This enables a multiple of the memory cell current to discharge the replica bitline. The current sources are activated by the replica wordline, which is turned on during each access of the block. The replica bitline is identical in structure to the main bitlines, with dummy memory cells providing the same amount of drain parasitic loading as the regular cells. By connecting [...] current sources to the replica bitline, the replica bitline slew rate can be set to [...] times that of the main bitline slew rate, achieving the same effect as the bitline capacitance ratioing described earlier.
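A short sketch can show why tying several current sources to a full-length replica bitline yields a timing signal that tracks the cell current. It assumes linear discharge, identical replica and main bitline capacitance, and a fixed trip swing on the replica bitline; the count of five sources, the 0.5-V trip swing, the 1-pF capacitance, and the three cell currents are hypothetical values chosen only for illustration.

```python
def replica_trip_time(n_sources, cell_current, c_bitline, trip_swing):
    """Time for n parallel cell-current sources to pull the replica bitline
    down by trip_swing (linear discharge assumed)."""
    return c_bitline * trip_swing / (n_sources * cell_current)


def main_bitline_swing(cell_current, c_bitline, elapsed):
    """Swing developed on a main bitline by a single accessed cell."""
    return cell_current * elapsed / c_bitline


N_SOURCES = 5   # assumed number of current sources on the replica bitline
TRIP = 0.5      # assumed replica trip swing, volts
C_BL = 1e-12    # assumed bitline capacitance, same for replica and main

# Sweep the cell current to mimic slow, typical, and fast cells.
swings = []
for i_cell in (20e-6, 50e-6, 100e-6):
    t = replica_trip_time(N_SOURCES, i_cell, C_BL, TRIP)
    swings.append(main_bitline_swing(i_cell, C_BL, t))

print(swings)
```

In this idealised model the main bitline swing at the trip instant equals the trip swing divided by the number of sources, regardless of the cell current or the bitline capacitance, which is the tracking property the replica structure relies on.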
Fig. 9. Skewed wordline driver.
The local wordline drivers are skewed to speed up the rising transition, and they are reset by the replica bitline as shown in Fig. 9. The replica bitline signal is forwarded into the wordline driver through the dummy cell access
transistor