Effective Memory Use in a Media Server
Edward Chang and Hector Garcia-Molina
Department of Computer Science
Stanford University
Abstract
A number of techniques have been developed for maximizing disk utilization in media servers, including disk arm scheduling and data placement ones. Instead, in this paper we focus on how to efficiently utilize the available memory. We present techniques for best memory use under different disk policies, and derive precise formulas for computing memory use. We show that with proper memory use, maximizing disk utilization does not necessarily lead to optimal throughput. In addition, we study the impact of data placement policies, including disk partitioning and multiple disks. Finally, our analysis shows that maximizing disk utilization and disk striping incur high system costs, and are not advisable in a media server.
1 Introduction

The storage system of a multimedia system faces more challenges than a conventional one. First, media data must be retrieved from the storage system at a specific rate; if not, the system exhibits "jitter" or "hiccups." This timely data retrieval requirement is also referred to as the continuous requirement or the real-time constraint [17]. Second, the required data rate is very high. For example, an MPEG-1 compressed video requires an average data rate of 1.5 Mbps, and MPEG-2 requires 4 Mbps. Guaranteeing real-time supply at this high data rate for concurrent streams is a major challenge for multimedia storage systems.
Most multimedia storage research has focused on optimizing disk bandwidth via scheduling policies and data placement. [...] and we again present performance formulas that precisely account for memory sharing.
The third scheme we consider is a Group Sweeping Scheme (GSS) [18], which can be considered a hybrid between Sweep and Fixed-Stretch. We discuss how memory can be effectively used by this scheme (something that was not clearly spelled out in the original paper). In Section 6 we compare the schemes in a realistic case study, highlighting the basic disk bandwidth and memory use tradeoffs.

Section 7 analyzes data placement policies that are not considered in the first six sections. We consider the impact of partitioning the disk into regions, and of using multiple disks. Partitioning and multiple disks can impact performance, but they do not alter the conclusions of our study. Finally, we offer our conclusions in Section 8.
2 Scheme Sweep

In this section we briefly describe a well known multimedia delivery scheme, which we call Sweep. Scheme Sweep uses an elevator policy for disk scheduling in order to amortize disk seek overhead. It is representative of a class of schemes [6,8,11,13,15] that optimize throughput by reducing disk seek overhead. Study [13] shows that an elevator policy is superior for retrieving continuous media data in comparison to a policy in which requests with the earliest deadlines are serviced first. We will use scheme Sweep as a benchmark for comparing with other schemes.
For now, let us assume a single disk and let us make no assumptions as to the data placement policies. (We discuss data placement and multiple disks in Section 7.) We first present Sweep under the assumption that each request is allocated its own private memory buffer, with no memory sharing among the requests. (Buffer sharing among requests is discussed in Section 3.)

During a sweep of the disk, Sweep reads one segment of data for each of the requested streams. The data for a stream is read into a memory buffer for that stream, which must be adequate to sustain that stream until its next segment is read during the following disk elevator sweep. To analyze the performance of Sweep, we typically are given the following parameters:
TR: the disk's data transfer rate.

γ(d): a concave function that computes the rotational and seek overhead given a seek distance d. For convenience, we will refer to the combined seek and rotational overhead simply as the seek overhead.

M: the storage system's available memory.

N: the number of stream requests. Each request is denoted as X_1, X_2, ..., X_N. Each stream requires a display rate of DR (DR < TR). (For simplicity we assume that the display rates are equal.¹) The value of N · DR must be less than TR.

¹ If the display rates differ, one way to handle them is to use the greatest common divisor of the display rates as the unit display rate, and to treat each display rate as a multiple of the unit one. For example, if the display rates are 6 and 4 Mbps, we can treat a 6 Mbps request logically as 3 requests of 2 Mbps each, and we can treat a 4 Mbps request as 2 of 2 Mbps. Thus, all our base requests are of the same rate.
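This normalization is mechanical; the following short Python sketch (ours, not the paper's, using the footnote's example rates) splits heterogeneous requests into equal-rate base requests:

    from math import gcd
    from functools import reduce

    def normalize_requests(rates_mbps):
        """Split requests with different display rates into logical base
        requests that all run at the unit display rate (the GCD of the rates)."""
        unit = reduce(gcd, rates_mbps)            # e.g. gcd(6, 4) = 2 Mbps
        return unit, {r: r // unit for r in rates_mbps}

    unit, split = normalize_requests([6, 4])
    print(unit, split)   # 2 {6: 3, 4: 2}: the 6 Mbps request acts as 3 base requests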
Since γ is a concave function [9,14,16], the largest value of the total seek overhead for Sweep occurs when the N segments are equally spaced on the disk, that is, when each seek distance is CYL/N (where CYL denotes the total number of cylinders). Thus, the worst case seek (and rotational) time is:

    T_seek = N · γ(CYL/N)    (2)

The total transfer time for N requests, each of size S, at a transfer rate TR, is

    T_transfer = N · S / TR    (3)

As we stated above, the period T must be larger than or equal to the worst case seek and transfer time, i.e., T ≥ T_seek + T_transfer. For optimal performance, however, we take the smallest feasible value, since otherwise we would be wasting both disk bandwidth and memory resources. That is,

    T = T_seek + T_transfer = N · γ(CYL/N) + N · S / TR    (4)
The third equation that must be satisfied by scheme Sweep is obtained from our physical memory limit. Although in a period we only read S bytes for each stream, it turns out we need a buffer of twice that size for each stream to cope with the variability in read times. To see this, consider a particular stream in progress, where we call the next three disk arm sweeps P_1, P_2, and P_3. Assume that the segments needed by our stream are s_1, s_2, and s_3. It so happens that, because of its location on disk, segment s_2 is read at the beginning of sweep P_2, while s_3 is read at the end of P_3, almost 2T time units after s_2 is read. At the point when s_2 is read we need to have 2S data in memory, to sustain the stream for 2T time.

When segment s_1 is read, we will only have S bytes in memory, which is only enough to sustain us for T seconds. Fortunately, because s_1 was at the end of its sweep, the next segment s_2 can be at most T seconds away, so we are safe. Actually, s_2 could happen to be the very first segment read in sweep P_2, in which case we would again fill up the buffer with roughly 2S data (minus whatever data was played back in the time it takes to do both reads).

Intuitively, what happens is that half of the buffer is being used as a cushion to handle variability of reads within a sweep. Before we actually start playing a stream we must ensure that this cushion is filled up. In our example, if this stream were just starting up, we could not start playback when s_1 was read. We would have to wait until the end of sweep P_1 (the sweep where the first segment was read) before playback started. Then, no matter when s_2 and s_3 and the rest of the segments were read within their periods, we could sustain the playback rate. (This startup delay will be important when we analyze initial latencies in Section 6.3.)

Adding a cushion buffer for each request doubles the memory required. So, to support N requests, we must have 2 · N · S ≤ M. For optimal performance, however, we should use all available memory. (By using all available memory we make segments larger. This lets us increase T (Eq. 1), which then lets us increase N in Equation 4, since the fixed seek overhead is then amortized over a larger transfer.) Thus, we have that for optimal feasible performance,

    M = 2 · N · S    (5)

In summary, scheme Sweep has three tunable parameters, S, T, and N, and we have derived three equations they must satisfy (Equations 1, 4, and 5) for optimal performance. From these equations we can solve for S, T, and N.
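To make the interplay of the three equations concrete, here is a small Python sketch (ours; the parameter values and the seek model are illustrative stand-ins, not those of Section 6). Given the available memory M, it finds the largest N for which the period implied by Equations 1 and 5 still covers the worst case seek and transfer time of Equation 4:

    def max_streams_sweep(M, DR, TR, CYL, gamma):
        """Largest N for which Sweep is feasible, with S and T pinned down
        by Equations 5 and 1 and checked against Equation 4."""
        best = 0
        N = 1
        while N * DR < TR:                 # cannot exceed raw disk bandwidth
            S = M / (2 * N)                # Eq. 5: memory split into 2S buffers
            T = S / DR                     # Eq. 1: one segment lasts one period
            if T >= N * gamma(CYL / N) + N * S / TR:   # Eq. 4
                best = N
            N += 1
        return best

    # hypothetical numbers: 64 MB of memory, 1.5 Mbps streams, 80 Mbps disk
    gamma = lambda d: 0.008 + 0.002 * d ** 0.5   # stand-in concave seek model (s)
    print(max_streams_sweep(M=64 * 8e6, DR=1.5e6, TR=80e6, CYL=5000, gamma=gamma))

With these stand-in values the search admits a few dozen 1.5 Mbps streams; the exact count depends entirely on the seek model and memory size, and is not a measured result.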
2.2 Minimizing Memory

We can derive a closed form for the minimum memory requirement as follows. We assume that N is given and that S is unknown, and we solve for it. Substituting T = S/DR (Eq. 1) into Equation 4, we can solve for S, the segment size needed to support the N requests:

    S = N · DR · TR · γ(CYL/N) / (TR − N · DR)    (6)

Substituting into Equation 5, the minimum memory requirement is:

    M = 2 · N · S = 2 · N² · DR · TR · γ(CYL/N) / (TR − N · DR)    (7)

It is important to note in Equation 7 that M does not grow linearly with the desired N. First, the numerator grows quadratically with N. Second, as N grows the denominator of Equation 7 approaches zero, causing M to grow without bound. As the denominator gets close to zero, we are driving the system to its physical limits: N streams at a rate DR require N · DR bytes per second, and the disk can only read at most TR bytes per second. When N · DR approaches TR, the system needs a huge amount of memory to support an additional request. As we will discuss in Section 6.4, even if the system can support that many concurrent streams, doing so is not cost effective.

In addition to deriving the minimum memory requirement, we can also derive the maximum system throughput for a given amount of memory M. Please see [4] for details.
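The blow-up near the bandwidth limit is easy to check numerically. The sketch below (ours, with the same illustrative stand-in parameters as before) evaluates Equations 6 and 7 for increasing N:

    def min_memory_sweep(N, DR, TR, CYL, gamma):
        """Equations 6 and 7: segment size and minimum (unshared) memory."""
        S = N * DR * TR * gamma(CYL / N) / (TR - N * DR)   # Eq. 6
        return 2 * N * S                                    # Eq. 7

    gamma = lambda d: 0.008 + 0.002 * d ** 0.5   # stand-in seek model (seconds)
    for N in (10, 30, 50, 53):                   # TR/DR is about 53.3 streams here
        M = min_memory_sweep(N, DR=1.5e6, TR=80e6, CYL=5000, gamma=gamma)
        print(N, round(M / 8e6), "MB")           # grows far faster than linearly in N

With these numbers, going from 50 to 53 streams multiplies the memory requirement by roughly an order of magnitude, which is why driving the disk to full utilization is questionable; Section 6.4 returns to this point.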
3 Reducing Required Memory

In scheme Sweep each request is allocated a fixed private buffer of size 2S. One way to reduce the memory requirements is to have requests share their buffer space [10,16]. That is, we create a shared memory pool, and as the space used by one request frees up, it can be used to hold data from other requests. Various papers have estimated that sharing can cut the memory requirements by "roughly half." However, these estimates are obtained under very strong assumptions, in particular that all seek times must be zero. In this section we revisit how exactly memory sharing works (without the strong assumptions), and in doing so prove under what conditions maximal sharing can be obtained, and what the savings actually are.
Figure 1 depicts the amount of memory used by a request in a period T. An IO starts shortly before the data staged into memory in the previous period is used up. The data accumulates in memory at the rate TR − DR until the IO completes, and is then consumed at the display rate DR.

[Figure 1: Memory required in a period.]

[Figure 2: Memory requirement function.]
For our analysis we make two simplifying assumptions. First, we assume that memory can be freed in a continuous fashion. In other words, Figure 1 shows the actual memory used by a request. In practice, of course, memory is released in pages, so Figure 1 would have a sequence of small decreasing steps, each one page in size. This implies that our estimates for memory use may be up to one memory page off for each request. Thus, our continuous release assumption is an optimistic one for buffer sharing schemes. However, if as expected the page size is small compared to the segment size, the difference will be negligible.
Our second assumption is to approximate the memory use function by a right triangle. This assumption causes us to overestimate memory use: we will assume that the peak in Figure 1 is S (at time 0 in the figure), while in reality it is S · (1 − DR/TR). This is a pessimistic assumption, but since typically the data transfer rate TR is much larger than the display rate DR, the difference is very small.
Notice that the small differences caused by our two assumptions tend to cancel each other out. In particular, if the page size is S · DR/TR, the effects will cancel. If the page size is less than this value, as is probably the case, then overall our results will be slightly pessimistic for memory sharing.
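To spell out the cancellation argument (our worked version of the two assumptions above): during an IO of duration S/TR the stream consumes DR · S/TR bytes, so

    triangle overestimate per request = S − S · (1 − DR/TR) = S · DR/TR
    page-release underestimate per request ≤ one page P

and the two effects cancel exactly when P = S · DR/TR.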
3.1 Optimal Delays

Before discussing memory sharing under Sweep, it is instructive to analyze an ideal case where the IOs for a given stream occur in a regular fashion, as shown in Figure 2. [...] (Keep in mind that this does not include cushion buffer requirements.)

We provide in [2,4] the proofs of Theorem 1 and its Corollary. The results suggest that even if one cannot perfectly separate the IO sequences in time, it is desirable to space out requests as much as possible. This is precisely what we do in order to optimize the memory use of Sweep.

3.2 Scheme Sweep*
We refer to scheme Sweep with memory sharing as Sweep*. With a sweeping scheme we cannot control when the IOs occur within a period, since they are done in the order found as the head sweeps the disk. In the worst case, all the IOs in a period will be bunched together (if all the segments needed in a period happen to be nearby on the disk). This means that the memory peaks are summed, leading to poor memory sharing. However, the IOs need to be separated by at least the time it takes to read a segment, so the peaks are not fully added. In addition, the last IO of a period can be delayed and separated from a cluster of IOs, further improving memory sharing. Finally, it is also possible to share the cushion buffers used by each stream to account for IO variability, leading to even better memory utilization. All these effects are carefully analyzed in [4], where we also show the following result. Incidentally, note that various papers had earlier "guessed" how much memory a scheme like Sweep* uses, but the estimates were not accurate.
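The effect of IO spacing on the shared pool can be checked with a small simulation (ours; the triangle memory profile follows the approximation of Section 3, and all parameters are hypothetical):

    def peak_pool(io_times, S, DR, T, steps=20000):
        """Peak of the summed per-stream memory profiles over one period.
        Each stream jumps to S when its IO completes (the right-triangle
        approximation of Section 3) and then drains at DR, emptying after T."""
        peak = 0.0
        for k in range(steps):
            t = k * T / steps
            total = sum(max(0.0, S - DR * ((t - ti) % T)) for ti in io_times)
            peak = max(peak, total)
        return peak

    N, DR, TR = 20, 1.5e6, 80e6
    S = 16e6                                  # hypothetical 16 Mbit segments
    T = S / DR                                # Eq. 1
    bunched = [i * S / TR for i in range(N)]  # segments nearby on disk
    spread = [i * T / N for i in range(N)]    # IOs evenly separated in time
    print(peak_pool(bunched, S, DR, T) / peak_pool(spread, S, DR, T))

With these numbers the bunched schedule needs roughly half again as much pool memory as the evenly spaced one; Theorem 2 below quantifies the worst case exactly.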
Theorem 2: The minimum memory space required to support N streams under scheme Sweep* is [...].

For the theorem that deals with streams with different display rates, please see [2].
Figure 3: Service Slots of Fixed-Stretch
4 Scheme Fixed-Stretch

In order to reduce the variability between the IOs of a request, in this section we consider a scheme where the IOs are performed in a fixed order from period to period. We call this scheme Fixed-Stretch because, in addition, the IOs are spaced out as described next. For completeness, a version of this scheme that does not use memory sharing, Fixed-Stretch⁻, is discussed at the end of this section.

To eliminate the need for cushion buffers entirely and maximize memory sharing (Theorem 1), we must separate the IOs of a request by a constant time. However, since the data on disk for the requests are not necessarily separated by equal distances, we must add time delays between IOs to space them equally in time.

For instance, if the seek distances in a disk sweep are d_1, d_2, ..., d_N cylinders, and d_max is the maximum of the d_i, then we must separate each IO by at least the time it takes to seek to and transfer this maximum-distance request. One can choose a different separator for each period, depending on the maximum seek distance for the requests of that period. However, as we have argued earlier, there is no benefit in allowing the separator to vary from cycle to cycle. To have a constant separator and simplify the algorithms, scheme Fixed-Stretch uses the worst possible seek distance (CYL) and rotational delay, together with a segment transfer time, as the universal IO separator, Δ, between any two IOs. The length of a period, T, will be N times Δ.
Figure 3 shows an example with three requests that are separated in this way. The time on the horizontal axis is divided into service cycles, each lasting T units. Each service cycle (the shaded area) is equally divided into three service slots, each lasting Δ units (delimited by two up-arrows). The vertical axis in Figure 3 represents the amount of memory utilized by an individual stream.
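Under this arrangement the separator and slot boundaries can be computed once up front; a minimal Python sketch (ours, with hypothetical parameters and a stand-in seek model):

    def fixed_stretch_schedule(N, S, TR, CYL, gamma):
        """Universal IO separator Delta and the slot start times of one cycle.
        Delta covers the worst seek gamma(CYL) (rotational delay included)
        plus one segment transfer; the period is T = N * Delta."""
        delta = gamma(CYL) + S / TR
        T = N * delta
        return delta, T, [i * delta for i in range(N)]

    gamma = lambda d: 0.008 + 0.002 * d ** 0.5   # stand-in seek model (seconds)
    delta, T, slots = fixed_stretch_schedule(N=3, S=16e6, TR=80e6, CYL=5000,
                                             gamma=gamma)
    print(delta, T, slots)   # three equal service slots per cycle, as in Figure 3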
Fixed-Stretch executes the following steps:

1. At the start of a service slot (indicated by the up-arrows in Figure 3), set the end-of-slot timer to expire in Δ.

2. If there is no request to be serviced in the service slot, skip to Step 6.

3. Allocate S amount of memory for the request serviced in this time slot. [...]

There are several ways to handle the IOs. One idea is to map the physical pages to a contiguous virtual address space, and then initiate the transfer to the virtual space (if the disk supports this). Another idea is to break up the segment IO into multiple IOs, each the size of a physical page. The transfers are then chained together and handed to an IO processor or intelligent DMA unit that executes the entire sequence of transfers with the same performance as a single larger IO. Other ideas are discussed in [10].
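The second idea, chaining page-sized transfers, corresponds to what operating systems now expose as vectored (scatter-gather) IO. A minimal sketch using POSIX preadv via Python's os module (our illustration, not the paper's implementation; the file name, segment size, and page size are hypothetical):

    import os

    PAGE = 4096   # hypothetical physical page size

    def read_segment_scattered(fd, offset, pages):
        """Read one segment as a chain of page-sized transfers into
        non-contiguous page buffers, submitted to the kernel in one call."""
        return os.preadv(fd, pages, offset)   # vectored (scatter-gather) read

    # usage sketch: read a 64 KB segment into 16 scattered page buffers
    # (assumes "/tmp/segment.dat" exists and is at least 64 KB long)
    fd = os.open("/tmp/segment.dat", os.O_RDONLY)
    buffers = [bytearray(PAGE) for _ in range(16)]
    read_segment_scattered(fd, 0, buffers)
    os.close(fd)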
The accuracy of the timers used by Fixed-Stretch can be tuned periodically by cross-checking the amount of data in the stream buffers.
The total seek overhead and delay for the N requests is N times this value, so T = N · Δ. Thus T can be written as

    T = N · Δ = N · (γ(CYL) + S/TR)
In addition, since the release of the S memory allocated by each request at the beginning of a service slot does not start until the playback point of the slot (see Figure 3 and execution step 5), each request needs to retain its data for some extra time. This delay of data consumption (and hence of memory deallocation) requires that each request have an additional amount of buffer space. (This extra space requirement is very small; for the parameter values of Section 6, the extra buffer requirement is about [...] of the segment size.) The required memory must be smaller than or equal to the available memory M. For optimal performance, we try to give each request as much memory as possible to maximize S. Thus, we have the memory resource constraint

    [...]    (10)

where

    [...]    (11)
The following theorem gives the memory requirements for GSS, taking into account sharing of all memory (including any cushions needed to cope with IO variability).

Theorem 3: The minimum space required to support N streams under scheme GSS is

    [...]    (12)

Please refer to [4] for the proof.
6 Evaluation

To evaluate and compare the performance of the various schemes discussed in this paper, we use the Seagate Barracuda 4LP disk [1]; its parameters are listed in Table 4. We also assume a display rate DR of 1.5 Mbps, which is sufficient to sustain typical video playback. For the seek overhead we follow closely the model developed in [9,14], which has been shown to be asymptotically close to real disks. The seek overhead function γ is a concave function of the following form:
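The function itself is lost in this excerpt; the model of [9,14] has the general concave two-piece form below, where the constants a, b, c, f and the boundary e are fitted from the disk's specifications (our rendering of that model; the actual fitted values would come from Table 4):

    γ(d) = a + b · √d    if 0 < d < e
    γ(d) = c + f · d     if d ≥ e

That is, seek time grows with the square root of the distance for short seeks (the arm spends most of its time accelerating) and roughly linearly for long seeks, with the constant terms absorbing settle time and rotational delay.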