CompatPM: Enabling Energy Efficient Multimedia Workloads for Distributed Mobile Platforms
Ripal Nathuji, Keith J. O’Hara, Karsten Schwan, Tucker Balch
College of Computing
Georgia Institute of Technology
Atlanta, GA 30032
{rnathuji,kjohara,schwan,tucker}@cc.gatech.edu
ABSTRACT
The computation and communication abilities of modern platforms are enabling increasingly capable cooperative distributed mobile systems. An example is distributed multimedia processing of sensor data in robots deployed for search and rescue, where a system manager can exploit the application’s cooperative nature to optimize the distribution of roles and tasks in order to successfully accomplish the mission. Because of limited battery capacities, a critical task a manager must perform is online energy management. While support for power management has become common for the components that populate mobile platforms, what is lacking is integration and explicit coordination across the different management actions performed in a variety of system layers. This paper develops an integration approach for distributed multimedia applications, where a global manager specifies both a power operating point and a workload for a node to execute. Surprisingly, when jointly considering power and QoS, experimental evaluations show that using a simple deadline-driven approach to assigning frequencies can be non-optimal. These trends are further affected by certain characteristics of underlying power management mechanisms, which, in our research, are identified as groupings that classify component power management as “compatible” (VFC) or “incompatible” (VFI) with voltage and frequency scaling. We build on these findings to develop CompatPM, a vertically integrated control strategy for power management in distributed mobile systems. Experimental evaluations of CompatPM indicate average energy improvements of 8% when platform resources are managed jointly rather than independently, demonstrating that previous attempts to maximize battery life by simply minimizing frequency are inappropriate from a platform-level perspective.
Keywords: Power management, distributed mobile systems
1. INTRODUCTION
Technological advancements in the areas of computational and communication resources have led to the deployment of increasingly sophisticated mobile distributed systems. Since many of the applications running across these systems are cooperative in nature, there is considerable flexibility in distributing the associated computations to improve performance, gain fault tolerance, or, the focus of this work, reduce energy consumption. As a concrete example, cooperative multimedia applications can be found in autonomous mobile robots performing critical search and rescue missions. In this scenario, robots use sensors as input for navigation algorithms and situational awareness. The resulting multimedia dataflows are coordinated by a system manager controlling the deployment and configuration of the associated software components.
For modern robotic systems, the energy used by distributed multimedia tasks can contribute substantially to total energy consumption. Since mobile robots have a limited energy supply, efficient energy management translates into prolonged lifetimes and more successful missions. The computing platforms used in these systems enable energy efficiency with integrated components that provide multiple power management states. Power management of these resources is typically performed at multiple layers. Indeed, for certain components, control naturally belongs in one layer over another. For example, in order to obtain memory power savings during active periods, the required time granularity of management is much finer than the operating system can attain. Therefore, the management of this resource must be performed in the platform itself via integrated hardware solutions. Dynamic voltage and frequency scaling (DVFS) is interesting because its control can feasibly be assigned to multiple layers. Since frequency scaling of processors directly affects the end performance of applications, in a distributed environment where global time constraints are not known at each node, the control of processor operating points naturally belongs in the system management component of the architecture.
Our previous work on energy efficient software systems¹ has investigated the deployment of application tasks across participants in a distributed system, including the tradeoffs between the energy overheads of computation and communication when offloading work. This paper considers the additional gains possible when allowing the system deployment manager to specify a processor operating point at which to execute a multimedia workload deployed onto a robot. We begin by analyzing the energy tradeoffs of frequency scaling without the use of other power management mechanisms and find that: (1) energy savings attained with DVFS on the processor need not translate directly into platform-level savings, and therefore, (2) even when the lowest operating point can be used, from a platform-level perspective it may actually consume more energy than a higher frequency. To effectively consider how these trends are affected by underlying power management mechanisms, we propose the notion of power management compatibility. In particular, there are those power management schemes whose achievable savings are not affected by frequency scaling, which can be called “compatible” with DVFS (VFC), and others whose effects vary significantly depending on processor modes, which are “incompatible” (VFI).
Prior work has already shown it useful for online power management to integrate across different OS subsystems, including rescheduling processes to improve the power consumption of peripheral devices.²,³ Therefore, in this research we adopt a vertically integrated solution for power management in cooperative distributed systems and propose the CompatPM architecture. The approach requires attributes of platforms, workloads, and underlying power management schemes to be exported so that the CompatPM-enabled system manager can properly assign performance points given the QoS constraints of the distributed system and applications. In our evaluation we find average improvements of 8% in energy savings compared to a simple heuristic that selects the lowest operating point which meets latency constraints.
2. RELATED WORK
In a study of the Pioneer DX-3 mobile robot, motion accounted for 12.1%–44.6% of the energy used, depending on speed, and the embedded computer accounted for 33.3%–65.3% of the energy.⁴ Since energy for computation will increase as robots become more autonomous and use more powerful computer platforms, such as multicore systems, the importance of power management for sustaining missions of multi-robot teams will increase in future platforms. Towards reducing the energy consumption of mobile devices, Zeng et al.⁵ make power a first-class resource in order to provide system lifetime guarantees, and describe a scheduler⁶ that allocates power to processes, considering various scheduling algorithms. Another energy accounting approach⁷ utilizes hardware performance counters to estimate energy usage and schedules accordingly.
Since the dynamic power consumption of a CPU is proportional to the product of frequency and voltage squared, dynamic voltage and frequency scaling can be effective in reducing power consumption during program execution.⁸ A design framework for exploring power/performance tradeoffs when developing hard real-time systems has been proposed,⁹ as well as an offline scheduler coupled with online slack reclaiming for enhancing the benefits of DVFS.¹⁰ Our own past research exploits the ability to obtain energy savings by performing application-level adaptations for multimedia applications.¹¹ Other research aggressively pursues reduced frequencies within the constraints of application deadlines using memory access information.¹² In all of these DVFS approaches, the effect of frequency scaling on the CPU energy signature is considered independently of the rest of the system. Fan et al.¹³ begin to move away from this assumption by investigating the synergy between DVFS and power-aware memory systems. Miyoshi et al.¹⁴ question the underlying assumption of DVFS, that frequency should be reduced whenever possible, by isolating poor performance points created by efficient idle modes. Similarly, the optimality of lower frequency points when considering system sleep modes has also been studied,¹⁵ as has the tradeoff of overheads such as processor leakage power and extended resource standby times.¹⁶,¹⁷ As part of this research, we evaluate similar tradeoffs in the context of multimedia workloads.
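The proportionality between dynamic power and f·V² can be made concrete. Writing P_dyn ∝ f·V², the dynamic energy per clock cycle, P_dyn/f, scales with V² alone. The minimal Python sketch below applies this to the PXA255 core voltages listed in Section 3.3. It ignores leakage and all non-CPU platform power, and the helper names are illustrative, not part of any real API.

```python
# Core/bus operating points (MHz) and core voltages (V) of the PXA255, per Section 3.3.
OPERATING_POINTS = [
    (400, 200, 1.3), (400, 100, 1.3), (300, 100, 1.1), (300, 50, 1.1),
    (200, 100, 1.0), (200, 50, 1.0), (150, 50, 1.0), (100, 50, 1.0),
]


def relative_dynamic_power(f_mhz: float, volts: float,
                           f_ref: float = 100.0, v_ref: float = 1.0) -> float:
    """P_dyn ~ C * f * V^2, normalized to a reference point so C cancels out."""
    return (f_mhz * volts ** 2) / (f_ref * v_ref ** 2)


def relative_energy_per_cycle(volts: float, v_ref: float = 1.0) -> float:
    """Energy per clock cycle = P_dyn / f ~ C * V^2, so only voltage matters."""
    return volts ** 2 / v_ref ** 2


for core, bus, volts in OPERATING_POINTS:
    print(f"{core}/{bus} MHz @ {volts} V: "
          f"power x{relative_dynamic_power(core, volts):.2f}, "
          f"energy/cycle x{relative_energy_per_cycle(volts):.2f}")
```

Because every point at or below 200 MHz runs at 1.0 V, the sketch reports the same dynamic energy per cycle for 200, 150, and 100 MHz. Below 200 MHz, lowering the frequency only stretches execution time without any voltage scaling benefit.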
3. ENERGY EFFICIENT DEPLOYMENT OF MULTIMEDIA TASKS
3.1. Multimedia Workloads in Cooperative Distributed Systems
An underlying assumption of this research is that it is possible to exploit the cooperative nature of distributed multimedia applications, subject to some set of joint operating constraints. An example in mobile robotics is a search and rescue scenario in which robots have to preserve the energy of all team members in order to sustain a longer application lifetime. The multimedia workloads of these robotic systems that are subject to runtime management may be divided into two groups: (1) those responsible for providing situational awareness for human operators and (2) those in which robots consume and process media data. Workloads in (1) are similar to traditional multimedia workloads, since a human consumes the data. For example, the robot may use a microphone along with a camera to relay information from a victim to a human operator. For this task, the robot would have to perform encoding and transmission of the input data. The audio and image encoding and decoding algorithms we use in our evaluation are representative of such situational awareness workloads. Workloads in (2) use some combination of sensors, such as cameras, GPS, sonar, and odometers, for robot navigation. These sensor data flows must be processed by perceptual algorithms, either locally or on another robot in a cooperative team, to be useful for navigation. As an example, in our previous work¹ we considered a robot-to-robot dataflow for blob-finding analysis of images, where energy was reduced by offloading image processing. Such offloading relies on image encoding and decoding to minimize communication energy consumption. The JPEG workloads used in our experiments are typical of multimedia dataflows for distributed image processing.
We capture the distributed nature of our multimedia workloads by representing them as “task chains” of computation and communication. Tasks are units of … (…, JPEG encoding, edge detection) as well as units of allocation. That is, each task may be allocated anywhere in the mobile distributed system, and adjacent tasks are connected via communication operations across a wireless network when necessary. Our previous work considered the allocation of an application task chain among multiple mobile robots so as to maximize system lifetime. In this paper, we consider increasing the power efficiency at nodes after allocation by exploiting local power saving mechanisms.
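A task chain can be represented as an ordered list of allocated tasks, with a communication stage implied wherever two adjacent tasks land on different nodes. The Python sketch below uses hypothetical task and robot names (`capture`, `blob_find`, `robot_a`, and so on) and a string encoding of stages chosen only for readability. It illustrates the structure described above and is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str  # unit of computation, e.g. "jpeg_encode" (hypothetical name)
    node: str  # node the task is allocated to (hypothetical robot identifier)


def expand_chain(tasks: list[Task]) -> list[str]:
    """Interleave computation stages with a wireless transfer wherever two
    adjacent tasks are allocated to different nodes."""
    stages: list[str] = []
    for i, task in enumerate(tasks):
        if i > 0 and tasks[i - 1].node != task.node:
            stages.append(f"send {tasks[i - 1].node}->{task.node}")
        stages.append(f"{task.name}@{task.node}")
    return stages


chain = [Task("capture", "robot_a"), Task("jpeg_encode", "robot_a"),
         Task("jpeg_decode", "robot_b"), Task("blob_find", "robot_b")]
print(expand_chain(chain))
```

For this chain, the sketch inserts a single wireless transfer between `jpeg_encode` on `robot_a` and `jpeg_decode` on `robot_b`, mirroring the image-offloading dataflow described in Section 3.1.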
3.2. Power Management Architecture
Figure 1. System Power Management Infrastructure
Our power management architecture assumes that task chains are known to, and manipulated by, system managers, as shown in Figure 1. Such managers are useful because they can exploit their global knowledge to perform better allocations and select appropriate operating points for tasks on participating platforms. System managers are also necessary because allocation and selection decisions are based not just on task workloads, but also on end-to-end application-level constraints, such as meeting some maximum delay for gaining perceptual insight from a set of raw sensor data readings. A common approach in past work has been to compute the minimum frequencies needed along the chain to meet end-to-end delay requirements, thereby exploiting the slack available in distributed multimedia or real-time applications. Unfortunately, as our results will show, this can result in suboptimal system-level energy consumption. In addition, such an approach does not capture the interactions between the execution speed chosen and the savings achieved by the underlying power management mechanisms integrated into the mobile platform. CompatPM utilizes vertical integration to exploit these relationships. Towards this end, we identify various attributes that must be provided to the software management layer so that it may make effective decisions in a scalable manner. This approach is summarized in Figure 1.
3.3. Evaluation Infrastructure
Experimental results are based on measurements obtained from representative embedded hardware, using Intel’s Sitsang-400 evaluation platform designed around the PXA255 processor. The PXA255 supports multiple operating points that vary the CPU frequency as well as the frequency of the internal PXA bus, thereby affecting latency to memory and I/O devices. These points, along with the associated core voltages, are (core/bus @ voltage): 400MHz/200MHz @ 1.3V, 400MHz/100MHz @ 1.3V, 300MHz/100MHz @ 1.1V, 300MHz/50MHz @ 1.1V, 200MHz/100MHz @ 1.0V, 200MHz/50MHz @ 1.0V, 150MHz/50MHz @ 1.0V, and 100MHz/50MHz @ 1.0V. Our experimental workloads consist of benchmarks from the Mediabench suite that exemplify the two types of multimedia workloads for robotic systems discussed in Section 3.1, including both the encoders and decoders for adpcm, g721, gsm, and jpeg. Benchmark execution time and power consumption are monitored for every available operating point. All power measurements are performed using a Tektronix TDS5104B oscilloscope, Tektronix TCP202 current probes, and Tektronix P6139A voltage probes.
4. ENERGY ANALYSIS OF MULTIMEDIA APPLICATIONS
Table 1. Sitsang Power Consumption Overview
Average power consumption (mW) per operating point (core/bus frequency, MHz)
Scenario        400/200  400/100  300/100   300/50  200/100   200/50   150/50   100/50
System Idle        1339     1281     1255     1223     1229     1213     1213     1213
System Active      2351     2230     1985     1858     1798     1714     1666     1587
CPU Idle            126       90       63       44       44       33       33       32
CPU Active          449      400      235      206      140      126      106       82
When managing the power consumption of computational platform components, the use of frequency reduction must be balanced against the resulting performance degradation. In periodic real-time systems, this tradeoff can be formulated in a precise manner: if the execution time of a task at frequency f_i is less than or equal to the period (deadline) T, then f_i is a plausible frequency for execution. For the remainder of this section, we assume that all of the operating modes supported by the PXA255 are plausible for the execution of the assigned task. In particular, we assume the deadline/period for the multimedia component of a deployed application chain to be the execution time at the lowest frequency. An overview of the Sitsang platform power characteristics when active and idle at various operating points is provided in Table 1. As expected, the power consumption of both the CPU and the system decreases with reduced operating points. Throughout this paper, energy results for the CPU and system are normalized to the execution energies measured at the 100MHz/50MHz operating point.
Frequency Scaling and Energy Tradeoffs. Due to the periodic nature of multimedia workloads, the metric for energy consumption can be formalized using the notion of cycle energy. The cycle energy, E_cycle, consists of the sum of the execution energy E_exec and the idle time energy E_idle for a given task. We assume that the deadline of each task is its execution time at 100MHz in order to calculate cycle energy values.
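The plausibility test and the cycle-energy metric can be combined in a small calculation. The Python sketch below uses the System Active and System Idle rows of Table 1, computes E_cycle = P_active·t_exec + P_idle·(T − t_exec), and compares a deadline-driven choice (the lowest plausible operating point) with the plausible point that minimizes system cycle energy. It rests on a hypothetical, idealized assumption that execution time scales as 1/f_core and ignores the bus frequency. Real benchmarks deviate from this, so the numbers are illustrative, not measured results.

```python
# System-level average power (mW) from Table 1, keyed by (core MHz, bus MHz).
SYSTEM_ACTIVE_MW = {(400, 200): 2351, (400, 100): 2230, (300, 100): 1985, (300, 50): 1858,
                    (200, 100): 1798, (200, 50): 1714, (150, 50): 1666, (100, 50): 1587}
SYSTEM_IDLE_MW = {(400, 200): 1339, (400, 100): 1281, (300, 100): 1255, (300, 50): 1223,
                  (200, 100): 1229, (200, 50): 1213, (150, 50): 1213, (100, 50): 1213}


def exec_time(point, t_at_100mhz=1.0):
    """Hypothetical CPU-bound model: execution time scales as 1/f_core (bus ignored)."""
    return t_at_100mhz * 100.0 / point[0]


def cycle_energy(point, period):
    """E_cycle = E_exec + E_idle = P_active * t_exec + P_idle * (T - t_exec); mW * s = mJ."""
    t = exec_time(point)
    return SYSTEM_ACTIVE_MW[point] * t + SYSTEM_IDLE_MW[point] * (period - t)


# Deadline T is the execution time at 100 MHz, as assumed in the text.
PERIOD = exec_time((100, 50))

# A point is plausible if its execution time fits within the period.
plausible = [p for p in SYSTEM_ACTIVE_MW if exec_time(p) <= PERIOD]

# Deadline-driven heuristic: the lowest operating point that still meets the deadline.
deadline_driven = min(plausible)
# Platform-aware choice: the plausible point with the lowest system cycle energy.
energy_optimal = min(plausible, key=lambda p: cycle_energy(p, PERIOD))

print("deadline-driven:", deadline_driven, round(cycle_energy(deadline_driven, PERIOD), 1))
print("energy-optimal: ", energy_optimal, round(cycle_energy(energy_optimal, PERIOD), 1))
```

Under these assumptions the sketch selects (100, 50) at 1587.0 mJ per normalized period for the deadline-driven heuristic and (300, 50) at about 1434.7 mJ for the energy-optimal choice, roughly 10% lower. The exact point and margin are artifacts of the idealized timing model, but the direction matches the argument that the lowest frequency need not minimize platform energy.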
Figure 2. ‘Cycle Energy’ Consumption: (a) CPU energy results; (b) system energy results (normalized energy consumption, %, of the adpcm, g721, gsm, and jpeg decode/encode benchmarks at each core/bus operating point from 400/200 to 100/50).
Figure 2 provides the CPU and system-level results for cycle energy. We observe a clear trend in CPU cycle energy, where consumption generally decreases with lower performance points. The results support the intuition used in various real-time and multimedia schedulers that attempt to minimize frequency whenever it is possible to do so without violating performance constraints. Indeed, we observe CPU energy benefits of up to 60% when executing at the lowest frequency as opposed to the highest. Unfortunately, in practice, this intuition is only partially correct. Figure 2(b) provides system-level ‘cycle energy’ data. An important result in this figure is that the optimal operating point is never the lowest frequency. Indeed, the optimal frequency provides benefits of up to 8% compared to the lowest frequency. This result contradicts the assumption made in prior work that reducing processor frequency is always beneficial to system energy consumption. We also observe interesting trends with bus frequencies. The system-level cycle energies of the adpcmdecode and jpegdecode benchmarks are reduced when executing at 300/100 versus 300/50, while the reverse is true for g721decode. This illustrates that the relative benefits of operating points are workload dependent for actual systems and applications.
Figure 3. Execution Energy Consumption: (a) CPU energy results; (b) system energy results (normalized energy consumption, %, at each core/bus operating point from 400/200 to 100/50).
There is increasing support for minimizing the power consumption of components when the system is not actively executing applications. We consider this trend by observing execution energies only. In Figure 3(a), the CPU execution energy generally decreases with core frequency (as expected), but this holds only until the 150MHz operating point: while performance continues to degrade in proportion to frequency, the power benefits diminish, as shown in Table 1. The reason is that the core voltage is the same for the lower frequencies, thereby removing any voltage scaling benefits.∗ With regard to system execution energy trends, it can be seen that execution energy is minimized at the maximum core frequency (the particular bus frequency depending on the benchmark). These results support an approach that executes programs quickly in order to optimize idle periods. Since our idealistic assumption of zero idle energy does not hold for current platforms, the optimal operating point will fall somewhere between the minimum and the maximum. We therefore identify CompatPM attributes that allow for the dynamic assessment of tradeoffs between the operating points.
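Setting the idle term to zero isolates the execution energy, E_exec = P_active·t_exec. The Python sketch below applies this to the CPU Active and System Active rows of Table 1 under a hypothetical model in which execution time scales as 1/f_core. That model is an idealization that ignores the effect of the bus frequency on memory latency, so the specific minima it finds are illustrative rather than measured.

```python
# Average active power (mW) from Table 1, keyed by (core MHz, bus MHz).
CPU_ACTIVE_MW = {(400, 200): 449, (400, 100): 400, (300, 100): 235, (300, 50): 206,
                 (200, 100): 140, (200, 50): 126, (150, 50): 106, (100, 50): 82}
SYSTEM_ACTIVE_MW = {(400, 200): 2351, (400, 100): 2230, (300, 100): 1985, (300, 50): 1858,
                    (200, 100): 1798, (200, 50): 1714, (150, 50): 1666, (100, 50): 1587}


def exec_time(point, t_at_100mhz=1.0):
    """Hypothetical CPU-bound model: execution time scales as 1/f_core."""
    return t_at_100mhz * 100.0 / point[0]


def exec_energy(power_mw, point):
    """E_exec = P_active * t_exec (idle energy assumed to be zero)."""
    return power_mw[point] * exec_time(point)


cpu_best = min(CPU_ACTIVE_MW, key=lambda p: exec_energy(CPU_ACTIVE_MW, p))
system_best = min(SYSTEM_ACTIVE_MW, key=lambda p: exec_energy(SYSTEM_ACTIVE_MW, p))
print("CPU execution energy minimized at", cpu_best)
print("System execution energy minimized at", system_best)
```

With these inputs the system minimum lands at 400/100, the highest core frequency, in line with the system trend described above. The CPU minimum in this sketch (200/50) reflects the flat 1.0 V core voltage below 200 MHz, where power no longer falls as fast as execution time grows; the exact point depends on the idealized timing model.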
Figure 4. DVFS with … ((a) DVFS with CPU sleep management; normalized energy consumption, %, of the adpcm, g721, gsm, and jpeg decode/encode benchmarks at each core/bus operating point from 400/200 to 100/50).
Interaction of DVFS and Incompatible Schemes. … due to a reduction in the slack time t_idle. As an example, we … our measurements. For the PXA255, the sleep mode consumes … of power. We utilize this value to project the cycle energy … state with DVFS. Figure 4(a) provides the normalized CPU … sleep power management. Comparing it to Figure … increase quite dramatically in relation to lower frequencies. … in conjunction with VFI power management mechanisms, …
∗The PXA255 electrical specifications prescribe the …
energy trends of a platform. This is key background to our argument for vertical integration: our management layer should be made aware of the types of underlying power managers.
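The incompatibility of a sleep mode with DVFS follows from how its savings are earned: the sleep state only replaces idle power during the slack time t_idle = T − t_exec, which shrinks as frequency drops. The Python sketch below projects CPU cycle energy with and without sleep using the CPU rows of Table 1. Because the measured PXA255 sleep power is not recoverable from the text above, `SLEEP_MW` is a hypothetical placeholder, and execution time follows an idealized 1/f_core model.

```python
# CPU average power (mW) from Table 1, keyed by (core MHz, bus MHz).
CPU_ACTIVE_MW = {(400, 200): 449, (400, 100): 400, (300, 100): 235, (300, 50): 206,
                 (200, 100): 140, (200, 50): 126, (150, 50): 106, (100, 50): 82}
CPU_IDLE_MW = {(400, 200): 126, (400, 100): 90, (300, 100): 63, (300, 50): 44,
               (200, 100): 44, (200, 50): 33, (150, 50): 33, (100, 50): 32}
SLEEP_MW = 5.0  # hypothetical placeholder for the PXA255 sleep-mode power
PERIOD = 1.0    # deadline T, normalized to the execution time at 100 MHz


def exec_time(point):
    """Hypothetical CPU-bound model: execution time scales as 1/f_core."""
    return PERIOD * 100.0 / point[0]


def cpu_cycle_energy(point, use_sleep):
    """P_active * t_exec + P_idle_or_sleep * t_idle, with t_idle = T - t_exec."""
    t_exec = exec_time(point)
    t_idle = PERIOD - t_exec
    idle_power = SLEEP_MW if use_sleep else CPU_IDLE_MW[point]
    return CPU_ACTIVE_MW[point] * t_exec + idle_power * t_idle


def sleep_savings(point):
    """Energy saved by sleeping instead of idling; proportional to the slack."""
    return cpu_cycle_energy(point, use_sleep=False) - cpu_cycle_energy(point, use_sleep=True)


for point in CPU_ACTIVE_MW:
    print(point, round(sleep_savings(point), 2))
```

For any sleep power below the lowest CPU idle power in Table 1, the projected savings shrink as the operating point drops and vanish at 100 MHz, where no slack remains. This frequency dependence is what places such a mechanism in the VFI class.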
Interaction of DVFS and Compatible Schemes. While certain types of power management mechanisms do not coexist well with DVFS, others work synergistically with processor management. VFC power management mechanisms can reduce the energy expended during execution periods by performing active management. Moreover, a VFC mechanism not only reduces energy during execution time, but can also be used during idle periods. Indeed, VFC power management mechanisms simply shift savings from idle to execution periods when frequency is scaled down, creating independence from DVFS. Previous work has shown that memory power management can be utilized simultaneously with frequency scaling.¹³ Therefore, we use this power management scheme as our VFC example. Using XScale performance counters, we obtain execution traces of memory usage for our workloads at each operating point. These traces are then used to estimate the power savings that can be achieved with the two Samsung K4S561632D† parts incorporated into the Sitsang platform, using timing values similar to those assumed in previous studies (100ns cache miss penalty and 10ns transition time from low-power to active mode).¹³ The resulting power signatures are presented in Figure 4(b). We observe from the figure that memory power management can provide significant power savings on the system across all frequencies, compared to the default platform. Indeed, the benefits achieved are up to 16% in system energy when compared to executing at 100MHz with no memory management. The general trends between operating points, however, are similar to Figure 2(b). This result illustrates the relative independence of VFC power mechanisms and shows that this class of power managers can exist transparently to frequency assignment decisions.
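Why a VFC mechanism is indifferent to frequency can be seen with a simplified accounting model. Assume, hypothetically, that the memory chips sit in the low-power mode except while servicing misses, that each miss keeps a chip active for the 100ns miss penalty plus the 10ns wake-up transition, that the miss count is the same at every operating point, and that the footnoted 165mW idle and 40mW sleep values apply per chip. The Python sketch below is this toy model, not the trace-driven estimator used in the evaluation; it splits the savings into execution-time and idle-time parts.

```python
IDLE_MW = 165.0          # per-chip idle power (footnote value, assumed per chip)
SLEEP_MW = 40.0          # per-chip low-power mode (footnote value, assumed per chip)
N_CHIPS = 2              # two Samsung K4S561632D parts on the Sitsang board
MISS_PENALTY_S = 100e-9  # cache miss service time
WAKEUP_S = 10e-9         # transition from low-power to active mode


def memory_savings(n_misses, t_exec, period):
    """Return (savings during execution, savings during idle) in mJ (mW * s)."""
    busy = n_misses * (MISS_PENALTY_S + WAKEUP_S)
    if not busy <= t_exec <= period:
        raise ValueError("model requires busy time <= execution time <= period")
    delta_mw = N_CHIPS * (IDLE_MW - SLEEP_MW)
    during_exec = delta_mw * (t_exec - busy)    # chips sleep between misses
    during_idle = delta_mw * (period - t_exec)  # chips sleep for the whole slack
    return during_exec, during_idle


# The same workload (same miss count) run fast and slow within a 1 s period.
for t_exec in (0.25, 1.0):
    exec_part, idle_part = memory_savings(1_000_000, t_exec, period=1.0)
    print(t_exec, round(exec_part, 2), round(idle_part, 2), round(exec_part + idle_part, 2))
```

In this toy model the total savings per period are identical whether the workload runs in 0.25 s or 1.0 s; slowing down merely moves savings from the idle part into the execution part, which is the shifting behavior described above.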
Combining VFI and VFC power management … examples are incorporated with DVFS. In the VFI results in Figure 4(a), … and the optimal … us that the tradeoff trends between operating points when … continue to be influenced by VFI power management … the independence of VFC managers remains …
The previous section presented … when using different operating points for a processor, … power management support with other underlying … of tradeoffs and interactions into account in order to assign … that utilizing a profile-based approach where the manager is … scale. Therefore, we begin by defining a limited set of … as per Figure 1, and then determine the effectiveness of decisions based upon these inputs.
5.1. CompatPM Attributes
In terms of platform hardware, the system manager must be made aware of the inherent power tradeoffs of the system independent of any other power management. In our evaluation, we use a simple approach wherein average active/idle power values per operating mode for the platform are specified in the CompatPM attributes.
†We define 165mW idle power and 40mW sleep power for the chips based upon datasheet values.
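One way to picture the attributes a CompatPM-enabled manager consumes is as a small record per node: per-mode average active and idle power for the platform, per-mode execution time for the workload, and the underlying power managers tagged as VFC or VFI. The Python sketch below is a hypothetical encoding with illustrative class and field names. It ranks plausible modes by cycle energy, lets a VFI manager override the idle power of the modes it affects, and skips VFC managers on the premise, from Section 4, that they do not change the relative ranking of operating points.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

Mode = tuple[int, int]  # (core MHz, bus MHz)


class Compat(Enum):
    VFC = "compatible"    # savings largely independent of the chosen frequency
    VFI = "incompatible"  # savings depend on the slack left by the chosen frequency


@dataclass
class PowerManager:
    name: str
    compat: Compat
    # For VFI managers: idle power (mW) per mode once the manager is active.
    idle_override_mw: dict[Mode, float] = field(default_factory=dict)


@dataclass
class NodeAttributes:
    active_mw: dict[Mode, float]  # platform average active power per mode
    idle_mw: dict[Mode, float]    # platform average idle power per mode
    exec_s: dict[Mode, float]     # workload execution time per mode
    managers: list[PowerManager] = field(default_factory=list)


def select_mode(attrs: NodeAttributes, deadline_s: float) -> Optional[Mode]:
    """Return the plausible mode with the lowest projected cycle energy, or None."""
    best, best_energy = None, float("inf")
    for mode, t_exec in attrs.exec_s.items():
        if t_exec > deadline_s:
            continue  # not plausible: misses the deadline
        idle = attrs.idle_mw[mode]
        for manager in attrs.managers:
            # VFC managers are skipped: they do not alter the ranking of modes.
            if manager.compat is Compat.VFI:
                idle = manager.idle_override_mw.get(mode, idle)
        energy = attrs.active_mw[mode] * t_exec + idle * (deadline_s - t_exec)
        if energy < best_energy:
            best, best_energy = mode, energy
    return best


attrs = NodeAttributes(
    active_mw={(400, 200): 2351, (100, 50): 1587},  # Table 1, System Active
    idle_mw={(400, 200): 1339, (100, 50): 1213},    # Table 1, System Idle
    exec_s={(400, 200): 0.25, (100, 50): 1.0},      # hypothetical execution times
)
print(select_mode(attrs, deadline_s=1.0))
# Hypothetical CPU sleep manager: System Idle minus CPU Idle (126 mW) plus 5 mW sleep.
attrs.managers.append(PowerManager("cpu_sleep", Compat.VFI, {(400, 200): 1218.0}))
print(select_mode(attrs, deadline_s=1.0))
```

With the Table 1 values and the hypothetical execution times shown, the sketch first prefers 100/50, then switches to 400/200 once the hypothetical VFI sleep manager lowers that mode's idle power. This illustrates why the manager needs to know which class each underlying mechanism belongs to.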