On a Catalogue of Metrics for Evaluating Commercial Cloud Services
Zheng Li
School of CS
NICTA and ANU
Canberra, Australia
Zheng.au
Liam O’Brien
CSIRO eResearch
CSIRO and ANU
Canberra, Australia
Liam.OBrien@csiro.au
He Zhang
School of CSE
NICTA and UNSW
Sydney, Australia
He.au
Rainbow Cai
School of CS
NICTA and ANU
Canberra, Australia
Rainbow.au
Abstract— Given the continually increasing number of commercial Cloud services in the market, evaluation of different services plays a significant role in cost-benefit analysis or decision making for choosing Cloud Computing. In particular, employing suitable metrics is essential in evaluation implementations. However, to the best of our knowledge, there is not yet any systematic discussion about metrics for evaluating Cloud services. By using the method of Systematic Literature Review (SLR), we have collected the de facto metrics adopted in the existing Cloud services evaluation work. The collected metrics were arranged according to the different Cloud service features to be evaluated, which essentially constructed an evaluation metrics catalogue, as shown in this paper. This metrics catalogue can be used to facilitate future practice and research in the area of Cloud services evaluation. Moreover, considering that metrics selection is a prerequisite of benchmark selection in evaluation implementations, this work also supplements the existing research in benchmarking the commercial Cloud services.
Keywords - Cloud Computing; Commercial Cloud Service; Cloud Services Evaluation; Evaluation Metrics; Catalogue
I. INTRODUCTION
Cloud Computing, as one of the most promising computing paradigms [1], has become increasingly accepted in industry. Correspondingly, more and more commercial Cloud services offered by an increasing number of providers are available in the market [2, 5]. Considering that customers have little knowledge of and control over the precise nature of commercial Cloud services even in the "locked down" environment [3], evaluation of those services is crucial for many purposes, ranging from cost-benefit analysis for Cloud Computing adoption to decision making for Cloud provider selection.
When evaluating Cloud services, a set of suitable measurement criteria or metrics must be chosen. In fact, according to the rich research in the evaluation of traditional computer systems, the selection of metrics plays an essential role in evaluation implementations [32]. However, compared to the large amount of research effort into benchmarks for the Cloud [3, 4, 16, 21, 34, 45], to the best of our knowledge, there is not yet any systematic discussion about metrics for evaluating Cloud services. Considering that metrics selection is one of the prerequisites of benchmark selection [31], we proposed to perform a comprehensive investigation into evaluation metrics in the Cloud Computing domain.
Unfortunately, in contrast with traditional computing systems, the Cloud nowadays is still in chaos [56]. The most outstanding issue is the lack of consensus on a standard definition of Cloud Computing, which inevitably leads to market hype as well as skepticism and confusion [28]. As a result, it is hard to delimit the range of Cloud Computing and a full scope of metrics for evaluating different commercial Cloud services. Therefore, we decided to unfold the investigation in a regression manner. In other words, we tried to isolate the de facto evaluation metrics from the existing evaluation work to help understand the state of the practice of the metrics used in Cloud services evaluation. When it comes to exploring the existing evaluation practices of Cloud services, we employed three constraints:
- This study focused on the evaluation of only commercial Cloud services, rather than that of private or academic Cloud services, to make our effort closer to industry's needs.
- This study concerned Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) without considering Software as a Service (SaaS). Since SaaS with special functionalities is not used to further build individual business applications [21], the evaluation of various SaaS instances could require infinite and exclusive metrics that would be out of the scope of this investigation.
- This study only explored empirical evaluation practices in academic publications. There is no doubt that informal descriptions of Cloud services evaluation in blogs and on technical websites can also provide highly relevant information. However, on the one hand, it is impossible to explore and collect useful data from different study sources all at once. On the other hand, the published evaluation reports can be viewed as typical and peer-reviewed representatives of the existing ad hoc evaluation practices.
Considering that the Systematic Literature Review (SLR) has been widely accepted as a standard and rigorous approach to evidence collection for investigating specific research questions [26, 27], we adopted the SLR method to identify, assess and synthesize the published primary studies
of Cloud services evaluation. Due to space limitations, the detailed SLR process is not elaborated in this paper (see Footnote 1). Overall, we identified 46 relevant primary studies covering six commercial Cloud providers, namely Amazon, GoGrid, Google, IBM, Microsoft, and Rackspace, from a set of popular digital publication databases (all the identified primary studies have been listed online for reference: /groups/1104801/slr4cloud/papers/). More than 500 evaluation metrics, including duplications, were finally extracted from the identified Cloud services evaluation studies.
This paper reports our investigation result. After removing duplications and differentiating metric types, the evaluation metrics were arranged according to different Cloud service features covering the following aspects: Performance, Economics, and Security. The arranged result essentially constructed a catalogue of metrics for evaluating commercial Cloud services. In turn, we can use this metrics catalogue to facilitate Cloud services evaluation work, such as quickly looking up suitable evaluation metrics, identifying current research gaps and future research opportunities, and developing sophisticated metrics based on the existing ones.
The remainder of the paper is organized as follows. Section II arranges all the identified evaluation metrics under different Cloud service features. Section III introduces three scenarios of applying this metrics catalogue. Conclusions and some future work are discussed in Section IV.
II. THE METRICS FOR CLOUD SERVICES EVALUATION
It is clear that the choice of appropriate metrics depends on the service features to be evaluated [31]. Therefore, we naturally organized the identified evaluation metrics according to their corresponding Cloud service features. In detail, the evaluated features in the reviewed primary studies are scattered over three aspects of Cloud services (namely Performance, Economics [35], and Security) and their properties. Thus, we use the following three subsections to introduce those identified metrics respectively.
A. Performance Evaluation Metrics
In practice, an evaluated performance feature is usually represented by a combination of a physical property of Cloud services and its capacity, for example Communication Latency or Storage Reliability. Therefore, we divide a performance feature into two parts: the Physical Property part and the Capacity part. Thus, all the elements of performance features identified from the aforementioned primary studies can be summarized as shown in Figure 1. The detailed explanations and descriptions of the different performance feature elements have been clarified in our previous taxonomy work [57]. In particular, Scalability and Variability are also regarded as two elements in the Capacity part, while being further distinguished from the other capacities, because they are inevitably reflected by changes in the index of normal performance features.
Footnote 1: The SLR report can be found online: /open?id=0B9KzcoAAmi43LV9IaEgtNnVUenVXSy1FWTJKSzRsdw
Naturally, we display the performance evaluation metrics mainly following the sequence of the performance elements. In addition, the evaluation metrics for the overall performance of Cloud services are listed separately, and the metrics for evaluating Scalability and Variability are also separated out respectively.
Figure 1. Performance features of Cloud services for evaluation.
1) Communication Evaluation Metrics (cf. Table I): Communication refers to the data/message transfer between internal service instances (or different Cloud services), or between an external client and the Cloud. In particular, given the separate discussions about IP-level and MPI-message-level networking among public Clouds [e.g. 8], we also distinguished evaluation metrics between TCP/UDP/IP and MPI communications.
Brief descriptions of particular metrics in Table I:
- Packet Loss Frequency vs. Probe Loss Rate: Here we directly copied the names of the two metrics from [43]. Packet Loss Frequency is defined as the ratio between loss_time_slot and total_time_slot, and Probe Loss Rate is defined as the ratio between lost_probes and total_probes. Considering that the concept Availability is driven by the time lost while Reliability is driven by the number of failures [10], we can see that the former metric is for Communication Availability evaluation while the latter is for Communication Reliability.
- Correlation between Total Runtime and Communication Time: This metric observes a set of applications in terms of their runtime and the amount of time they spend communicating in the Cloud. The trend of the correlation can be used to qualitatively discuss the influence of Communication on the applications running in the Cloud (a small illustrative sketch follows this list).
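To make these ratio definitions concrete, the following sketch (a minimal Python illustration, not taken from the reviewed studies; all variable names and sample values are hypothetical) computes Packet Loss Frequency, Probe Loss Rate, and the correlation between total runtime and communication time.

```python
# Minimal illustrative sketch; the sample values are hypothetical.
from statistics import correlation  # available since Python 3.10

def packet_loss_frequency(loss_time_slots: int, total_time_slots: int) -> float:
    """Availability view: fraction of time slots in which probe loss occurred."""
    return loss_time_slots / total_time_slots

def probe_loss_rate(lost_probes: int, total_probes: int) -> float:
    """Reliability view: fraction of individual probes that were lost."""
    return lost_probes / total_probes

# Runtime and communication time per application (hypothetical, in seconds).
total_runtimes = [120.4, 310.9, 95.2, 240.0]
communication_times = [30.1, 140.7, 12.3, 88.5]

print(packet_loss_frequency(3, 600))                      # 0.005
print(probe_loss_rate(14, 5000))                          # 0.0028
print(correlation(total_runtimes, communication_times))   # close to 1.0 here
```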
TABLE I. COMMUNICATION EVALUATION METRICS
Capacity | Metrics | Benchmark
Transaction Speed | Max Number of Transfer Sessions | SPECweb 2005 [22]
Availability | Packet Loss Frequency | Badabing Tool [43]
Latency | Correlation between Total Runtime and Communication Time | Application Suite [30]
Latency | TCP/UDP/IP Transfer Delay (s, ms) | CARE [45]; Ping [5]; Send 1 byte data [20]; Latency Sensitive Website [5]; Badabing Tool [43]
Latency | MPI Transfer Delay (s, μs) | HPCC: b_eff [42]; Intel MPI Bench [18]; mpptest [8]; OMB-3.1 with MPI [44]
Reliability | Connection Error Rate | CARE [45]
Reliability | Probe Loss Rate | Badabing Tool [43]
Data Throughput | TCP/UDP/IP Transfer bit/Byte Speed (bps, Mbps, MB/s, GB/s) | iperf [5]; Private tools TCPTest/UDPTest [43]; SPECweb 2005 [22]; Upload/Download/Send large size data [23]
Data Throughput | MPI Transfer bit/Byte Speed (bps, MB/s, GB/s) | HPCC: b_eff [42]; Intel MPI Bench [18]; mpptest [8]; OMB-3.1 with MPI [44]
2) Computation Evaluation Metrics (cf. Table II): Computation refers to the computing-intensive data/job processing in the Cloud. Note that, although coarse-grain Cloud-hosted applications are generally used to evaluate the overall performance of Cloud services (see Subsection 5)), CPU-intensive applications have been particularly adopted for the specific Computation evaluation.
Brief descriptions of particular metrics in Table II:
- Benchmark Efficiency vs. Instance Efficiency: These two metrics both measure the real individual-instance Computation performance as a percentage of a baseline threshold. In Benchmark Efficiency, the baseline threshold is the theoretical peak of the benchmark result, while in Instance Efficiency it is the theoretical CPU peak.
- ECU Ratio: This metric uses the Elastic Compute Unit (ECU) instead of traditional FLOPS to measure Computation performance. An ECU is defined as the CPU power of a 1.0-1.2 GHz 2007 Opteron or Xeon processor [42] (see the sketch after this list).
- CPU Load: This metric is usually used together with other performance evaluation metrics to judge bottleneck features. For example, low CPU load with the maximum number of communication sessions indicates that data transfer on an EC2 c1.xlarge instance is the bottleneck for a particular workload [22].
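As a rough illustration of how these ratio-style metrics are obtained (a sketch with assumed example numbers, not measured results from the primary studies):

```python
# Minimal sketch of the ratio-style Computation metrics; inputs are hypothetical.

def benchmark_efficiency(measured_gflops: float, benchmark_peak_gflops: float) -> float:
    """Measured result as a percentage of the benchmark's theoretical peak."""
    return 100.0 * measured_gflops / benchmark_peak_gflops

def instance_efficiency(measured_gflops: float, cpu_peak_gflops: float) -> float:
    """Measured result as a percentage of the instance's theoretical CPU peak."""
    return 100.0 * measured_gflops / cpu_peak_gflops

def ecu_ratio(measured_gflops: float, instance_ecus: float) -> float:
    """Measured FLOP rate normalized by the instance's advertised ECU count."""
    return measured_gflops / instance_ecus

hpl_result = 39.6   # Gflops reported by HPL on one instance (hypothetical)
print(benchmark_efficiency(hpl_result, benchmark_peak_gflops=88.0))   # ~45%
print(instance_efficiency(hpl_result, cpu_peak_gflops=108.8))         # ~36%
print(ecu_ratio(hpl_result, instance_ecus=8.0))                       # Gflops/ECU
```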
TABLE II. COMPUTATION EVALUATION METRICS
Capacity | Metrics | Benchmark
Transaction Speed | Benchmark Efficiency (% Benchmark Peak) | HPL [42]
Transaction Speed | ECU Ratio (Gflops/ECU) | HPL [42]
Transaction Speed | Instance Efficiency (% CPU Peak) | HPL [17]
Transaction Speed | Benchmark OP (FLOP) Rate (Gflops, Tflops) | DGEMM [30]; FFTE [30]; HPL [30]; LMbench [42]; NPB: EP [4]; Whetstone [39]
Latency | Benchmark Runtime (hr, min, s, ms) | Private benchmark/application [6]; Compiling Linux Kernel [46]; Fibonacci [12]; DGEMM [17]; HPL [17]; NPB [41]
Other | CPU Load (%) | SPECweb 2005 [22]
Other | Ubench CPU Score | Ubench [47]
3) Memory (Cache) Evaluation Metrics (cf. Table III): Memory (Cache) is intended for fast access to temporarily saved data that would otherwise have to be retrieved from slower hard drive storage. Since it can be hard to exactly distinguish the effect on performance brought by memory/cache, there are fewer evaluation practices and metrics for memory/cache than for the other physical properties. However, in addition to normal capacity evaluation, there are some interesting metrics for verifying the memory hierarchies in Cloud services, as elaborated below.
TABLE III. MEMORY (CACHE) EVALUATION METRICS
Capacity | Metrics | Benchmark
Transaction Speed | Random Memory Update Rate (MUP/s, GUP/s) | HPCC: RandomAccess [30]
Latency | Mean Hit Time (s) | Land Elevation Change App [13]
Latency | Memcache Get/Put/Response Time (ms) | Operate 1 Byte / 1 MB data [12]
Data Throughput | Memory bit/Byte Speed (MB/s, GB/s) | CacheBench [42]; HPCC: PTRANS [30]; HPCC: STREAM [42]
Memory Hierarchy | Intra-node Scaling | DGEMM [17]; HPL [17]
Memory Hierarchy | Sharp Performance Drop (increasing workload) | Bonnie [42]; CacheBench [42]
Other | Ubench Memory Score | Ubench [47]
Brief descriptions of particular metrics in Table III:
- Intra-node Scaling: This metric is relatively complex. It is used to judge the position of cache contention by employing Scalability evaluation metrics (see Subsection 6)). To observe the scaling capacity of a service instance, the benchmark is executed repeatedly while varying the workload and the number of used CPU cores [17].
- Sharp Performance Drop: This metric is used to find the cache boundaries of the memory hierarchy in a particular service instance. In detail, when repeatedly executing the benchmark with gradually increasing workload, the major performance drop-offs can roughly indicate the memory hierarchy sizes [42] (a small sketch follows this list).
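The sketch below (hypothetical data, not taken from the reviewed studies) shows how such sharp drop-offs in a CacheBench-style bandwidth curve can be located programmatically to estimate cache boundaries.

```python
# Minimal sketch: locate sharp performance drop-offs in a bandwidth vs.
# working-set-size curve; the sample data are hypothetical.

def sharp_drops(sizes_kb, bandwidth_mbs, threshold=0.20):
    """Return working-set sizes at which bandwidth falls by more than
    `threshold` relative to the previous measurement."""
    boundaries = []
    for i in range(1, len(sizes_kb)):
        drop = (bandwidth_mbs[i - 1] - bandwidth_mbs[i]) / bandwidth_mbs[i - 1]
        if drop > threshold:
            boundaries.append(sizes_kb[i])
    return boundaries

sizes_kb  = [16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]
bandwidth = [52000, 51800, 31000, 30500, 30200, 29900, 12000, 11800, 4200, 4100]
print(sharp_drops(sizes_kb, bandwidth))   # [64, 1024, 4096] -> rough cache edges
```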
4) Storage Evaluation Metrics (cf. Table IV): Storage of Cloud services is used to permanently store users' data until the data are removed or the services are intentionally suspended. Compared to accessing Memory (Cache), accessing data permanently stored in Cloud services usually takes longer.
TABLE IV. STORAGE EVALUATION METRICS
Capacity | Metrics | Benchmark
Transaction Speed | One Byte Data Access Rate (bytes/s) | Download 1 byte data [38]
Transaction Speed | Benchmark I/O Operation Speed (ops) | Bonnie/Bonnie++ [42]
Transaction Speed | Blob/Table/Queue I/O Operation Speed (ops) | Operate Blob/Table/Queue Data [5]
Transaction Speed | Performance Rate between Blob & Table | Operate Blob & Table Data [20]
Availability | Histogram of GET Throughput (in chart) | Get data of 1 Byte/100 MB [9]
Latency | Benchmark I/O Delay (min, s, ms) | BitTorrent [38]; Private benchmark/application [6]; NPB: BT [4]
Latency | Blob/Table/Queue I/O Operation Time (s, ms) | Operate Blob/Table/Queue Data [5]
Latency | Page Generation Time (s) | TPC-W [5]
Reliability | I/O Access Retried Rate | Download Data [38]; HTTP Get/Put [25]
Data Throughput | Benchmark I/O bit/Byte Speed (KB/s, MB/s) | Bonnie/Bonnie++ [42]; IOR in POSIX [44]; PostMark [7]; NPB: BT-IO [44]
Data Throughput | Blob I/O bit/Byte Speed (Mbps, Bytes/s, MB/s) | Operate Blob Data [38]
Brief descriptions of particular metrics in Table IV:
- One Byte Data Access Rate: Although the unit here seems intended for Data Throughput evaluation, this metric has been particularly used for measuring Storage Transaction Speed. In contrast with accessing large-size files, the performance of accessing very small-size data can be dominated by the transaction overhead of storage services [38].
- Blob/Table/Queue I/O Operation metrics: Although not all of the public Cloud providers specify the definitions, the Storage services can be categorized into three types of offer: Blob, Table and Queue [5]. In particular, the typical Blob I/O operations are Download and Upload; the typical Table I/O operations are Get, Put and Query; and the typical Queue I/O operations are Insert, Retrieve, and Remove.
- Histogram of GET Throughput (in chart): Unlike the other, traditional metrics, this metric is represented as a chart instead of a quantitative number. In this case, the histogram vividly illustrates the change in GET Throughput during a particular period of time, which intuitively reflects the Availability of a Cloud service. Therefore, the histogram chart here is also regarded as a special metric, as are the other charts and tables in Subsections 6) and 7) (see the sketch after this list).
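As an illustration (a sketch with made-up measurements, not data from the primary studies), the One Byte Data Access Rate and a coarse GET-throughput histogram could be derived from raw samples as follows.

```python
# Minimal sketch with hypothetical measurements: One Byte Data Access Rate
# (Transaction Speed view) and a coarse histogram of GET throughput
# (Availability view).
from collections import Counter

def one_byte_access_rate(access_times_s):
    """Bytes per second when repeatedly fetching a 1-byte object; dominated by
    per-transaction overhead rather than raw bandwidth."""
    return len(access_times_s) / sum(access_times_s)   # 1 byte per access

def get_throughput_histogram(throughputs_mbs, bin_width_mbs=5.0):
    """Count throughput samples per bin; a heavy low-end bin over a measurement
    period hints at reduced availability."""
    return Counter(int(t // bin_width_mbs) * bin_width_mbs for t in throughputs_mbs)

print(one_byte_access_rate([0.08, 0.09, 0.07, 0.11]))        # ~11.4 bytes/s
samples = [42.1, 40.7, 38.9, 12.3, 41.5, 39.8, 0.0, 40.2]    # MB/s over time
print(sorted(get_throughput_histogram(samples).items()))
```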
5) Overall Performance Evaluation Metrics (cf. Table V): In addition to the performance evaluations of specific physical properties, there are also a large number of evaluations of the overall performance of commercial Cloud services. We consider a metric to be an overall performance evaluation metric as long as it was intentionally used for measuring the overall performance of Cloud services in the primary study.
Brief descriptions of particular metrics in Table V:
- Relative Performance over a Baseline (rate): This metric is usually used to standardize a set of performance evaluation results, which can further facilitate the comparison between those results. Note the difference between this metric and the metric Performance Speedup over a Baseline; the latter is a typical Scalability evaluation metric, as explained in Subsection 6).
- Sustained System Performance (SSP): This metric uses a set of applications to give an aggregate measure of performance of a Cloud service [30]. In fact, two other metrics are involved in the calculation of this metric: the Geometric Mean of the individual applications' Performance per CPU Core results is multiplied by the number of computational cores.
- Average Weighted Response Time (AWRT): By using the resource consumption of each request as a weight, this metric gives a measure of how long on average users have to wait to accomplish their required work [33]. The resource consumption of each request is estimated by multiplying the request's execution time and the required number of Cloud service instances (a small computational sketch follows this list).
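A minimal sketch of how these two composite metrics could be computed, assuming hypothetical per-application and per-request inputs:

```python
# Minimal sketch of SSP and AWRT as described above; all inputs are hypothetical.
from math import prod

def ssp(per_core_gflops, total_cores):
    """Sustained System Performance: geometric mean of the applications'
    performance-per-core results, multiplied by the number of cores."""
    geo_mean = prod(per_core_gflops) ** (1.0 / len(per_core_gflops))
    return geo_mean * total_cores

def awrt(requests):
    """Average Weighted Response Time: response times weighted by each
    request's resource consumption (execution time x required instances)."""
    weights = [exec_time * instances for exec_time, instances, _ in requests]
    weighted = sum(w * resp for w, (_, _, resp) in zip(weights, requests))
    return weighted / sum(weights)

per_core = [1.8, 2.4, 0.9, 3.1]   # Gflops per core, one value per application
print(ssp(per_core, total_cores=64))

#        (execution_time_s, required_instances, response_time_s) per request
jobs = [(120.0, 4, 150.0), (30.0, 1, 45.0), (600.0, 16, 700.0)]
print(awrt(jobs))
```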
6) Scalability Evaluation Metrics (cf. Table VI): Scalability has been variously defined within different contexts or from different perspectives [20]. However, no matter which definition is used, the evaluation of Cloud services' Scalability inevitably requires varying the workload and/or Cloud resources. Since the variations are usually represented in charts and tables, we treat the corresponding charts and tables also as special metrics. In fact, unlike evaluating other performance properties, the evaluation of Scalability (and also Variability) normally implies comparison among a set of data that can be conveniently organized in charts and tables.
TABLE V. OVERALL PERFORMANCE EVALUATION METRICS
Capacity | Metrics | Benchmark
Transaction Speed | Benchmark OP (FLOP) Rate (Mflops, Gflops, Mops) | HPL [4]; GASOLINE [48]; NPB [4]
Transaction Speed | Benchmark Transactional Job Rate | BLAST [52]; Sysbench on MySQL [3]; TPC-W [29]; WSTest [49]
Transaction Speed | Geometric Mean of Serial NPB Results (Mop/s) | NPB [44]
Transaction Speed | Relative Performance over a Baseline (rate) | MODIS Processing [15]; NPB [4]
Transaction Speed | Sustained System Performance (SSP) | Application Suite [30]
Transaction Speed | Performance per Client | TPC-E [20]
Transaction Speed | Performance per CPU Cycle (Mops/GHz) | NPB [4]
Transaction Speed | Performance per CPU Core (Gflops/core) | Application Suite [30]
Availability | Histogram of Average Transaction Time | TPC-E [20]
Latency | Benchmark Delay (hr, min, s, ms) | Broadband/Epigenome/Montage [24]; CSFV [8]; FEFF84 MPI [48]; MapReduce App [47]; MCB Hadoop [50]; MG-RAST+BLAST [37]; MODIS Processing [15]; NPB-OMP/MPI [51]; WCD [23]; WSTest [49]
Latency | Benchmark Transactional Job Delay (min, s) | BLAST [5]; C-Meter [16]; MODIS Processing [15]; SAGA BigJob Sys [40]; TPC-E [20]; TPC-W [53]
Latency | Relative Runtime over a Baseline (rate) | Application Suite [30]; SPECjvm2008 [5]
Latency | Average Weighted Response Time (AWRT) | Lublin99 [33]
Reliability | Error Rate of DB R/W | CARE [45]
Data Throughput | DB Processing Throughput (byte/sec) | CARE [45]
Data Throughput | BLAST Processing Rate (Mbp/instance/day) | MG-RAST + BLAST [37]
Brief descriptions of particular metrics in Table VI:
- Aggregate Performance & Performance Degradation/Slowdown over a Baseline: These two metrics are often used to reflect the Scalability of a Cloud service (or feature) when the service (or feature) is requested with increasing workload. Therefore, the Scalability evaluation here is from the perspective of workload.
- Performance Speedup over a Baseline: This metric is often used to reflect the Scalability of a Cloud service (or feature) when the service (or feature) is requested for different amounts or capabilities of Cloud resources. Therefore, the Scalability evaluation here is from the perspective of Cloud resources.
- Performance Degradation/Slowdown over a Baseline: Interestingly, this metric can be intuitively regarded as the opposite of the above metric Performance Speedup over a Baseline. However, it is more meaningful to use this metric to reflect the Scalability of a Cloud service (or feature) when the service (or feature) is requested to deal with different amounts of workload. Therefore, the Scalability evaluation here is from the perspective of workload.
- Parallelization Efficiency E(n): Interestingly, this metric can be viewed as a "reciprocal" of the normal Performance Speedup metric. T(n) is defined as the time taken to run a job with n service instances, and then E(n) can be calculated as T(1)/T(n)/n (see the sketch after this list).
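A small sketch (hypothetical runtimes, not from the reviewed studies) of Performance Speedup and Parallelization Efficiency:

```python
# Minimal sketch: speedup over the single-instance baseline and
# parallelization efficiency E(n) = T(1)/T(n)/n, using hypothetical runtimes.

def speedup(t1: float, tn: float) -> float:
    """Performance Speedup over the single-instance baseline."""
    return t1 / tn

def parallelization_efficiency(t1: float, tn: float, n: int) -> float:
    """E(n) = T(1)/T(n)/n; a value of 1.0 would mean perfectly linear scaling."""
    return t1 / tn / n

runtimes = {1: 960.0, 2: 510.0, 4: 270.0, 8: 160.0}   # seconds per instance count
for n, tn in runtimes.items():
    print(n, round(speedup(runtimes[1], tn), 2),
          round(parallelization_efficiency(runtimes[1], tn, n), 2))
```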
TABLE VI. SCALABILITY EVALUATION METRICS
Sample | Metrics
[22] | Aggregate Performance
[13] | Performance Speedup over a Baseline
[20] | Performance Degradation/Slowdown over a Baseline
[23] | Parallelization Efficiency E(n) = T(1)/T(n)/n
[48] | Representation in Single Chart (Column, Line, Scatter)
[47] | Representation in Separate Charts
[42] | Representation in Table
7) Variability Evaluation Metrics (cf. Table VII): In the context of Cloud services evaluation, Variability indicates the extent of fluctuation in the values of an individual performance property of a commercial Cloud service. The variation of evaluation results can be caused by the performance differences of Cloud services at different times and/or different locations. Moreover, even at the same location and time, variation may still exist within a cluster of service instances. Note that, similar to the Scalability evaluation, the relevant charts and tables are also regarded as special metrics.
Brief descriptions of particular metrics in Table VII:
- Average, Minimum, and Maximum Value together: Although the three indicators in this metric cannot be individually used for Variability evaluation, they can still reflect the variation of a Cloud service (or feature) when placed together.
- Coefficient of Variation (COV): COV is defined as the ratio of the standard deviation (STD) to the mean of the evaluation results. Therefore, this metric has also been directly represented as the STD/Mean Rate [5].
- Cumulative Distribution Function vs. Probability Density Function: Both metrics distribute the probabilities of different evaluation results to reflect the variation of a Cloud service (or feature).
