Objective Reduction in Many-Objective Optimization: Linear and Nonlinear Algorithms

Dhish Kumar Saxena, João A. Duro, Ashutosh Tiwari, Kalyanmoy Deb, Fellow, IEEE, and Qingfu Zhang
Abstract—The difficulties faced by existing multiobjective evolutionary algorithms (MOEAs) in handling many-objective problems relate to the inefficiency of selection operators, high computational cost, and difficulty in visualization of the objective space. While many approaches aim to counter these difficulties by increasing the fidelity of the standard selection operators, the objective reduction approach attempts to eliminate objectives that are not essential to describe the Pareto-optimal front (POF). If the number of essential objectives is found to be two or three, the problem could be solved by the existing MOEAs. It implies that objective reduction could make an otherwise unsolvable (many-objective) problem solvable. Even when the essential objectives are four or more, the reduced representation of the problem will have a favorable impact on the search efficiency, computational cost, and decision-making. Hence, development of generic and robust objective reduction approaches becomes important. This paper presents a principal component analysis and maximum variance unfolding based framework for linear and nonlinear objective reduction algorithms, respectively. The major contribution of this paper includes: 1) the enhancements in the core components of the framework for higher robustness in terms of applicability to a range of problems with disparate degrees of redundancy; mechanisms to handle input data that poorly approximates the true POF; and dependence on fewer parameters to minimize the variability in performance; 2) proposition of an error measure to assess the quality of results; 3) sensitivity analysis of the proposed algorithms for the critical parameter involved, and the characteristics of the input data; and 4) study of the performance of the proposed algorithms vis-à-vis dominance relation preservation based algorithms, on a wide range of test problems (scaled up to 50 objectives) and two real-world problems.
Index Terms—Evolutionary multiobjective optimization, many-objective optimization, maximum variance unfolding and kernels, principal component analysis.

Manuscript received August 26, 2010; revised May 29, 2011 and November 22, 2011; accepted December 14, 2011. Date of publication February 12, 2012; date of current version January 25, 2013.

D. K. Saxena, J. A. Duro, and A. Tiwari are with the Department of Manufacturing, Cranfield University, Cranfield, Bedford MK43 0AL, U.K. (e-mail: d.saxena@cranfield.ac.uk; j.a.duro@cranfield.ac.uk; a.tiwari@cranfield.ac.uk).

K. Deb is with the Department of Mechanical Engineering, Indian Institute of Technology Kanpur, Kanpur 208016, India (e-mail: deb@iitk.ac.in).

Q. Zhang is with the School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, U.K. (e-mail: qzhang@essex.ac.uk).

Digital Object Identifier 10.1109/TEVC.2012.2185847
I. Introduction
OPTIMIZATION problems involving four or more conflicting objectives are typically referred to as many-objective problems. With the multiple conflicting demands faced by the industry today for superior quality, low cost, higher safety, and so on, a competitive edge could only be established by designing the products and processes that account for as many performance objectives as possible. It implies that many objectives need to be simultaneously dealt with in an optimization problem. While this recognition is growing, it is alarming that the search ability of some of the most well-known multiobjective evolutionary algorithms (MOEAs) severely deteriorates when more than three objectives are involved [1]–[3], resulting in a poor Pareto-optimal front (POF) approximation (convergence to and diversity across the POF). This makes evolutionary many-objective optimization one of the most challenging research areas in the field of evolutionary optimization and explains the growing research emphasis in this direction. The main difficulties associated with many-objective problems relate to the following.
1) High computational cost: if a continuous multiobjective optimization problem (with M objectives) meets the regularity property, the dimension of its POF can be M − 1 [4]. Therefore, the number of points needed to approximate the whole POF increases exponentially with M (a short illustration follows this list). The same phenomenon can be observed in discrete problems.
2) Poor scalability of most existing MOEAs: with an increase in M, almost the entire population acquires the same rank of nondomination. This makes the Pareto-dominance based primary selection ineffective, and the role of secondary selection based on diversity preservation becomes crucial. The density based MOEAs (such as NSGA-II [5], SPEA2 [6], and PESA [7]) worsen the situation by favoring the remote and boundary solutions, implying that the best diversity gets associated with poorly converged solutions. This explains the performance deterioration reported in [1]–[3]. The more recent MOEA/D [8], [9] and indicator based MOEAs [10], [11] are found to cope better with an increase in M. For example, in SMS-EMOA [11], the problems faced by the density based MOEAs are countered by the use of S-metric [12] based secondary selection. However, the utility of such MOEAs is limited by the fact that their running times increase exponentially with M [13]–[15].
3) Difficulty in visualization of a POF for problems with M ≥ 4, which in turn makes the task of decision making more difficult. Although some techniques [16], [17] exist to aid in visualization, they require a large number of solutions.
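To make the first of these difficulties concrete, the scaling implied by the regularity property can be sketched with a back-of-the-envelope calculation; this is an added illustration (the resolution parameter r is an assumption, not a quantity used in the paper):

```latex
% If the POF is an (M-1)-dimensional manifold and r points per
% dimension are needed for a representative approximation set A, then
|A| \approx r^{\,M-1}
% e.g., r = 10 yields |A| \approx 10^2 points for M = 3,
% but |A| \approx 10^9 points for M = 10.
```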
In the context of many-objective problems, the above difficulties are often referred to as the curse of dimensionality. Here, it is important to discriminate between the curse of dimensionality caused by: 1) the difficulties inherent in the problems, e.g., problems with a high-dimensional Pareto-set with complicated shapes [18]; and 2) the poor scalability of most of the existing MOEAs for M ≥ 4, even when the corresponding Pareto-set has simple shapes. For instance, a many-objective problem whose Pareto-set is 1-D or 2-D with a linear or a quadratic shape represents a case where the problem in itself does not have the curse of dimensionality, yet most of the existing MOEAs may suffer from it.
The approaches for dealing with many-objective problems can be broadly categorized as follows.
1) Preference-ordering approaches: these approaches assume that no objective in the given problem is redundant and aim to counter the low selection pressure for convergence by inducing a preference ordering over the nondominated solutions. While a review of these approaches may be found in [19], it may be noted that most of these approaches either aim for only a part of the POF, report an improvement in convergence with a loss in diversity, or their computational time increases exponentially with the number of objectives.
2) Objective reduction approaches: these approaches [20]–[24] assume the existence of redundant objectives in a given M-objective problem. Operating on the objective vectors of the nondominated solutions obtained from an MOEA, these approaches aim to identify a smallest set of m (m ≤ M) conflicting objectives which generates the same POF as the original problem, by preserving: a) the correlation structure; or b) the dominance relations, of the original problem. Such m objectives are termed as essential and the remaining ones as redundant. If m ≤ 3, an otherwise unsolvable problem will become solvable. Even if 4 ≤ m < M, objective reduction will contribute to higher search efficiency, lower computational cost, and ease in visualization and decision-making.
This paper builds upon [23] and [24] and presents a framework for both linear and nonlinear objective reduction algorithms, namely, L-PCA and NL-MVU-PCA. The distinctive contribution of this paper relates to the following.
1) The four goals that the framework pursues are as follows.
a) Generality: the scope of [23] and [24] was limited to highly redundant problems, where m ≪ M. The scope of the proposed framework is broadened to include problems with moderate (m < M) and negligible (m ≈ M) redundancy. This is significant because the core components of the framework need to adapt to conflicting goals while handling these cases with different degrees of redundancy.
b) De-noising of the input data: the proposed framework accounts for the fact that the input data may be noisy, i.e., the objectives that may be correlated on the true POF may show partial conflict in the nondominated solutions obtained from an MOEA. To negate the effect of such noise, the framework (unlike [23] and [24]) proposes an eigenvalue based problem-specific computation of a correlation threshold, and pairs of objectives whose strength of correlation exceeds this threshold are considered as correlated.
c) Parameter reduction: compared to [23] and [24], the number of parameters involved in the proposed framework is significantly smaller, which makes it more robust.
d) Proposition of an error measure: an error measure has been proposed (not in [23] and [24]) that allows an assessment of the quality of the obtained results.
2) Extensive simulations and results: the results presented in this paper are new and are based on over 9000 simulations¹ performed on 24 versions of 6 test problems and two real-world problems. In this, the sensitivity of the proposed algorithms to: a) the randomness; b) the critical parameter involved; and c) the characteristics of the underlying nondominated sets, is discussed.
3) Performance comparison: here, the performance of the proposed algorithms vis-à-vis dominance relation preservation [20] based algorithms is studied, with reference to their general strengths and limitations, and computational complexity.
This paper is organized as follows: the fundamental issues in objective reduction and the existing approaches are discussed in Section II. Section III presents the rationale for viewing the objective reduction problem as a machine learning problem, while Section IV presents the proposed framework. The test suite and the experimental settings are defined in Section V. The working of the proposed framework is demonstrated in Section VI, while Sections VII and VIII present the results for a wider range of test problems. Section IX discusses the issue of customization of the framework and its parameter sensitivity. The real-world problems are covered in Section X, while a comparative analysis of the framework and the dominance relation preservation based objective reduction algorithms is done in Section XI. The paper concludes with Section XII.
II. Objective Reduction: Definition, Usefulness, Salient Issues, and Existing Approaches
Objective reduction refers to finding an essential objective set for a given problem. In that:
1) an essential objective set is defined as the smallest set of conflicting objectives (F_T, |F_T| = m) which can generate the same POF as that by the original problem, given by F_0 = {f_1, f_2, ..., f_M};
¹The proposed L-PCA and NL-MVU-PCA are tested for 26 test cases, corresponding to solutions generated—on the POF, randomly, and by NSGA-II and ε-MOEA (20 runs each). Simulations with the latter two solution sets are repeated for three different parameter settings of the proposed algorithms, and also for dominance relation preservation based objective reduction algorithms.
Fig. 1. DTLZ5(3,5): illustrating that the presence of redundant objectives can hinder the search efficiency of an MOEA. The N_NS corresponds to a population size of 200 and 2000 generations (one run). (a) N_NS versus true POF. (b) N_NS: parallel coordinate plot.
2) the dimensionality of the problem refers to the number of essential objectives² (m, m ≤ M);
3) a redundant objective set (F_redn = F_0 \ F_T) refers to the set of objectives, elimination of which does not affect the POF of the given problem. Notably, an objective could be redundant if it is nonconflicting (or correlated) with some other objectives.

²Under the regularity condition [4], for m conflicting objectives, the dimensionality of the POF is m − 1. However, in the current context, it is meant to refer to the number of essential objectives.
The benefits of objective reduction could be realized by first recognizing the difficulties that the presence of redundant objectives may cause for most existing MOEAs. This is illustrated through the performance of NSGA-II on a five-objective problem with a 3-D POF, namely, DTLZ5(3,5). While this problem is an instance of the DTLZ5(I,M) problem introduced later, here it is sufficient to note that its POF is characterized as follows: 1) f1–f2–f3 are perfectly correlated (do not lead to incomparable solutions); and 2) an essential objective set is given by F_T = {f3, f4, f5}.
NSGA-II is known to perform well for problems with 2-D and 3-D POFs. However, it can be seen in Fig. 1(a) that in the case of DTLZ5(3,5), the NSGA-II solutions (N_NS) could not converge to the 3-D POF. This can be attributed to the presence of redundant objectives, as explained below. In the F_T subspace shown in Fig. 1(a), solution A dominates B. However, B remains a nondominated solution as it is better than A in one of the redundant objectives, namely, f1 or f2. This illustrates that the presence of redundant objectives hinders the search efficiency of NSGA-II and leads to a poor POF-approximation, in that, the dominance relations in N_NS may be different from those characterizing the POF. Fig. 1(b) corroborates this fact, where a significant proportion of N_NS (spurious solutions like B) is a result of the conflict between f1–f2–f3, even though these objectives are correlated on the POF. If NSGA-II were allowed to run for infinitely many generations, the spurious solutions may "die out" and, hopefully, the population could converge to the true POF. However, with a reasonable number of generations, the spurious solutions will prevail. This example highlights the potential benefits of objective reduction. In that:
1) if the problem could be reduced to m ≤ 3: an otherwise unsolvable problem could be solved using any of the existing MOEAs. For example, in the above problem, if F_T were known, NSGA-II would have converged to the true POF;
2) even if 4 ≤ m < M: the benefits of objective reduction could be realized in the form of relatively higher search efficiency, lower computational cost, and ease in visualization and decision-making;
3) objective reduction could play a complementary role for the preference-ordering approaches. If the latter were to be applied to the reduced representation of the original problem, higher gains can be expected;
4) besides the above, the benefits of the gain in knowledge about the problem itself cannot be discounted in real-world problems. This is significant because, quite often, there is a tendency among practitioners to account for all the performance indicators as objectives, while it may not be intuitive if some of them are correlated.
It is important to highlight some salient issues around objective reduction before introducing the existing approaches. First, all these approaches operate on the objective vectors of the nondominated solutions obtained from an MOEA. Second, objective reduction can be performed during an MOEA run (online reduction) to simplify the search, or it can be performed post MOEA run (offline reduction) to assist in the decision making. Third, an essential objective set may be expressed as a subset of the original problem, or its elements may be expressed as a linear combination of the original objectives. While the former approach is referred to as feature selection, the latter is referred to as feature extraction. Notably, most of the existing objective reduction approaches pursue feature selection toward the ease in subsequent decision making.
In the wake of the above background, the existing objective reduction approaches are presented below.
1) Dominance relation preservation (DRP): the objective reduction approach proposed in [20] is based on preserving the dominance relations in the given nondominated solutions. For a given objective set F, if the dominance relations among the objective vectors remain unchanged when an objective f ∈ F is removed, then f is considered to be nonconflicting with the other objectives in F. Based on this criterion, an exact and a greedy algorithm are proposed to address the δ-MOSS (finding the minimum objective subset corresponding to a given error δ) and k-EMOSS (finding an objective subset of size k with minimum possible error) problems; a sketch of the underlying dominance check is given after this list.
2) Unsupervised feature selection: this approach [21] utilizes the correlation between objectives and treats the more distant objectives as more conflicting ones. In this: a) the objective set is divided into neighborhoods of size q around each objective; and b) the most compact neighborhood is selected, whose center is retained and the neighbors are eliminated. This process is repeated until a pre-specified criterion is achieved. Based on this, two algorithms have been proposed to deal with the δ-MOSS and k-EMOSS problems, respectively.
3) Pareto corner search: a Pareto corner search evolutionary algorithm has been proposed in [22], which instead
Fig. 2. Maximum variance unfolding (taken from [28]).
of aiming for the complete POF, searches for only the corners of the POF. The solutions so obtained are assumed to appropriately capture the dependency of the POF on different objectives. Then objective reduction is based on the premise that omitting a redundant and an essential objective will have negligible and substantial effect, respectively, on the number of nondominated solutions in the population.
4) Machine learning based objective reduction: this approach [23], [24] utilizes machine learning techniques like principal component analysis (PCA) and maximum variance unfolding (MVU) to remove the second-order and higher-order dependences, respectively, in the nondominated solutions. As this approach lies at the heart of the framework proposed in this paper, the basic concepts are discussed below.
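Before turning to those concepts, the dominance check that underpins the DRP approach in 1) can be made concrete. The following is a minimal Python sketch of the δ = 0 criterion, i.e., testing whether dropping an objective leaves all pairwise dominance relations unchanged; minimization is assumed and the function names are illustrative, not from [20]:

```python
import numpy as np

def weakly_dominates(a, b):
    """True if objective vector a weakly dominates b (minimization)."""
    return bool(np.all(a <= b))

def dominance_relations(F):
    """(N x N) boolean matrix of pairwise weak dominance for an
    (N x M) array F of objective vectors."""
    N = F.shape[0]
    rel = np.zeros((N, N), dtype=bool)
    for i in range(N):
        for j in range(N):
            rel[i, j] = weakly_dominates(F[i], F[j])
    return rel

def is_nonconflicting(F, k):
    """DRP test with delta = 0: objective k is nonconflicting with
    the rest if dropping it leaves all dominance relations unchanged."""
    return np.array_equal(dominance_relations(F),
                          dominance_relations(np.delete(F, k, axis=1)))
```

The exact and greedy algorithms in [20] build on this test to search over objective subsets.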
III. Machine Learning Based Objective Reduction

This approach is based on the premise that the intrinsic structure of a garbled high-dimensional data can be revealed by transforming it such that the effect of noise and redundancy (dependences) is minimized. PCA [25] achieves this goal by projecting the given data X such that its correlation structure is most faithfully preserved, in that, PCA projects X on the eigenvectors of the correlation matrix of X.
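As a generic illustration of this idea (not the specific procedure of the proposed framework), PCA on the correlation matrix can be written in a few lines of Python:

```python
import numpy as np

def pca_on_correlation(X, k):
    """Project data X (features x samples) onto the top-k
    eigenvectors of its correlation matrix."""
    R = np.corrcoef(X)             # correlation matrix of the rows of X
    w, V = np.linalg.eigh(R)       # eigenvalues ascending, eigenvectors in columns
    top = np.argsort(w)[::-1][:k]  # indices of the k largest eigenvalues
    return V[:, top].T @ X         # (k x samples) projected data
```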
Notably, PCA is based on removing the second-order dependences in the given data. Hence, PCA is likely to be ineffective in capturing the structure of data sets with multimodal Gaussian or non-Gaussian distributions [25]. Several nonlinear dimensionality reduction methods, such as kernel PCA [26] and graph-based methods [27], exist for removing the higher-order dependences. In that, the former nonlinearly transforms the data by using a standard kernel function and then applies PCA in the transformed/kernel space. However, its success depends on the a priori chosen kernel. The latter does away with this limitation by deriving "data-dependent" kernels.
MVU [28] is a graph-based method that computes the low-dimensional representation by explicitly attempting to "unfold" the high-dimensional data manifold, as in Fig. 2. The unfolding is achieved by maximizing the Euclidean distances between the data points while locally preserving the distances and angles between nearby points. Mathematically, this can be posed as a semidefinite programming (SDP) problem [28], the output of which is the kernel matrix (say, K) representing the kernel space to which PCA can be applied.
In the wake of the concepts above, and an MOEA's search characteristics highlighted in Section II, the objective reduction problem can be viewed as a machine learning problem. In that:
1) the intrinsic structure of the POF refers to its intrinsic dimensionality (m) and the essential components (F_T);
2) the garbled high-dimensional data refers to the nondominated solutions obtained from an MOEA, which typically provide a poor POF-approximation and, hence, are characterized by dominance relations that may be different from those characterizing the POF. In that, the objectives that are perfectly correlated on the POF may show partial conflict in the given MOEA solutions;
3) the unnoised signal refers to the exact-optimal solutions, i.e., the nondominated solutions which are characterized by the same dominance relations as those which define the true POF [for example, the fraction of N_NS in Fig. 1(a) which conformed with the true POF];
4) the noised signal refers to the nonoptimal solutions, characterized by dominance relations that may be different from those which define the true POF [for example, the fraction of N_NS in Fig. 1(a) which did not conform with the true POF]. Noise refers to the difference in the characteristics of the unnoised and noised signals, for example, the difference between the dimensionality (m) of the unnoised signal and that of the noised signal (which could be greater than m);
5) redundancy refers to the presence of objectives which are nonconflicting (or correlated) with some other objectives. Based on the discussions in Section II, it can be inferred that redundancy may contribute to garbled data.
The above conformance in the definitions and terminology justifies viewing objective reduction as a machine learning problem.
IV. Proposed Framework for Linear and Nonlinear Objective Reduction
This section proposes the framework for offline linear and nonlinear objective reduction algorithms, namely, L-PCA and NL-MVU-PCA, respectively. Given a set of nondominated solutions for an M-objective problem, this framework (Algorithm 1) aims to find a smallest set of m (m ≤ M) conflicting objectives which preserves the correlation structure of the given solution set. This is achieved by eliminating objectives that are nonconflicting along the significant eigenvectors of the correlation matrix (or the kernel matrix) or are globally correlated. Hence, if the correlation structure of the given nondominated solutions conforms with the correlation structure of the POF,³ then: 1) such solutions are considered as good representatives of the POF; and 2) the smallest set of conflicting objectives determined by the framework will constitute an essential objective set (Section II) for the problem.
While the preliminary proof-of-concept of this approach can be found in [23] and [24], this paper is novel in multiple ways, as summarized in Section I.

³This is discussed in Sections V-A and V-B with respect to each test problem used in this paper, and in Sections VI to VIII on experimental results.
Algorithm 1 The proposed framework for linear and nonlinear objective reduction algorithms

Set t = 0 and F_t = {f_1, f_2, ..., f_M}.
begin
1. Obtain a set of nondominated solutions by running an MOEA corresponding to F_t, for N_g generations with a population size of N.
2. Compute a positive semi-definite matrix: R (1) or K (2), for L-PCA and NL-MVU-PCA, respectively.
3. Compute the eigenvalues and eigenvectors of R and K, respectively (Section IV-B).
4. Perform the Eigenvalue Analysis (Section IV-C) to identify the set of important objectives F_e ⊆ F_t.
5. Perform the Reduced Correlation Matrix Analysis (Section IV-D) to identify the identically correlated subsets (S) in F_e. If there is no such subset, F_s = F_e.
6. Apply the selection scheme (Section IV-E) to identify the most significant objective in each S, to arrive at F_s, such that F_s ⊆ F_e ⊆ F_t. Compute and store E_t (6).
7. if F_s = F_t then stop and declare F_t as the essential objective set; set T = t and compute the total error E_T (7).
8. else set t = t + 1, F_t = F_s, and go to Step 1.
end
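A compact way to read Algorithm 1 is as the following Python-style skeleton; this is a sketch only, and run_moea, psd_matrix, eigen_analysis, rcm_analysis, and select_most_significant are hypothetical placeholders for the steps detailed in Sections IV-A to IV-E:

```python
def objective_reduction(problem, F0, linear=True):
    """Iterative reduction loop of Algorithm 1 (sketch)."""
    Ft = set(F0)
    while True:
        data = run_moea(problem, Ft)               # nondominated solutions for Ft
        mat = psd_matrix(data, linear=linear)      # R per (1), or K per (2)
        Fe = eigen_analysis(mat, Ft)               # important objectives, Fe within Ft
        subsets = rcm_analysis(Fe)                 # identically correlated subsets S
        Fs = select_most_significant(Fe, subsets)  # Fs within Fe; error E_t per (6)
        if Fs == Ft:
            return Ft                              # essential objective set
        Ft = Fs
```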
Besides the extensive range of results and their analysis, the proposed framework distinctively pursues the four goals of generality, de-noising of input data, parameter reduction, and proposition of an error measure. These goals are achieved through conceptual enhancements which are highlighted in the description of the framework's steps below, and are also summarized in Section IV-G.
一建合格标准As can be en in Framework 1,it is designed to perform objective reduction iteratively,until the objective ts deduced as esntial in two successive iterations,remain the same.As the framework’s steps are identical for every iteration,they are described below with reference to the first iteration.A.Construction of a Positive Semidefinite Matrix
This step describes the construction of the correlation (R) and the kernel (K) matrices, to be used for linear and nonlinear objective reduction, respectively. First, the nondominated solutions are obtained by running an MOEA with the initial objective set F_0 = {f_1, ..., f_M}, corresponding to a population size of N. Let the objective vector in the nondominated set, corresponding to the ith objective (f_i), be denoted by $\dot{f}_i \in \mathbb{R}^N$, and its mean and standard deviation by $\mu_{\dot{f}_i}$ and $\sigma_{\dot{f}_i}$, respectively. Furthermore, let $\ddot{f}_i = (\dot{f}_i - \mu_{\dot{f}_i})/\sigma_{\dot{f}_i}$. Then, the input data X, an M × N matrix, can be composed as $X = [\ddot{f}_1\ \ddot{f}_2\ \cdots\ \ddot{f}_M]^T$.
1) Construction of Correlation Matrix for Linear Objective Reduction (L-PCA): For a given X, the correlation matrix R is defined by (1). It may be noted that L-PCA will be based on de-correlation of R, i.e., removal of second-order dependences in X. Hence, the scope of L-PCA will largely be limited to linear objective reduction:

$$R = \frac{1}{M} XX^T. \qquad (1)$$
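In code, the composition of X and the computation of R per (1) amount to the following sketch, assuming an (N × M) numpy array F of nondominated objective values (numpy's default 1/N convention is used for the standard deviation):

```python
import numpy as np

def correlation_matrix(F):
    """Compose X (M x N) from the nondominated set and compute R as in (1)."""
    mu = F.mean(axis=0)       # per-objective mean over the N solutions
    sigma = F.std(axis=0)     # per-objective standard deviation
    X = ((F - mu) / sigma).T  # rows are the standardized objective vectors
    M = X.shape[0]
    return (X @ X.T) / M      # R = (1/M) X X^T, following (1)
```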
2) Construction of Kernel Matrix for Nonlinear Objective Reduction (NL-MVU-PCA): It may be noted that NL-MVU-PCA will be based on de-correlation of the kernel matrix K, i.e., removal of higher-order dependences in X. This will allow for nonlinear objective reduction. K can be learnt from the SDP formulation⁴ presented in (2):

Maximize $\operatorname{trace}(K) = \sum_{ij} \frac{K_{ii} - 2K_{ij} + K_{jj}}{2M}$

subject to the following constraints:
a) $K_{ii} - 2K_{ij} + K_{jj} = R_{ii} - 2R_{ij} + R_{jj}, \ \forall\, \eta_{ij} = 1$;
b) $\sum_{ij} K_{ij} = 0$;
c) K is positive-semidefinite;

where $R_{ij}$ is the (i, j)th element of the correlation matrix R, and

$$\eta_{ij} = \begin{cases} 1, & \text{if } \dot{f}_j \text{ is among the } q\text{-nearest neighbors of } \dot{f}_i\\ 0, & \text{otherwise.} \end{cases} \qquad (2)$$
The neighborhood relation $\eta_{ij}$ in (2) is governed by the parameter q. For each input feature ($\dot{f}_i \in \mathbb{R}^N$), q represents the number of neighbors with respect to which the local isometry⁵ is to be retained during the unfolding, as in Fig. 2. In other words, $\eta_{ij} \in \{0, 1\}$ indicates whether there is an edge between $\dot{f}_i$ and $\dot{f}_j$ in the graph formed by pairwise connecting all q-nearest neighbors. While a high value of q ensures better retention of isometry, it delays the unfolding process [say, incremental unfolding]. In contrast, a low value of q offers fast unfolding but at the risk of distorting the isometry. Given this tradeoff, proper selection of q is crucial. While q = 4 is mostly used in the literature [28], experiments were conducted in [24] to identify q as a function of M, and $q = \sqrt{M}$ was found to offer accurate results.
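Putting (2) together with the neighborhood definition above, the kernel can be prototyped with an off-the-shelf SDP solver. The following sketch assumes cvxpy with the SCS solver, which are not tools used in the paper and are chosen only to make the formulation concrete; since constraint b) forces the entries of K to sum to zero, maximizing the objective in (2) reduces to maximizing trace(K):

```python
import numpy as np
import cvxpy as cp

def mvu_kernel(X, R, q):
    """Solve the SDP in (2) for the kernel matrix K.
    X: (M x N) standardized objective vectors; R: (M x M) correlation matrix."""
    M = X.shape[0]
    # eta[i, j] = True if f_j is among the q-nearest neighbors of f_i (Euclidean)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    eta = np.zeros((M, M), dtype=bool)
    for i in range(M):
        eta[i, np.argsort(D[i])[1:q + 1]] = True  # skip self at index 0

    K = cp.Variable((M, M), PSD=True)             # constraint c)
    cons = [cp.sum(K) == 0]                       # constraint b)
    for i in range(M):
        for j in range(M):
            if eta[i, j]:                         # constraint a): local isometry
                cons.append(K[i, i] - 2 * K[i, j] + K[j, j]
                            == R[i, i] - 2 * R[i, j] + R[j, j])
    cp.Problem(cp.Maximize(cp.trace(K)), cons).solve(solver=cp.SCS)
    return K.value
```

PCA is then applied to the learnt K, exactly as it is applied to R in the linear case.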
This paper aims to propose a robust framework for objective reduction which relies on as few parameters as possible. Toward this, the most constrained case of q = M − 1 is recommended and used. The rationale for this choice of q is the following: 1) the number of unknown elements in K equals M(M + 1)/2 (accounting for symmetry); 2) constraints b) and c) of (2) jointly represent two constraints; and 3) for q = M − 1, constraint a) of (2) leads to M(M − 1)/2 constraints. Hence, even with q = M − 1, the matrix K is not fully specified by the constraints, and a sufficient degree of freedom (M(M + 1)/2 − M(M − 1)/2 − 2 = M − 2) is available for
⁴This is the novel implementation of the original MVU, as proposed in [24].
⁵Two data sets $\{x_i\}_{i=1}^M$ and $\{y_i\}_{i=1}^M$ that are in one-to-one correspondence are said to be q-locally isometric if: for every point $x_i$, there exists a rotation and translation that maps $x_i$ and its q-nearest neighbors $\{x_{i1}, \ldots, x_{iq}\}$ precisely onto $y_i$ and $\{y_{i1}, \ldots, y_{iq}\}$. In other words, the distances and angles between nearby inputs are preserved.