
Evolutionary Continuous Optimization by Distribution Estimation with Variational Bayesian Independent Component Analyzers
Mixture Model
Dong-Yeon Cho and Byoung-Tak Zhang
Biointelligence Laboratory
School of Computer Science and Engineering
Seoul National University
Seoul 151-742, Korea
{dycho,btzhang}@bi.snu.ac.kr
bi.snu.ac.kr
Abstract. In evolutionary continuous optimization by building and using probabilistic models, the multivariate Gaussian distribution and its variants or extensions, such as the mixture of Gaussians, have been used popularly. However, this Gaussian assumption is often violated in many real problems. In this paper, we propose new continuous estimation of distribution algorithms (EDAs) with the variational Bayesian independent component analyzers mixture model (vbICA-MM), which allows any distribution to be modeled. We examine how this sophisticated density estimation technique influences the performance of the optimization by employing the same selection and population alternation schemes used in previous EDAs. Our experimental results support that the presented EDAs achieve better performance than previous EDAs with ICA and Gaussian mixture- or kernel-based approaches.
1 Introduction
In the variation operators, particularly the mutations of continuous evolutionary algorithms such as evolution strategies, probability distributions have been used to generate new offspring. Although one of the most popular distributions in the continuous case is the Gaussian, we have no prior knowledge of which type of probability distribution is suitable for a given problem, nor of how the parameter values of the distribution should be determined to guide further searches toward the optimal point. This is the reason why we study effective methods to estimate the probability distribution of the good solutions in the current population. Over the last decade, many researchers have developed this kind of evolutionary optimization technique, called estimation of distribution algorithms (EDAs) [1], for both discrete and continuous cases. Detailed reviews of existing EDAs can be found in [2] and [3].
For the continuous case, most of the previous EDAs are based on the multivariate Gaussian distribution and its variants or extensions, although there are a few exceptions such as marginal histogram models [4]. In spite of the wide usage of the Gaussian distribution, we suffer from the size of the covariance matrix as the dimensionality of the problem increases. This can be alleviated by ignoring the covariance (i.e., employing a diagonal covariance matrix). In fact, this is assumed in the simplest version of EDAs. However, it severely restricts the flexibility of the model.
The conditional factorization of a multivariate joint probability with conditional dependence assumptions can reduce the number of parameters required to define the probability distribution. It can also explicitly represent the conditional dependencies among the variables of the problem by an acyclic directed graph. Larrañaga et al. [5] used a probabilistic graphical model called the Gaussian network, which can be considered as a factorization tool. It decomposes the probability distribution of the given problem into various factors or conditional probabilities. Bosman and Thierens [6] present continuous IDEA instances, which also exploit the fact that every multivariate joint probability distribution can be written as a conditional factorization.
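As a small illustration of why such a factorization saves parameters (ours, not an example from the paper), consider three variables. The chain rule always gives

$$p(x_1, x_2, x_3) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2),$$

and if the graph encodes the assumption that $x_3$ is conditionally independent of $x_2$ given $x_1$, the last factor shrinks to $p(x_3 \mid x_1)$, so fewer conditional parameters are needed to specify the joint distribution.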
We can readily implement this factorization by introducing latent or hidden variables. Latent variable models provide a powerful approach to probabilistic model-building by supplementing a set of directly observed variables with additional latent variables [7]. By defining a joint distribution over visible and latent variables, the corresponding distribution of the observed variables is then obtained by marginalization. This allows relatively complex distributions to be expressed in terms of more tractable joint distributions over the expanded variable space. In addition, it is easy to sample new data from the estimated distribution since latent variable models are generative. More comprehensive explanations and experimental results for our continuous EDAs based on latent variable models can be found in [8] and [9].
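In symbols, with z denoting the latent variables (a standard statement, not specific to this paper), the observed-variable distribution is

$$p(x) = \int p(x \mid z)\, p(z)\, dz,$$

and sampling is generative: draw z from p(z), then draw x from p(x | z).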
Since the data often have an intricate structure which is difficult to capture by a high-order-dependency model such as the Gaussian network, we should consider more complicated probability models. Clustering techniques can be a useful solution to alleviate these difficulties. After forming groups of similar data, we are able to use the existing probability models to obtain an appropriate interpretation for each cluster. Mixture models provide a natural way to handle such clusters in EDAs. They can also build very complex distributions through a proper choice of their components to represent accurately the local areas of support of the true distribution. For continuous cases, Gaussian mixture models have been successfully applied to function optimization in [10] and [11]. We also presented a mixture version of continuous EDAs with latent variable models which correspond to Gaussian mixtures [12].
As an extreme case of the mixture models, Gaussian kernels are used in the IDEA framework [13]. The localized aspect of the kernel method is better than that of the normal distribution since there is a density function for each data point. However, it completely misses the linkage information and tends to quickly overfit the data. Therefore, the performance of the optimization is seriously affected by the density of sample points and the kernel variances. To overcome these difficulties, the mixed Bayesian optimization algorithm (MBOA) uses a Bayesian network with decision trees whose leaf nodes are one-dimensional Gaussian kernel estimators [14]. MBOA is an extension of the hierarchical Bayesian Optimization Algorithm (hBOA) [15] from the binary to the continuous domain. Actually, MBOA can solve both discrete and continuous problems, but we concentrate on the continuous case in this paper.
One of the drawbacks of the Gaussian mixture- or kernel-based approaches is that each component or separated domain should be Gaussian. However, this cannot hold in many real cases. In this paper, we propose new continuous EDAs with the variational Bayesian independent component analyzers mixture model (vbICA-MM) [16]. It can not only model non-Gaussian, separated clusters by incorporating a very flexible ICA model, but also automatically determine the local dimensionality of each cluster by employing recent developments in variational Bayesian inference. We examine how this sophisticated density estimation technique influences the performance of the optimization by using the same population alternation schemes as the mixture version of the iterated density estimation evolutionary algorithm (MIDEA) and MBOA.
The paper is organized as follows. In Section 2, the basic concept of vbICA-MM is explained. Section 3 presents the EDAs with vbICA-MM for the continuous domain. Then, Section 4 reports the experimental results and their analysis on some benchmark functions. Finally, conclusions of this study are drawn in Section 5.
2 Theoretical Background for vbICA-MM
2.1 Independent Component Analysis Model
Independent component analysis (ICA) tries to model an observed S-dimensional data vector x as a linear combination of statistically independent sources (latent variables) s of dimension L with added Gaussian noise:

$$x = A s + y + e,$$

where y is an S-dimensional bias vector, A is the S×L mixing matrix, and e is S-dimensional additive noise which is assumed to be zero-mean Gaussian and isotropic with precision λI. Then, the conditional probability of observing a data vector x^n is given by
$$p(x^n \mid A, s^n, y, \lambda) = \left(\frac{\lambda}{2\pi}\right)^{S/2} \exp\left\{-\frac{\lambda}{2}\,(x^n - A s^n - y)^{T}(x^n - A s^n - y)\right\}.$$
Since the sources s = {s_1, s_2, ..., s_L} are mutually independent by definition and a mixture of only 1-dimensional Gaussians with m_i components per source is adopted, the distribution over s for data point n can be written as
$$p(s^n \mid \phi) = \prod_{i=1}^{L} \sum_{q_i=1}^{m_i} \pi_{i,q_i}\, \mathcal{N}\!\left(s_i^n;\, \mu_{i,q_i},\, \beta_{i,q_i}\right),$$
where µ_{i,q_i} and β_{i,q_i} are the mean and precision of Gaussian q_i in source i, respectively. The mixture proportions π_{i,q_i} are the prior probabilities of choosing component q_i of the i-th source. The parameters of source i are ϕ_i = {π_i, µ_i, β_i}, and the complete parameter set of the source model is ϕ = {ϕ_1, ϕ_2, ..., ϕ_L}. This mixture-of-Gaussians source model allows any distribution to be modeled [17].
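A minimal generative sketch of this source model, assuming NumPy and purely illustrative values for S, L, the noise precision, and the per-source mixture parameters (none of these numbers come from the paper):

import numpy as np

rng = np.random.default_rng(0)

S, L = 4, 2                      # observed and latent dimensionality (illustrative)
A = rng.normal(size=(S, L))      # mixing matrix
y = np.zeros(S)                  # bias vector
lam = 100.0                      # isotropic noise precision

# Each source i has its own 1-D mixture of Gaussians: (weights, means, precisions).
source_mog = [
    (np.array([0.5, 0.5]), np.array([-2.0, 2.0]), np.array([4.0, 4.0])),  # bimodal source
    (np.array([0.7, 0.3]), np.array([0.0, 3.0]),  np.array([1.0, 9.0])),  # skewed source
]

def sample_source(mog):
    """Draw one value s_i from its 1-D mixture of Gaussians."""
    pi, mu, beta = mog
    q = rng.choice(len(pi), p=pi)                      # pick a component q_i
    return rng.normal(mu[q], 1.0 / np.sqrt(beta[q]))   # precision -> standard deviation

def sample_x(n):
    """Generate n data vectors from x = A s + y + e with isotropic Gaussian noise."""
    X = np.empty((n, S))
    for k in range(n):
        s = np.array([sample_source(m) for m in source_mog])
        e = rng.normal(scale=1.0 / np.sqrt(lam), size=S)
        X[k] = A @ s + y + e
    return X

X = sample_x(1000)

Because each source is a 1-D mixture of Gaussians, the marginals of the sampled data can be skewed, multimodal, or heavy-tailed even though the additive noise is Gaussian.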
The complete collection of possible source states is denoted q = {q_1, q_2, ..., q_m} and runs over all m = ∏_i m_i possible combinations of source states. By integrating and summing over the hidden variables {s, q}, the likelihood of the IID data X = {x^1, x^2, ..., x^N} given the model parameters Θ = {A, y, λ, ϕ} can now be written as

$$p(X \mid \Theta) = \prod_{n=1}^{N} \sum_{q=1}^{m} \int p(x^n, s^n, q^n \mid \Theta)\, ds, \qquad (1)$$

where $ds = \prod_i ds_i$.

2.2 ICA Mixture Model
We suppose that a data vector x^n is generated from a C-component mixture model given assumptions M. Then, the probability can be written in the form

$$p(x^n \mid M) = \sum_{c=1}^{C} p(c \mid M_0)\, p(x^n \mid M_c, c), \qquad (2)$$

where M = {M_0, M_1, ..., M_C} is the vector of assumptions about the mixture process, M_0, and the component model assumptions, M_c. The variable c indicates which component of the mixture model is chosen to generate a given data vector.
For the ICA mixture model, we specify a form for p(c | M_0) and substitute (1) into (2). Then, we can quantify the likelihood of the observed data under the ICA mixture model as follows:

$$p(x^n \mid M) = \sum_{c=1}^{C} p(c \mid \kappa)\, p(x^n \mid \Theta_c, c),$$

where p(c | κ) = {p(c = 1) = κ_1, p(c = 2) = κ_2, ..., p(c = C) = κ_C} and Θ_c are the parameters of the c-th ICA model. We can obtain maximum likelihood (ML) estimates of the parameters as well as the values of the latent variables by using gradient descent or the EM algorithm. However, the ML methods can easily get caught in local maxima and fail to determine the best model structure since there is no consideration of model complexity.
2.3 Variational Bayesian Learning for vbICA-MM
Bayesian approaches overcome the weakness of ML methods by integrating out the parameters {κ, Θ_c} and hidden variables {s_c, q_c} and penalizing more complex models, which can otherwise lead to overfitting. For this purpose, the prior distributions over the model parameters and hidden variables are properly chosen to be conjugate to allow tractability. A detailed statement of the priors for the ICA mixture model can be found in [16].
Although Bayesian inference has the ability to handle overfitting and select the best model structure, it is often computationally intensive and analytically intractable in practice. One of the useful tools for Bayesian integration is the variational approximation. We consider the log evidence for the data X and form a lower bound on it using Jensen's inequality:

$$\log p(X) = \log \int dW\, p(X, W) \;\geq\; \int dW\, p'(W) \log \frac{p(X, W)}{p'(W)} \;\equiv\; F[W], \qquad (3)$$

where W is the vector of all hidden variables and unknown parameters, and p'(W) is a tractable approximation to the posterior p(W | X). We can compare various models by calculating F for each model, as well as implicitly integrate out the unknowns W by maximizing F.
For tractable Bayesian learning of the ICA mixture model, the following factorization of the distribution of the parameters and hidden variables is used:

$$p'(W) = p'(c)\, p'(s_c \mid q_c, c)\, p'(q_c \mid c)\, p'(\kappa)\, p'(y)\, p'(\lambda)\, p'(A)\, p'(\alpha)\, p'(\phi), \qquad (4)$$

where p'(ϕ) = p'(π) p'(µ) p'(β), p'(a | b) is the approximating density of p(a | b, X), and α is the precision vector for the columns of the mixing matrix A, whose elements have a zero-mean Gaussian as the prior distribution. The posteriors over the sources also factorize such that

$$p'(s_c \mid q_c, c) = \prod_{i=1}^{L_c} p'(q_i \mid c)\, p'(s_{c,i} \mid q_i, c).$$
This additional factorization allows efficient scaling of the computation with the number of latent variables. By substituting p(X, W) and (4) into (3), we obtain an expression for the bound F for the ICA mixture model. For how the measure F is maximized, that is, how the vbICA-MM algorithm is implemented in detail, see [16].
3 Continuous EDAs with vbICA-MM
3.1 Distribution Estimation by vbICA-MM
Most evolutionary algorithms for continuous optimization problems maintain a population of real vectors to search for an optimal point. The vbICA-MM can put similar individuals together in a group and estimate the density for each group simultaneously. This means that vbICA-MM implicitly divides the current population into C sub-populations and finds the values of the latent variables and parameters to build a corresponding density model for each sub-population by variational Bayesian learning. Unlike the mixture of Gaussians or other Gaussian-based latent variable models, each cluster can have a non-Gaussian distribution in the vbICA-MM. This is because the source model can enclose distributions with positive and negative kurtosis as well as complex multimodal distributions.
We first select the good candidates from the current population as the data used in the density estimation step. Both selection schemes of MIDEA and MBOA were tried in our experiments. MIDEA adopts truncation selection, where the best ⌊τN⌋ (0 < τ < 1) vectors are taken. In MBOA, ⌊τN⌋ vectors are chosen by repeating tournament selection (where the tournament size is two).
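Both schemes can be stated compactly in code. The sketch below assumes a population stored as an (N, d) NumPy array and a fitness vector to be minimized; the function names are ours, not those of the MIDEA or MBOA implementations:

import numpy as np

def truncation_selection(pop, fitness, tau):
    """MIDEA-style: keep the best floor(tau*N) individuals (minimization)."""
    k = int(np.floor(tau * len(pop)))
    best = np.argsort(fitness)[:k]
    return pop[best]

def tournament_selection(pop, fitness, tau, rng):
    """MBOA-style: repeat binary tournaments until floor(tau*N) winners are chosen."""
    k = int(np.floor(tau * len(pop)))
    winners = []
    for _ in range(k):
        i, j = rng.integers(0, len(pop), size=2)
        winners.append(pop[i] if fitness[i] <= fitness[j] else pop[j])
    return np.asarray(winners)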
To estimate the probability distribution of the selected population, we use the MATLAB vbICA-MM software from Choudrey¹. Although we could determine which model is preferred, that is, what the best value for C is, by monitoring F as the number of mixture components varies, we simply fix the value of C in our experiments for fast estimation. We also use the maximum number of sources, which means that the source vector has the same dimension as the data (L = S). However, the relevance of each source may be determined automatically: column i of the mixing matrix A_c will be close to zero if the precision α_i for that column is large, indicating that source i is irrelevant. The ICA mixture model is trained until F changes by less than 0.01 or the number of iterations reaches 10.
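The stopping rule and the relevance check on the mixing-matrix columns can be sketched as follows; `update_step` stands in for one pass of the vbICA-MM variational updates, which we treat as a black box provided by the Choudrey software, and the pruning threshold is an arbitrary illustrative value:

import numpy as np

def train_until_converged(update_step, model, data, max_iters=10, tol=0.01):
    """Repeat variational updates until the bound F changes by less than tol
    or max_iters passes are reached (the stopping rule quoted in the text)."""
    F_old = -np.inf
    for _ in range(max_iters):
        model, F = update_step(model, data)   # one pass of the vbICA-MM updates
        if abs(F - F_old) < tol:
            break
        F_old = F
    return model

def relevant_sources(A_c, prune_tol=1e-3):
    """A column of the mixing matrix A_c is driven toward zero when its precision
    alpha_i is large; treat near-zero columns as irrelevant sources."""
    norms = np.linalg.norm(A_c, axis=0)
    return np.where(norms > prune_tol)[0]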
3.2 Generating a New Population
To create a new individual x̃^n from the trained model, we first have to determine which component density c is responsible for a randomly chosen parent x^n among the selected vectors used as the data. The component c is selected with probability proportional to the estimated component posterior probability. Then, we can easily sample the new individual x̃^n from the conditional distribution given the corresponding latent variable s_c^n, defined to be the Gaussian distribution N(A_c s_c^n + y_c, λ_c I). This sampling task is trivial since the reconstructed values of the hidden variables and parameters are available from the trained model, and the noise is assumed to be zero-mean Gaussian and isotropic. As mentioned earlier, however, the source distribution is not a simple Gaussian but a factorized mixture of Gaussians, which makes the vbICA-MM more flexible than Gaussian mixtures and the mixture versions of other latent variable models. This sampling procedure is repeated until (N − ⌊τN⌋) vectors are obtained.
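A hedged sketch of this sampling step, assuming the trained model exposes, for one chosen parent, its component responsibilities, its reconstructed latent vectors, and per-component parameters (A_c, y_c, λ_c); these attribute layouts are our own assumption, not the MATLAB package's interface:

import numpy as np

def sample_offspring(responsibilities, latents, components, rng):
    """For one chosen parent: pick a component c with probability proportional to its
    posterior responsibility, take the parent's reconstructed latent vector s_c for
    that component, and draw the offspring from N(A_c s_c + y_c, (1/lambda_c) I)."""
    probs = responsibilities / responsibilities.sum()
    c = rng.choice(len(components), p=probs)
    A_c, y_c, lam_c = components[c]   # mixing matrix, bias, noise precision
    s_c = latents[c]                  # parent's reconstructed sources for component c
    mean = A_c @ s_c + y_c
    return rng.normal(mean, 1.0 / np.sqrt(lam_c))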
Besides the way the density is estimated, the population alternation method is also important. We tried both schemes of MIDEA and MBOA with vbICA-MM. The newly sampled vectors replace the worst (N − ⌊τN⌋) vectors in MIDEA. To ensure effective niching, MBOA employs restricted tournament replacement (RTR), proposed originally in [18]. RTR selects a subset of the original population for each newly sampled offspring. The size of the subsets is fixed to some constant, called the window size. Then, the new offspring competes with the most similar member of the subset. If the new one is better, it replaces the corresponding individual; otherwise, the new one is discarded.
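Restricted tournament replacement itself admits a compact sketch; the Euclidean similarity measure and minimization convention below follow the textual description, while the array layout is our own assumption:

import numpy as np

def restricted_tournament_replacement(pop, fitness, child, child_fitness,
                                      window_size, rng):
    """RTR: the new offspring competes with the most similar member of a random
    subset of the population and replaces it only if it is better (minimization)."""
    subset = rng.choice(len(pop), size=window_size, replace=False)
    dists = np.linalg.norm(pop[subset] - child, axis=1)
    nearest = subset[np.argmin(dists)]
    if child_fitness < fitness[nearest]:
        pop[nearest] = child
        fitness[nearest] = child_fitness
    return pop, fitness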
1[ac.uk/∼riz/Code/
