Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces
Aapo Hyvärinen and Patrik Hoyer
Helsinki University of Technology
Laboratory of Computer and Information Science
P.O. Box 5400, FIN-02015 HUT, Finland
aapo.hyvarinen@hut.fi, patrik.hoyer@hut.fi
Neural Computation 12(7):1705-1720 [July, 2000]
Abstract
Olshausen and Field (1996) applied the principle of independence maximization by sparse coding to extract features from natural images. This leads to the emergence of oriented linear filters that have simultaneous localization in space and in frequency, thus resembling Gabor functions and simple cell receptive fields. In this paper, we show that the same principle of independence maximization can explain the emergence of phase and shift invariant features, similar to those found in complex cells. This new kind of emergence is obtained by maximizing the independence between norms of projections on linear subspaces (instead of the independence of simple linear filter outputs). The norms of the projections on such 'independent feature subspaces' then indicate the values of invariant features.
1 Introduction
A fundamental approach in signal processing is to design a statistical generative model of the observed signals. Such an approach is also useful for modeling the properties of neurons in primary sensory areas. Modeling visual data by a simple linear generative model, Olshausen and Field (1996) showed that the principle of maximizing the sparseness (or nongaussianity) of the underlying image components is enough to explain the emergence of Gabor-like filters that resemble the receptive fields of simple cells in mammalian primary visual cortex (V1). Maximizing sparseness is in this context equivalent to maximizing the independence of the image components (Comon, 1994; Bell and Sejnowski, 1997; Olshausen and Field, 1996).
We show in this paper that this same principle can also explain the emergence of phase and shift invariance, the principal properties of complex cells in V1. Using the method of feature subspaces (Kohonen, 1995; Kohonen, 1996), we model the response of a complex cell as the norm of the projection of the input vector (image patch) onto a linear subspace, which is equivalent to the classical energy models. Then we maximize the independence between the norms of such projections, or energies. Thus we obtain features that are localized in space, oriented, and bandpass (selective to scale/frequency), like those given by simple cells, or Gabor analysis. In contrast to simple linear filters, however, the obtained features also show emergence of phase invariance and (limited) shift or translation invariance. Phase invariance means that the response does not depend on the Fourier phase of the stimulus: the response is the same for a white bar and a black bar, as well as for a bar and an edge. Limited shift invariance means that a near-maximum response can be elicited by identical
bars or edges at slightly different locations. These two latter properties closely parallel the properties that distinguish complex cells from simple cells in V1. Maximizing the independence, or equivalently, the sparseness of the norms of the projections onto feature subspaces thus allows for the emergence of exactly those invariances that are encountered in complex cells, indicating their fundamental importance in image data.
2 Independent component analysis of image data
The basic models that we consider here express a static monochrome image I(x,y) as a linear superposition of some features or basis functions b_i(x,y):

I(x,y) = \sum_{i=1}^{m} b_i(x,y) s_i    (1)
where the s_i are stochastic coefficients, different for each image I(x,y). The crucial assumption here is that the s_i are nongaussian and mutually independent. This type of decomposition is called independent component analysis (ICA) (Comon, 1994; Bell and Sejnowski, 1997; Hyvärinen and Oja, 1997), or, from an alternative viewpoint, sparse coding (Olshausen and Field, 1996; Olshausen and Field, 1997).
Estimation of the model in Eq. (1) consists of determining the values of s_i and b_i(x,y) for all i and (x,y), given a sufficient number of observations of images, or in practice, image patches I(x,y). We restrict ourselves here to the basic case where the b_i(x,y) form an invertible linear system. Then we can invert the system as

s_i = <w_i, I>    (2)

where the w_i denote the inverse filters, and <w_i, I> = \sum_{x,y} w_i(x,y) I(x,y) denotes the dot product. The w_i(x,y) can then be identified as the receptive fields of the model simple cells, and the s_i are their activities when presented with a given image patch I(x,y). Olshausen and Field (1996) showed that when this model is estimated with input data consisting of patches of natural scenes, the obtained filters w_i(x,y) have the three principal properties of simple cells in V1: they are localized, oriented, and bandpass. Van Hateren and van der Schaaf (1998) compared quantitatively the obtained filters w_i(x,y) with those measured by single-cell recordings of the macaque cortex, and found a good match for most of the parameters.
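The invertibility assumption behind Eq. (2) can be illustrated with a small numerical sketch. This is a hypothetical toy setup (random basis, Laplacian sources), not the estimation procedure of the paper; it only shows that when the basis matrix B is invertible, the inverse filters w_i recover the coefficients s_i exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: m = 4 sparse (Laplacian) sources s_i and a
# random invertible basis matrix B whose columns play the role of the b_i.
m, n_samples = 4, 10000
S = rng.laplace(size=(m, n_samples))   # nongaussian, independent s_i
B = rng.normal(size=(m, m))            # basis functions b_i as columns
X = B @ S                              # observed patches: I = sum_i b_i s_i

# Because B is invertible, the inverse filters w_i are the rows of B^{-1},
# and s_i = <w_i, I> recovers each coefficient exactly (Eq. 2).
W = np.linalg.inv(B)
S_rec = W @ X
```

In practice B is unknown and must be estimated from the data; the sketch only demonstrates the algebraic relationship between the basis and the inverse filters.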
3 Decomposition into independent feature subspaces
3.1 Introduction
In addition to the essentially linear simple cells, another important class of cells in V1 is complex cells. Complex cells share the above-mentioned properties of simple cells but have the two principal distinguishing properties of phase invariance and (limited) shift invariance (Hubel and Wiesel, 1962; Pollen and Ronner, 1983), at least for the preferred orientation and frequency. Note that although invariance with respect to shift and global Fourier phase are equivalent, they are different properties when phase is computed from a local Fourier transform. Another distinguishing property of complex cells is that their receptive fields are larger than those of simple cells, but this difference is only quantitative, and of less consequence here. For more details, see (Heeger, 1992; Pollen and Ronner, 1983; Mel et al., 1998).
To date, very few attempts have been made to formulate a statistical model that would explain the emergence of the properties of visual complex cells. It is simple to see why ICA as in Eq. (1) cannot be directly used for modeling complex cells. This is due to the fact that in that model the activations of the neurons s_i can be used to linearly reconstruct the image I(x,y), which is not true for complex cells due to their two principal properties of
phase invariance and shift invariance: the responses of complex cells do not give the phase or the exact position of the stimulus, at least not as a linear function as in Eq. (1). (See (von der Malsburg et al., 1998) for a nonlinear reconstruction of the image from complex cell responses.)
The purpose of this paper is to explain the emergence of phase and shift invariant features using a modification of the ICA model. The modification is based on combining the technique of multidimensional independent component analysis (Cardoso, 1998) and the principle of invariant-feature subspaces (Kohonen, 1995; Kohonen, 1996). We first describe these two recently developed techniques.
3.2 Invariant feature subspaces
The classical approach for feature extraction is to use linear transformations, or filters. The presence of a given feature is detected by computing the dot product of the input data with a given feature vector. For example, wavelet, Gabor, and Fourier transforms, as well as most models of V1 simple cells, use such linear features. The problem with linear features, however, is that they necessarily lack any invariance with respect to such transformations as spatial shift or change in (local) Fourier phase (Pollen and Ronner, 1983; Kohonen, 1996).
Kohonen (1996) developed the principle of invariant-feature subspaces as an abstract approach to representing features with some invariances. The principle states that one may consider an invariant feature as a linear subspace in a feature space. The value of the invariant, higher-order feature is given by (the square of) the norm of the projection of the given data point on that subspace, which is typically spanned by lower-order features.
A feature subspace, like any linear subspace, can always be represented by a set of orthogonal basis vectors, say w_i(x,y), i = 1, ..., n, where n is the dimension of the subspace. Then the value F(I) of the feature F with input vector I(x,y) is given by

F(I) = \sum_{i=1}^{n} <w_i, I>^2    (3)

(For simplicity of notation and terminology, we do not distinguish clearly between the norm and the square of the norm in this paper.) In fact, this is equivalent to computing the distance between the input vector I(x,y) and a general linear combination of the basis vectors (filters) w_i(x,y) of the feature subspace (Kohonen, 1996). A graphical depiction of feature subspaces is given in Fig. 1.
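Equation (3) can be sketched directly in code. The following is a toy illustration (random 2-D subspace in a 256-dimensional patch space, names hypothetical), showing that F(I) equals the squared norm of the orthogonal projection of I on the subspace:

```python
import numpy as np

def subspace_feature(I, W_sub):
    """Value of the invariant feature F(I) = sum_i <w_i, I>^2 (Eq. 3),
    for a subspace spanned by the orthonormal rows of W_sub."""
    return np.sum((W_sub @ I) ** 2)

rng = np.random.default_rng(0)

# Toy 2-D feature subspace in a 16*16 = 256-dimensional patch space.
Q, _ = np.linalg.qr(rng.normal(size=(256, 2)))
W_sub = Q.T                            # orthonormal basis vectors as rows

I = rng.normal(size=256)               # a "patch" as a flat vector
F = subspace_feature(I, W_sub)

# F equals the squared norm of the orthogonal projection of I on the subspace.
proj = W_sub.T @ (W_sub @ I)
```

The equivalence holds because the rows of W_sub are orthonormal, mirroring the orthogonal-basis assumption in the text.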
In (Kohonen, 1996), it was shown that this principle, when combined with competitive learning techniques, can lead to the emergence of invariant image features.
3.3 Multidimensional independent component analysis
In multidimensional independent component analysis (Cardoso, 1998), a linear generative model as in Eq. (1) is assumed. In contrast to ordinary ICA, however, the components (responses) s_i are not assumed to be all mutually independent. Instead, it is assumed that the s_i can be divided into couples, triplets, or in general n-tuples, such that the s_i inside a given n-tuple may be dependent on each other, but dependencies between different n-tuples are not allowed.
Every n-tuple of s_i corresponds to n basis vectors b_i(x,y). We call a subspace spanned by a set of n such basis vectors an 'independent (feature) subspace'. In general, the dimensionalities of the independent subspaces need not be equal, but we assume so for simplicity.
The model can be simplified by two additional assumptions. First, even though the components s_i are not all independent, we can always define them so that they are uncorrelated and of unit variance. In fact, linear dependencies inside a given n-tuple of dependent components could always be removed by a linear transformation. Second, we can assume that the data is whitened (sphered); this can always be accomplished by PCA (Comon, 1994). Whitening is a conventional preprocessing step in ordinary ICA, where it makes the basis vectors b_i orthogonal (Comon, 1994; Hyvärinen and Oja, 1997), if we ignore any finite-sample effects.
These two assumptions imply that the b_i are orthonormal, and that we can take b_i = w_i as in ordinary ICA with whitened data. In particular, the independent subspaces become orthogonal after whitening. These facts follow directly from the proof in (Comon, 1994), which applies here as well, due to our above assumptions.
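The effect of whitening can be verified numerically. The sketch below (toy dimensions, hypothetical variable names) multiplies data of unequal variances by C^{-1/2} and checks that the resulting covariance is the identity, so that any orthonormal filter set then yields uncorrelated, unit-variance outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 50 dimensions with very different variances.
X = rng.normal(size=(50, 10000)) * rng.uniform(0.1, 2.0, size=(50, 1))

# Whitening: multiply by C^{-1/2}, the inverse symmetric square root of
# the data covariance, computed via an eigendecomposition of C.
C = np.cov(X)
d, E = np.linalg.eigh(C)
V = E @ np.diag(d ** -0.5) @ E.T       # V = C^{-1/2}
Z = V @ X                              # whitened (sphered) data
```

After this step the sample covariance of Z is the identity matrix (up to numerical error), which is what makes the orthonormality constraint on the w_i legitimate.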
Let us denote by J the number of independent feature subspaces, and by S_j, j = 1, ..., J, the set of the indices of the s_i belonging to the subspace of index j. Assume that the data consists of K observed image patches I_k(x,y), k = 1, ..., K. Then we can express the likelihood L of the data given the model as follows:
L(I_k(x,y), k = 1, ..., K; w_i(x,y), i = 1, ..., m) = \prod_{k=1}^{K} [ |det W| \prod_{j=1}^{J} p_j(<w_i, I_k>, i \in S_j) ]    (4)
where p_j(.), which is a function of the n arguments <w_i, I_k>, i \in S_j, gives the probability density inside the j-th n-tuple of s_i, and W is a matrix containing the filters w_i(x,y) as its columns. The term |det W| appears here as in any expression of the probability density of a linear transformation, giving the change in volume produced by the transformation (Pham et al., 1992).
The n-dimensional probability density p_j(.) is not specified in advance in the general definition of multidimensional ICA (Cardoso, 1998).
3.4 Combining invariant feature subspaces and independent subspaces
Invariant-feature subspaces can be embedded in multidimensional independent component analysis by considering probability distributions for the n-tuples of s_i that are spherically symmetric, i.e., depend only on the norm. In other words, the probability density p_j(.) of the n-tuple with index j \in {1, ..., J} can be expressed as a function of the sum of the squares of the s_i, i \in S_j, only. For simplicity, we assume further that the p_j(.) are equal for all subspaces.
This means that the logarithm of the likelihood L of the K observed image patches I_k(x,y), k = 1, ..., K, given the model, can be expressed as

log L(I_k(x,y), k = 1, ..., K; w_i(x,y), i = 1, ..., m) = \sum_{k=1}^{K} \sum_{j=1}^{J} log p(\sum_{i \in S_j} <w_i, I_k>^2) + K log |det W|    (5)

where p(\sum_{i \in S_j} s_i^2) = p_j(s_i, i \in S_j) gives the probability density inside the j-th n-tuple of s_i.
Recall that prewhitening allows us to consider the w_i(x,y) to be orthonormal, which implies that log |det W| is zero. This shows that the likelihood in Eq. (5) is a function of the norms of the projections of the I_k(x,y) on the subspaces indexed by j, which are spanned by the orthonormal basis sets given by w_i(x,y), i \in S_j. Since the norm of the projection of visual data on practically any subspace has a supergaussian distribution, we need to choose the probability density p in the model to be sparse (Olshausen and Field, 1996), i.e., supergaussian (Hyvärinen and Oja, 1997). For example, we could use the following probability distribution:
log p(\sum_{i \in S_j} s_i^2) = -\alpha [\sum_{i \in S_j} s_i^2]^{1/2} + \beta,    (6)

which could be considered a multidimensional version of the exponential distribution (Field, 1994). The scaling constant \alpha and the normalization constant \beta are determined so as to give a probability density that is compatible with the constraint of unit variance of the s_i, but they are irrelevant in the following. Thus we see that the estimation of the model consists of finding subspaces such that the norms of the projections of the (whitened) data on those subspaces have maximally sparse distributions.
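The log-likelihood of Eq. (5) with the density of Eq. (6) is straightforward to evaluate for orthonormal filters, since the log |det W| term vanishes. The following sketch (hypothetical helper name and toy inputs) computes it:

```python
import numpy as np

def log_likelihood(W, X, subspaces, alpha=1.0, beta=0.0):
    """Log-likelihood of Eq. (5) with the sparse density of Eq. (6):
    log p(u) = -alpha * sqrt(u) + beta. X holds whitened patches as
    columns, W holds orthonormal filters w_i as rows, and `subspaces`
    lists the index set S_j of each subspace. For orthonormal W,
    log|det W| = 0, so that term is omitted."""
    U = W @ X                                # all responses <w_i, I_k>
    ll = 0.0
    for S_j in subspaces:
        energy = np.sum(U[S_j] ** 2, axis=0) # sum_{i in S_j} <w_i, I_k>^2
        ll += np.sum(-alpha * np.sqrt(energy) + beta)
    return ll

# Tiny check with identity filters and two 2-D subspaces: each of the
# three "patches" has energy 2 in each subspace.
W = np.eye(4)
X = np.ones((4, 3))
ll = log_likelihood(W, X, [[0, 1], [2, 3]])
```

Maximizing this quantity over orthonormal W is exactly the estimation problem described in the text.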
The introduced 'independent feature subspace analysis' is a natural generalization of ordinary ICA. In fact, if the projections on the subspaces are reduced to projections on 1-D subspaces, the model reduces to ordinary ICA, provided that, in addition, the independent components are assumed to have non-skewed distributions. It is to be expected that the norms of the projections on the subspaces represent some higher-order, invariant features. The exact nature of the invariances has not been specified in the model but will emerge from the input data, using only the prior information on their independence.
When independent feature subspace analysis is applied to natural image data, we can identify the norms of the projections (\sum_{i \in S_j} s_i^2)^{1/2} as the responses of the complex cells. If the individual filter vectors w_i(x,y) are identified with the receptive fields of simple cells, this can be interpreted as a hierarchical model where the complex cell response is computed from the simple cell responses s_i, in a manner similar to the classical energy models for complex cells (Hubel and Wiesel, 1962; Pollen and Ronner, 1983; Heeger, 1992). It must be noted, however, that our model does not specify the particular basis of a given invariant-feature subspace.
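The energy-model interpretation can be illustrated with a classical quadrature-phase pair, the 2-D special case mentioned later in the experiments. This is a hypothetical toy construction (hand-built 1-D Gabor filters, not learned ones): the squared-norm response cos^2 + sin^2 is essentially independent of the phase of a grating at the preferred frequency.

```python
import numpy as np

# Two Gabor filters differing by 90 degrees in phase span a 2-D
# feature subspace, as in the classical energy model.
x = np.linspace(-np.pi, np.pi, 64)
envelope = np.exp(-x**2 / 2)
w_cos = envelope * np.cos(4 * x)
w_sin = envelope * np.sin(4 * x)

def complex_response(stimulus):
    # Squared norm of the projection on the 2-D subspace (Eq. 3).
    return np.dot(w_cos, stimulus) ** 2 + np.dot(w_sin, stimulus) ** 2

# Gratings at the preferred frequency but with different phases elicit
# nearly the same response: phase invariance.
responses = np.array([complex_response(np.cos(4 * x + phi))
                      for phi in np.linspace(0, np.pi, 8)])
```

Either linear filter alone would respond strongly at some phases and not at all at others; only the subspace norm is phase invariant.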
3.5 Learning independent feature subspaces
Learning the independent feature subspace representation can be achieved simply by gradient ascent of the log-likelihood in Eq. (5). Due to whitening, we can constrain the vectors w_i to be orthogonal and of unit norm, as in ordinary ICA; these constraints usually speed up convergence. A stochastic gradient ascent of the log-likelihood can be obtained as

\Delta w_i(x,y) \propto I(x,y) <w_i, I> g(\sum_{r \in S_{j(i)}} <w_r, I>^2)    (7)

where j(i) is the index of the subspace to which w_i belongs, and g = p'/p is a nonlinear function that incorporates our information on the sparseness of the norms of the projections. For example, if we choose the distribution in Eq. (6), we have g(u) = -(\alpha/2) u^{-1/2}, where the constant \alpha/2 can be ignored. After every step of (7), the vectors w_i need to be orthonormalized; for a variety of methods to perform this, see (Hyvärinen and Oja, 1997; Karhunen et al., 1997).
The learning rule in (7) can be considered 'modulated' nonlinear Hebbian learning. If the subspace containing w_i were just one-dimensional, this learning rule would reduce to the learning rules for ordinary ICA given in (Hyvärinen and Oja, 1998), which are closely related to those in (Bell and Sejnowski, 1997; Cardoso and Laheld, 1996; Karhunen et al., 1997). The difference is that in the general case, the Hebbian term is divided by a function of the output of the complex cell, given by \sum_{r \in S_{j(i)}} <w_r, I>^2, if we adopt the terminology of the energy models. In other words, the Hebbian term is modulated by a top-down feedback signal. In addition to this modulation, the neurons interact in the form of the orthogonalizing feedback.
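The learning rule (7) plus orthonormalization might be sketched as follows. This is a toy-scale illustration (8-D whitened space, four 2-D subspaces, synthetic data), not the 160-dimensional experiment of Section 4; the symmetric SVD orthonormalization is one standard choice among the methods cited above:

```python
import numpy as np

rng = np.random.default_rng(0)

def symmetric_orthonormalize(W):
    # Restore orthonormal rows via W <- (W W^T)^{-1/2} W, computed by SVD.
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def update(W, I, subspaces, lr=0.01):
    """One stochastic step of rule (7) for a whitened patch I, using
    g(u) = -1/(2 sqrt(u)), i.e. the density of Eq. (6) with alpha = 1."""
    u = W @ I                                # simple-cell outputs <w_i, I>
    W_new = W.copy()
    for S_j in subspaces:
        energy = np.sum(u[S_j] ** 2)         # complex-cell (subspace) output
        g = -0.5 / np.sqrt(energy + 1e-12)   # top-down modulation g(.)
        for i in S_j:
            W_new[i] += lr * g * u[i] * I    # modulated Hebbian term
    return symmetric_orthonormalize(W_new)   # re-orthonormalize after the step

# Toy run: 8 orthonormal filters in an 8-D space, four 2-D subspaces.
W = symmetric_orthonormalize(rng.normal(size=(8, 8)))
for _ in range(200):
    W = update(W, rng.laplace(size=8), [[0, 1], [2, 3], [4, 5], [6, 7]])
```

The orthonormalization after each step plays the role of the orthogonalizing feedback between neurons described above.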
4 Experiments
To test our model, we used patches of natural images as input data I_k(x,y), and estimated the model of independent feature subspace analysis.
4.1 Data and methods
The data was obtained by taking 16×16-pixel image patches at random locations from monochrome photographs depicting wildlife scenes (animals, meadows, forests, etc.). The images were taken directly from PhotoCDs, and are available on the World Wide Web1. The mean gray-scale value of each image patch (i.e., the DC component) was subtracted. The data was then low-pass filtered by reducing the dimension of the data vectors by principal component analysis, retaining the 160 principal components with the largest variances. Next, the data was whitened by the zero-phase whitening filter, which means multiplying the data by C^{-1/2}, where C is the covariance matrix of the data (after PCA) (Bell and Sejnowski, 1997). These preprocessing steps are essentially similar to those used in (Olshausen and Field, 1996; van Hateren and van der Schaaf, 1998). The likelihood in Eq. (5) for 50 000 such observations was maximized under the constraint of orthonormality of the filters in the whitened space, using the averaged version of the learning rule in (7), i.e., we used the ordinary gradient of the likelihood instead of the stochastic gradient. The fact that the data was contained in a 160-dimensional subspace meant that the 160 basis vectors w_i now formed an orthonormal system for that subspace and not for the original space, but this did not necessitate any changes in the learning rule. The density p was chosen as in Eq. (6). The algorithm was initialized as in (Bell and Sejnowski, 1997) by taking as the w_i the 160 middle columns of the identity matrix. We also tried random initial values for W: these yielded qualitatively identical results, but using a localized filter set as the initial value considerably improves the convergence of the method, especially by preventing some of the filters from getting stuck in local minima. This initialization led, incidentally, to a weak topographical organization of the filters. The computations took about 10 hours on a single RISC processor. Experiments were made with different dimensions of the subspaces S_j: 2, 4, and 8 (in a single run, all the subspaces had the same dimension). The results shown below are for 4-dimensional subspaces, but the results are similar for other dimensions.
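The preprocessing pipeline described above (random patch extraction, DC removal, PCA dimension reduction, zero-phase whitening) might be sketched as follows. The function and variable names are hypothetical, and the demo run uses synthetic noise images and smaller counts in place of the PhotoCD photographs and the 50 000 patches:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(images, n_patches=50000, patch=16, n_comp=160):
    """Sketch of the preprocessing: random patch x patch patches, DC
    removal, PCA reduction to n_comp dimensions, zero-phase whitening.
    `images` is assumed to be a list of 2-D grayscale arrays."""
    X = np.empty((patch * patch, n_patches))
    for k in range(n_patches):
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - patch)
        c = rng.integers(img.shape[1] - patch)
        X[:, k] = img[r:r + patch, c:c + patch].ravel()
    X -= X.mean(axis=0)                      # subtract each patch's DC component
    C = np.cov(X)
    d, E = np.linalg.eigh(C)
    keep = np.argsort(d)[::-1][:n_comp]      # leading principal components
    Xp = E[:, keep].T @ X                    # PCA projection (low-pass filtering)
    Cp = np.cov(Xp)
    dp, Ep = np.linalg.eigh(Cp)
    Z = Ep @ np.diag(dp ** -0.5) @ Ep.T @ Xp # zero-phase whitening: C^{-1/2} x
    return Z

# Demo with synthetic images and reduced sizes for speed.
images = [rng.normal(size=(64, 64)) for _ in range(5)]
Z = preprocess(images, n_patches=2000, n_comp=40)
```

After this step the whitened vectors Z can be fed to the subspace learning rule under the orthonormality constraint.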
4.2 Results
Fig. 2 shows the filter sets of the 40 feature subspaces (complex cells), when the subspace dimension was chosen to be 4. The results are shown in the zero-phase whitened space; note that due to orthogonality, the filters are equal to the basis vectors. The filters look qualitatively similar in the original, unwhitened space. The only difference is that in the original space, the filters are concentrated on higher frequencies.
It can be seen that the linear filters associated with a single complex cell all have approximately the same orientation and frequency. Their locations are not identical, but close to each other. The phases differ considerably. Every feature subspace can thus be considered a generalization of a quadrature-phase filter pair as found in the classical energy models (Pollen and Ronner, 1983), enabling the cell to be selective to a given orientation and frequency, but invariant to phase and somewhat invariant to shifts. Using 4 filters instead of a pair greatly enhances the shift invariance of the feature subspace. In fact, when the subspace dimension was 2, we obtained approximately quadrature-phase filter pairs.
To quantitatively demonstrate the properties of the model, we compared the responses of a representative feature subspace and the associated linear filters for different stimulus configurations. First, an optimal stimulus for the feature subspace was computed in the set of Gabor filters. One of the stimulus parameters was then changed at a time to see how the response changes, while the other parameters were held constant at their optimal values. Some typical stimuli are depicted in Fig. 3. The investigated parameters were phase, orientation, and location (shift).
Fig. 4 shows the results for one typical feature subspace. The 4 linear filters spanning the feature subspace are shown in Fig. 4a). The optimal stimulus values (for the feature subspace) are represented by 0 in the plots; the values given here are departures from the optimal values. The responses are in arbitrary units. For different phases, ranging from −π/2 to π/2, we thus obtained Fig. 4b). On the bottom row, we have the response curve of the feature