Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces
Aapo Hyvärinen and Patrik Hoyer
Helsinki University of Technology
Laboratory of Computer and Information Science
P.O. Box 5400, FIN-02015 HUT, Finland
aapo.hyvarinen@hut.fi, patrik.hoyer@hut.fi
Neural Computation 12(7):1705-1720 [July, 2000]
Abstract
Olshausen and Field (1996) applied the principle of independence maximization by sparse coding to extract features from natural images. This leads to the emergence of oriented linear filters that have simultaneous localization in space and in frequency, thus resembling Gabor functions and simple cell receptive fields. In this paper, we show that the same principle of independence maximization can explain the emergence of phase and shift invariant features, similar to those found in complex cells. This new kind of emergence is obtained by maximizing the independence between norms of projections on linear subspaces (instead of the independence of simple linear filter outputs). The norms of the projections on such 'independent feature subspaces' then indicate the values of invariant features.
1 Introduction
A fundamental approach in signal processing is to design a statistical generative model of the observed signals. Such an approach is also useful for modeling the properties of neurons in primary sensory areas. Modeling visual data by a simple linear generative model, Olshausen and Field (1996) showed that the principle of maximizing the sparseness (or nongaussianity) of the underlying image components is enough to explain the emergence of Gabor-like filters that resemble the receptive fields of simple cells in mammalian primary visual cortex (V1). Maximizing sparseness is in this context equivalent to maximizing the independence of the image components (Comon, 1994; Bell and Sejnowski, 1997; Olshausen and Field, 1996).
We show in this paper that this same principle can also explain the emergence of phase and shift invariance, the principal properties of complex cells in V1. Using the method of feature subspaces (Kohonen, 1995; Kohonen, 1996), we model the response of a complex cell as the norm of the projection of the input vector (image patch) onto a linear subspace, which is equivalent to the classical energy models. Then we maximize the independence between the norms of such projections, or energies. Thus we obtain features that are localized in space, oriented, and bandpass (selective to scale/frequency), like those given by simple cells, or Gabor analysis. In contrast to simple linear filters, however, the obtained features also show emergence of phase invariance and (limited) shift or translation invariance. Phase invariance means that the response does not depend on the Fourier phase of the stimulus: the response is the same for a white bar and a black bar, as well as for a bar and an edge. Limited shift invariance means that a near-maximum response can be elicited by identical
bars or edges at slightly different locations. These two latter properties closely parallel the properties that distinguish complex cells from simple cells in V1. Maximizing the independence, or equivalently, the sparseness of the norms of the projections onto feature subspaces thus allows for the emergence of exactly those invariances that are encountered in complex cells, indicating their fundamental importance in image data.
2 Independent component analysis of image data
The basic models that we consider here express a static monochrome image I(x,y) as a linear superposition of some features or basis functions b_i(x,y):

I(x,y) = \sum_{i=1}^{m} b_i(x,y) s_i    (1)
where the s_i are stochastic coefficients, different for each image I(x,y). The crucial assumption here is that the s_i are nongaussian and mutually independent. This type of decomposition is called independent component analysis (ICA) (Comon, 1994; Bell and Sejnowski, 1997; Hyvärinen and Oja, 1997), or, from an alternative viewpoint, sparse coding (Olshausen and Field, 1996; Olshausen and Field, 1997).
Estimation of the model in Eq. (1) consists of determining the values of s_i and b_i(x,y) for all i and (x,y), given a sufficient number of observations of images, or in practice, image patches I(x,y). We restrict ourselves here to the basic case where the b_i(x,y) form an invertible linear system. Then we can invert the system as

s_i = <w_i, I>    (2)

where the w_i denote the inverse filters, and <w_i, I> = \sum_{x,y} w_i(x,y) I(x,y) denotes the dot product. The w_i(x,y) can then be identified as the receptive fields of the model simple cells, and the s_i are their activities when presented with a given image patch I(x,y). Olshausen and Field (1996) showed that when this model is estimated with input data consisting of patches of natural scenes, the obtained filters w_i(x,y) have the three principal properties of simple cells in V1: they are localized, oriented, and bandpass. Van Hateren and van der Schaaf (1998) compared quantitatively the obtained filters w_i(x,y) with those measured by single-cell recordings of the macaque cortex, and found a good match for most of the parameters.
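The invertibility assumption behind Eq. (2) can be illustrated with a small numerical sketch. This is a hypothetical toy setup (random basis, Laplacian sources), not the estimation procedure of the paper; it only shows that when the basis matrix B is invertible, the inverse filters w_i recover the coefficients s_i exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: m = 4 sparse (Laplacian) sources s_i and a
# random invertible basis matrix B whose columns play the role of the b_i.
m, n_samples = 4, 10000
S = rng.laplace(size=(m, n_samples))   # nongaussian, independent s_i
B = rng.normal(size=(m, m))            # basis functions b_i as columns
X = B @ S                              # observed patches: I = sum_i b_i s_i

# Because B is invertible, the inverse filters w_i are the rows of B^{-1},
# and s_i = <w_i, I> recovers each coefficient exactly (Eq. 2).
W = np.linalg.inv(B)
S_rec = W @ X
```

In practice B is unknown and must be estimated from the data; the sketch only demonstrates the algebraic relationship between the basis and the inverse filters.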
3 Decomposition into independent feature subspaces
3.1 Introduction
In addition to the essentially linear simple cells, another important class of cells in V1 is complex cells. Complex cells share the above-mentioned properties of simple cells but have the two principal distinguishing properties of phase invariance and (limited) shift invariance (Hubel and Wiesel, 1962; Pollen and Ronner, 1983), at least for the preferred orientation and frequency. Note that although invariance with respect to shift and global Fourier phase are equivalent, they are different properties when phase is computed from a local Fourier transform. Another distinguishing property of complex cells is that their receptive fields are larger than those of simple cells, but this difference is only quantitative, and of less consequence here. For more details, see (Heeger, 1992; Pollen and Ronner, 1983; Mel et al., 1998).
To date, very few attempts have been made to formulate a statistical model that would explain the emergence of the properties of visual complex cells. It is simple to see why ICA as in Eq. (1) cannot be directly used for modeling complex cells. This is due to the fact that in that model the activations of the neurons s_i can be used to linearly reconstruct the image I(x,y), which is not true for complex cells due to their two principal properties of
phase invariance and shift invariance: the responses of complex cells do not give the phase or the exact position of the stimulus, at least not as a linear function as in Eq. (1). (See (von der Malsburg et al., 1998) for a nonlinear reconstruction of the image from complex cell responses.)
The purpose of this paper is to explain the emergence of phase and shift invariant features using a modification of the ICA model. The modification is based on combining the technique of multidimensional independent component analysis (Cardoso, 1998) and the principle of invariant-feature subspaces (Kohonen, 1995; Kohonen, 1996). We first describe these two recently developed techniques.
3.2 Invariant feature subspaces
The classical approach for feature extraction is to use linear transformations, or filters. The presence of a given feature is detected by computing the dot product of the input data with a given feature vector. For example, wavelet, Gabor, and Fourier transforms, as well as most models of V1 simple cells, use such linear features. The problem with linear features, however, is that they necessarily lack any invariance with respect to such transformations as spatial shift or change in (local) Fourier phase (Pollen and Ronner, 1983; Kohonen, 1996).
Kohonen (1996) developed the principle of invariant-feature subspaces as an abstract approach to representing features with some invariances. The principle states that one may consider an invariant feature as a linear subspace in a feature space. The value of the invariant, higher-order feature is given by (the square of) the norm of the projection of the given data point on that subspace, which is typically spanned by lower-order features.
A feature subspace, like any linear subspace, can always be represented by a set of orthogonal basis vectors, say w_i(x,y), i = 1, ..., n, where n is the dimension of the subspace. Then the value F(I) of the feature F with input vector I(x,y) is given by

F(I) = \sum_{i=1}^{n} <w_i, I>^2    (3)

(For simplicity of notation and terminology, we do not distinguish clearly between the norm and the square of the norm in this paper.) In fact, this is equivalent to computing the distance between the input vector I(x,y) and a general linear combination of the basis vectors (filters) w_i(x,y) of the feature subspace (Kohonen, 1996). A graphical depiction of feature subspaces is given in Fig. 1.
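Equation (3) can be sketched directly in code. The following is a toy illustration (random 2-D subspace in a 256-dimensional patch space, names hypothetical), showing that F(I) equals the squared norm of the orthogonal projection of I on the subspace:

```python
import numpy as np

def subspace_feature(I, W_sub):
    """Value of the invariant feature F(I) = sum_i <w_i, I>^2 (Eq. 3),
    for a subspace spanned by the orthonormal rows of W_sub."""
    return np.sum((W_sub @ I) ** 2)

rng = np.random.default_rng(0)

# Toy 2-D feature subspace in a 16*16 = 256-dimensional patch space.
Q, _ = np.linalg.qr(rng.normal(size=(256, 2)))
W_sub = Q.T                            # orthonormal basis vectors as rows

I = rng.normal(size=256)               # a "patch" as a flat vector
F = subspace_feature(I, W_sub)

# F equals the squared norm of the orthogonal projection of I on the subspace.
proj = W_sub.T @ (W_sub @ I)
```

The equivalence holds because the rows of W_sub are orthonormal, mirroring the orthogonal-basis assumption in the text.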
In (Kohonen, 1996), it was shown that this principle, when combined with competitive learning techniques, can lead to the emergence of invariant image features.
3.3 Multidimensional independent component analysis
In multidimensional independent component analysis (Cardoso, 1998), a linear generative model as in Eq. (1) is assumed. In contrast to ordinary ICA, however, the components (responses) s_i are not assumed to be all mutually independent. Instead, it is assumed that the s_i can be divided into couples, triplets, or in general n-tuples, such that the s_i inside a given n-tuple may be dependent on each other, but dependencies between different n-tuples are not allowed.
Every n-tuple of s_i corresponds to n basis vectors b_i(x,y). We call a subspace spanned by a set of n such basis vectors an 'independent (feature) subspace'. In general, the dimensionalities of the independent subspaces need not be equal, but we assume so for simplicity.
The model can be simplified by two additional assumptions. First, even though the components s_i are not all independent, we can always define them so that they are uncorrelated and of unit variance. In fact, linear dependencies inside a given n-tuple of dependent components could always be removed by a linear transformation. Second, we can assume that the data is whitened (sphered); this can always be accomplished by PCA (Comon, 1994). Whitening is a conventional preprocessing step in ordinary ICA, where it makes the basis vectors b_i orthogonal (Comon, 1994; Hyvärinen and Oja, 1997), if we ignore any finite-sample effects.
These two assumptions imply that the b_i are orthonormal, and that we can take b_i = w_i as in ordinary ICA with whitened data. In particular, the independent subspaces become orthogonal after whitening. These facts follow directly from the proof in (Comon, 1994), which applies here as well, due to our above assumptions.
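The effect of whitening can be verified numerically. The sketch below (toy dimensions, hypothetical variable names) multiplies data of unequal variances by C^{-1/2} and checks that the resulting covariance is the identity, so that any orthonormal filter set then yields uncorrelated, unit-variance outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 50 dimensions with very different variances.
X = rng.normal(size=(50, 10000)) * rng.uniform(0.1, 2.0, size=(50, 1))

# Whitening: multiply by C^{-1/2}, the inverse symmetric square root of
# the data covariance, computed via an eigendecomposition of C.
C = np.cov(X)
d, E = np.linalg.eigh(C)
V = E @ np.diag(d ** -0.5) @ E.T       # V = C^{-1/2}
Z = V @ X                              # whitened (sphered) data
```

After this step the sample covariance of Z is the identity matrix (up to numerical error), which is what makes the orthonormality constraint on the w_i legitimate.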
Let us denote by J the number of independent feature subspaces, and by S_j, j = 1, ..., J, the set of the indices of the s_i belonging to the subspace of index j. Assume that the data consists of K observed image patches I_k(x,y), k = 1, ..., K. Then we can express the likelihood L of the data given the model as follows:
L(I_k(x,y), k = 1, ..., K; w_i(x,y), i = 1, ..., m) = \prod_{k=1}^{K} [ |det W| \prod_{j=1}^{J} p_j(<w_i, I_k>, i \in S_j) ]    (4)
where p_j(.), which is a function of the n arguments <w_i, I_k>, i \in S_j, gives the probability density inside the j-th n-tuple of s_i, and W is a matrix containing the filters w_i(x,y) as its columns. The term |det W| appears here as in any expression of the probability density of a linear transformation, giving the change in volume produced by the transformation (Pham et al., 1992).
The n-dimensional probability density p_j(.) is not specified in advance in the general definition of multidimensional ICA (Cardoso, 1998).
3.4 Combining invariant feature subspaces and independent subspaces
Invariant-feature subspaces can be embedded in multidimensional independent component analysis by considering probability distributions for the n-tuples of s_i that are spherically symmetric, i.e., depend only on the norm. In other words, the probability density p_j(.) of the n-tuple with index j \in {1, ..., J} can be expressed as a function of the sum of the squares of the s_i, i \in S_j, only. For simplicity, we assume further that the p_j(.) are equal for all subspaces.
This means that the logarithm of the likelihood L of the K observed image patches I_k(x,y), k = 1, ..., K, given the model, can be expressed as

log L(I_k(x,y), k = 1, ..., K; w_i(x,y), i = 1, ..., m) = \sum_{k=1}^{K} \sum_{j=1}^{J} log p(\sum_{i \in S_j} <w_i, I_k>^2) + K log |det W|    (5)

where p(\sum_{i \in S_j} s_i^2) = p_j(s_i, i \in S_j) gives the probability density inside the j-th n-tuple of s_i.
Recall that prewhitening allows us to consider the w_i(x,y) to be orthonormal, which implies that log |det W| is zero. This shows that the likelihood in Eq. (5) is a function of the norms of the projections of the I_k(x,y) on the subspaces indexed by j, which are spanned by the orthonormal basis sets given by w_i(x,y), i \in S_j. Since the norm of the projection of visual data on practically any subspace has a supergaussian distribution, we need to choose the probability density p in the model to be sparse (Olshausen and Field, 1996), i.e., supergaussian (Hyvärinen and Oja, 1997). For example, we could use the following probability distribution:
log p(\sum_{i \in S_j} s_i^2) = -\alpha [\sum_{i \in S_j} s_i^2]^{1/2} + \beta,    (6)

which could be considered a multidimensional version of the exponential distribution (Field, 1994). The scaling constant \alpha and the normalization constant \beta are determined so as to give a probability density that is compatible with the constraint of unit variance of the s_i, but they are irrelevant in the following. Thus we see that the estimation of the model consists of finding subspaces such that the norms of the projections of the (whitened) data on those subspaces have maximally sparse distributions.
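The log-likelihood of Eq. (5) with the density of Eq. (6) is straightforward to evaluate for orthonormal filters, since the log |det W| term vanishes. The following sketch (hypothetical helper name and toy inputs) computes it:

```python
import numpy as np

def log_likelihood(W, X, subspaces, alpha=1.0, beta=0.0):
    """Log-likelihood of Eq. (5) with the sparse density of Eq. (6):
    log p(u) = -alpha * sqrt(u) + beta. X holds whitened patches as
    columns, W holds orthonormal filters w_i as rows, and `subspaces`
    lists the index set S_j of each subspace. For orthonormal W,
    log|det W| = 0, so that term is omitted."""
    U = W @ X                                # all responses <w_i, I_k>
    ll = 0.0
    for S_j in subspaces:
        energy = np.sum(U[S_j] ** 2, axis=0) # sum_{i in S_j} <w_i, I_k>^2
        ll += np.sum(-alpha * np.sqrt(energy) + beta)
    return ll

# Tiny check with identity filters and two 2-D subspaces: each of the
# three "patches" has energy 2 in each subspace.
W = np.eye(4)
X = np.ones((4, 3))
ll = log_likelihood(W, X, [[0, 1], [2, 3]])
```

Maximizing this quantity over orthonormal W is exactly the estimation problem described in the text.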
The introduced 'independent feature subspace analysis' is a natural generalization of ordinary ICA. In fact, if the projections on the subspaces are reduced to projections on 1-D subspaces, the model reduces to ordinary ICA, provided that, in addition, the independent components are assumed to have non-skewed distributions. It is to be expected that the norms of the projections on the subspaces represent some higher-order, invariant features. The exact nature of the invariances has not been specified in the model but will emerge from the input data, using only the prior information on their independence.
When independent feature subspace analysis is applied to natural image data, we can identify the norms of the projections (\sum_{i \in S_j} s_i^2)^{1/2} as the responses of the complex cells. If the individual filter vectors w_i(x,y) are identified with the receptive fields of simple cells, this can be interpreted as a hierarchical model where the complex cell response is computed from the simple cell responses s_i, in a manner similar to the classical energy models for complex cells (Hubel and Wiesel, 1962; Pollen and Ronner, 1983; Heeger, 1992). It must be noted, however, that our model does not specify the particular basis of a given invariant-feature subspace.
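The energy-model interpretation can be illustrated with a classical quadrature-phase pair, the 2-D special case mentioned later in the experiments. This is a hypothetical toy construction (hand-built 1-D Gabor filters, not learned ones): the squared-norm response cos^2 + sin^2 is essentially independent of the phase of a grating at the preferred frequency.

```python
import numpy as np

# Two Gabor filters differing by 90 degrees in phase span a 2-D
# feature subspace, as in the classical energy model.
x = np.linspace(-np.pi, np.pi, 64)
envelope = np.exp(-x**2 / 2)
w_cos = envelope * np.cos(4 * x)
w_sin = envelope * np.sin(4 * x)

def complex_response(stimulus):
    # Squared norm of the projection on the 2-D subspace (Eq. 3).
    return np.dot(w_cos, stimulus) ** 2 + np.dot(w_sin, stimulus) ** 2

# Gratings at the preferred frequency but with different phases elicit
# nearly the same response: phase invariance.
responses = np.array([complex_response(np.cos(4 * x + phi))
                      for phi in np.linspace(0, np.pi, 8)])
```

Either linear filter alone would respond strongly at some phases and not at all at others; only the subspace norm is phase invariant.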
3.5 Learning independent feature subspaces
Learning the independent feature subspace representation can be achieved simply by gradient ascent of the log-likelihood in Eq. (5). Due to whitening, we can constrain the vectors w_i to be orthogonal and of unit norm, as in ordinary ICA; these constraints usually speed up convergence. A stochastic gradient ascent of the log-likelihood can be obtained as

\Delta w_i(x,y) \propto I(x,y) <w_i, I> g(\sum_{r \in S_{j(i)}} <w_r, I>^2)    (7)

where j(i) is the index of the subspace to which w_i belongs, and g = p'/p is a nonlinear function that incorporates our information on the sparseness of the norms of the projections. For example, if we choose the distribution in Eq. (6), we have g(u) = -(\alpha/2) u^{-1/2}, where the constant \alpha/2 can be ignored. After every step of (7), the vectors w_i need to be orthonormalized; for a variety of methods to perform this, see (Hyvärinen and Oja, 1997; Karhunen et al., 1997).
The learning rule in (7) can be considered 'modulated' nonlinear Hebbian learning. If the subspace containing w_i were just one-dimensional, this learning rule would reduce to the learning rules for ordinary ICA given in (Hyvärinen and Oja, 1998), which are closely related to those in (Bell and Sejnowski, 1997; Cardoso and Laheld, 1996; Karhunen et al., 1997). The difference is that in the general case, the Hebbian term is divided by a function of the output of the complex cell, given by \sum_{r \in S_{j(i)}} <w_r, I>^2, if we adopt the terminology of the energy models. In other words, the Hebbian term is modulated by a top-down feedback signal. In addition to this modulation, the neurons interact in the form of the orthogonalizing feedback.
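The learning rule (7) plus orthonormalization might be sketched as follows. This is a toy-scale illustration (8-D whitened space, four 2-D subspaces, synthetic data), not the 160-dimensional experiment of Section 4; the symmetric SVD orthonormalization is one standard choice among the methods cited above:

```python
import numpy as np

rng = np.random.default_rng(0)

def symmetric_orthonormalize(W):
    # Restore orthonormal rows via W <- (W W^T)^{-1/2} W, computed by SVD.
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

def update(W, I, subspaces, lr=0.01):
    """One stochastic step of rule (7) for a whitened patch I, using
    g(u) = -1/(2 sqrt(u)), i.e. the density of Eq. (6) with alpha = 1."""
    u = W @ I                                # simple-cell outputs <w_i, I>
    W_new = W.copy()
    for S_j in subspaces:
        energy = np.sum(u[S_j] ** 2)         # complex-cell (subspace) output
        g = -0.5 / np.sqrt(energy + 1e-12)   # top-down modulation g(.)
        for i in S_j:
            W_new[i] += lr * g * u[i] * I    # modulated Hebbian term
    return symmetric_orthonormalize(W_new)   # re-orthonormalize after the step

# Toy run: 8 orthonormal filters in an 8-D space, four 2-D subspaces.
W = symmetric_orthonormalize(rng.normal(size=(8, 8)))
for _ in range(200):
    W = update(W, rng.laplace(size=8), [[0, 1], [2, 3], [4, 5], [6, 7]])
```

The orthonormalization after each step plays the role of the orthogonalizing feedback between neurons described above.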
4 Experiments
To test our model, we used patches of natural images as input data I_k(x,y), and estimated the model of independent feature subspace analysis.
4.1 Data and methods
The data was obtained by taking 16×16-pixel image patches at random locations from monochrome photographs depicting wildlife scenes (animals, meadows, forests, etc.). The images were taken directly from PhotoCDs, and are available on the World Wide Web1. The mean gray-scale value of each image patch (i.e., the DC component) was subtracted. The data was then low-pass filtered by reducing the dimension of the data vectors by principal component analysis, retaining the 160 principal components with the largest variances. Next, the data was whitened by the zero-phase whitening filter, which means multiplying the data by C^{-1/2}, where C is the covariance matrix of the data (after PCA) (Bell and Sejnowski, 1997). These preprocessing steps are essentially similar to those used in (Olshausen and Field, 1996; van Hateren and van der Schaaf, 1998). The likelihood in Eq. (5) for 50 000 such observations was maximized under the constraint of orthonormality of the filters in the whitened space, using the averaged version of the learning rule in (7), i.e., we used the ordinary gradient of the likelihood instead of the stochastic gradient. The fact that the data was contained in a 160-dimensional subspace meant that the 160 basis vectors w_i now formed an orthonormal system for that subspace and not for the original space, but this did not necessitate any changes in the learning rule. The density p was chosen as in Eq. (6). The algorithm was initialized as in (Bell and Sejnowski, 1997) by taking as the w_i the 160 middle columns of the identity matrix. We also tried random initial values for W: these yielded qualitatively identical results, but using a localized filter set as the initial value considerably improves the convergence of the method, especially by preventing some of the filters from getting stuck in local minima. This initialization led, incidentally, to a weak topographical organization of the filters. The computations took about 10 hours on a single RISC processor. Experiments were made with different dimensions of the subspaces S_j: 2, 4, and 8 (in a single run, all the subspaces had the same dimension). The results shown below are for 4-dimensional subspaces, but the results are similar for other dimensions.
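The preprocessing pipeline described above (random patch extraction, DC removal, PCA dimension reduction, zero-phase whitening) might be sketched as follows. The function and variable names are hypothetical, and the demo run uses synthetic noise images and smaller counts in place of the PhotoCD photographs and the 50 000 patches:

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(images, n_patches=50000, patch=16, n_comp=160):
    """Sketch of the preprocessing: random patch x patch patches, DC
    removal, PCA reduction to n_comp dimensions, zero-phase whitening.
    `images` is assumed to be a list of 2-D grayscale arrays."""
    X = np.empty((patch * patch, n_patches))
    for k in range(n_patches):
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - patch)
        c = rng.integers(img.shape[1] - patch)
        X[:, k] = img[r:r + patch, c:c + patch].ravel()
    X -= X.mean(axis=0)                      # subtract each patch's DC component
    C = np.cov(X)
    d, E = np.linalg.eigh(C)
    keep = np.argsort(d)[::-1][:n_comp]      # leading principal components
    Xp = E[:, keep].T @ X                    # PCA projection (low-pass filtering)
    Cp = np.cov(Xp)
    dp, Ep = np.linalg.eigh(Cp)
    Z = Ep @ np.diag(dp ** -0.5) @ Ep.T @ Xp # zero-phase whitening: C^{-1/2} x
    return Z

# Demo with synthetic images and reduced sizes for speed.
images = [rng.normal(size=(64, 64)) for _ in range(5)]
Z = preprocess(images, n_patches=2000, n_comp=40)
```

After this step the whitened vectors Z can be fed to the subspace learning rule under the orthonormality constraint.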
4.2 Results
Fig. 2 shows the filter sets of the 40 feature subspaces (complex cells), when the subspace dimension was chosen to be 4. The results are shown in the zero-phase whitened space; note that due to orthogonality, the filters are equal to the basis vectors. The filters look qualitatively similar in the original, unwhitened space. The only difference is that in the original space, the filters are concentrated on higher frequencies.
It can be seen that the linear filters associated with a single complex cell all have approximately the same orientation and frequency. Their locations are not identical, but close to each other. The phases differ considerably. Every feature subspace can thus be considered a generalization of a quadrature-phase filter pair as found in the classical energy models (Pollen and Ronner, 1983), enabling the cell to be selective to a given orientation and frequency, but invariant to phase and somewhat invariant to shifts. Using 4 filters instead of a pair greatly enhances the shift invariance of the feature subspace. In fact, when the subspace dimension was 2, we obtained approximately quadrature-phase filter pairs.
To quantitatively demonstrate the properties of the model, we compared the responses of a representative feature subspace and the associated linear filters for different stimulus configurations. First, an optimal stimulus for the feature subspace was computed in the set of Gabor filters. One of the stimulus parameters was then changed at a time to see how the response changes, while the other parameters were held constant at their optimal values. Some typical stimuli are depicted in Fig. 3. The investigated parameters were phase, orientation, and location (shift).
Fig. 4 shows the results for one typical feature subspace. The 4 linear filters spanning the feature subspace are shown in Fig. 4a). The optimal stimulus values (for the feature subspace) are represented by 0 in the plots; the values given here are departures from the optimal values. The responses are in arbitrary units. For different phases, ranging from −π/2 to π/2, we thus obtained Fig. 4b). On the bottom row, we have the response curve of the feature