2006 Special issue
Exploratory analysis of climate data using source separation methods
Alexander Ilin a,*, Harri Valpola b, Erkki Oja a
a Laboratory of Computer and Information Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Espoo, Finland
b Laboratory of Computational Engineering, Helsinki University of Technology, P.O. Box 9203, FI-02015 TKK, Espoo, Finland
Abstract

We present an example of exploratory data analysis of climate measurements using a recently developed denoising source separation (DSS) framework. We analyzed a combined dataset containing daily measurements of three variables: surface temperature, sea level pressure and precipitation around the globe, for a period of 56 years. Components exhibiting slow temporal behavior were extracted using DSS with linear denoising. The first component, most prominent in the interannual time scale, captured the well-known El Niño-Southern Oscillation (ENSO) phenomenon and the second component was close to the derivative of the first one. The slow components extracted in a wider frequency range were further rotated using a frequency-based separation criterion implemented by DSS with nonlinear denoising. The rotated sources give a meaningful representation of the slow climate variability as a combination of trends, interannual oscillations, the annual cycle and slowly changing seasonal variations. Again, components related to the ENSO phenomenon emerge very clearly among the found sources.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Independent component analysis; Blind source separation; Semiblind methods; Denoising source separation; El Niño; Global climate
1. Introduction
One of the main goals of statistical analysis of climate data is to extract physically meaningful patterns of climate variability from highly multivariate weather measurements. The classical technique for defining such dominant patterns is principal component analysis (PCA), or empirical orthogonal functions (EOF) as it is called in climatology (von Storch & Zwiers, 1999). However, many researchers have pointed out that the maximum remaining variance criterion used in PCA can lead to such problems as mixing different physical phenomena in one extracted component (Kim & Wu, 1999; Richman, 1986). This makes PCA a useful tool for information compression but limits its ability to isolate individual modes of climate variation.
To overcome this problem, the rotation of the principal components has proven useful. The different rotation criteria reviewed by Richman (1986) are based on the general 'simple structure' idea aimed at, for example, spatial or temporal localization of the rotated components. The rotation of EOFs can be either orthogonal or oblique, which potentially leads to better interpretability of the extracted components.
Independent component analysis (ICA) is a recently developed statistical technique for component extraction, which can also be used for rotating principal components. The basic assumption made in ICA is the statistical independence of the extracted components, which may lead to a meaningful data representation in a number of applications (see Hyvärinen, Karhunen, & Oja, 2001, for an introduction). ICA is based on higher-order statistics and in this respect bears some similarity to classical rotation techniques such as the Varimax orthogonal rotation (Richman, 1986). Several attempts to apply ICA in climate research have already been made (Aires, Chédin, & Nadal, 2000; Lotsch, Friedl, & Pinzón, 2003).
In this paper, we analyze weather measurements using a novel extension of ICA called denoising source separation (DSS) (Särelä & Valpola, 2005). DSS is a general separation framework which does not necessarily exploit the independence assumption but rather looks for hidden components which have 'interesting' properties. The interestingness of the properties is controlled by means of a temporal filtering or denoising procedure.
In the first experiment, we show that the sources with the most prominent interannual oscillations can be identified using DSS with linear filtering as denoising. The leading components are clearly related to the well-known El Niño-Southern Oscillation (ENSO) phenomenon.
In the second experiment, we use DSS with linear denoising as the first, preprocessing step of climate data analysis. A wider
Neural Networks 19 (2006) 155–167
* Corresponding author.
E-mail addresses: alexander.ilin@tkk.fi (A. Ilin), harri.valpola@tkk.fi (H. Valpola), erkki.oja@tkk.fi (E. Oja).
frequency band in the denoising filter is used to identify the slow subspace of the climate system. The found slow components are further rotated using an iterative DSS procedure based on nonlinear denoising. The rotation aims to find components with distinct power spectra.
The extracted components turn out to represent the subspace of the slow climate phenomena as a linear combination of trends, decadal-interannual oscillations, the annual cycle and other phenomena with distinct spectral contents. Using this approach, the known climate phenomena are identified as certain subspaces of the climate system and some other interesting phenomena hidden in the weather measurements are found.
The contents of this paper are as follows. In Section 2, we explain the modeling assumptions of the source separation methods and present a short introduction to DSS. Section 3 describes the climate measurements used in the experiments. Section 4 explains how DSS is tuned to extract components with the most prominent interannual oscillations and presents the experimental results (which were partly reported by Ilin, Valpola, & Oja, 2005). In Section 5, we give the description of the frequency-based separation algorithm implemented in the DSS framework and report the climate phenomena found with this method. Preliminary results of this analysis were published in a conference paper by Ilin & Valpola (2005). Finally, we discuss the results and possible future directions in Section 6.
2. Source separation methods

2.1. Blind source separation and independent component analysis
The basic modeling assumption of linear source separation methods is that there are some hidden component signals or time series s_i(t) (also called sources, factors or latent variables) which are linearly combined into the multivariate measurements x_j(t):

x_j(t) = \sum_{i=1}^{N} a_{ji} s_i(t),  j = 1, \ldots, M.   (1)
The index j runs over the measurement sensors (typically spatial locations), and discretized time t runs over the observation period: t = 1, …, T. This can be expressed in matrix formulation by denoting the matrix of observations by X, where the sensor index j denotes the rows and the time index t denotes the columns. The matrix of sources, S, is defined likewise, and the coefficients a_ji of the linear combinations make up a matrix A. Using these matrices, (1) becomes
X = AS.   (2)

If we denote the columns of matrix A by a_i and the columns of matrix X by x(t), then (2) can be further written as
x(t) = \sum_{i=1}^{N} a_i s_i(t).   (3)
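To make the notation concrete, the mixing model of Eqs. (1)–(3) can be sketched in a few lines of NumPy. This is a toy illustration only: the sizes N, M, T and all variable names are our own choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N sources, M sensors, T time instances
N, M, T = 3, 10, 1000

S = rng.standard_normal((N, T))   # rows are the sources s_i(t)
A = rng.standard_normal((M, N))   # columns are the loading vectors a_i

X = A @ S                         # Eq. (2): X = AS, i.e. x_j(t) = sum_i a_ji s_i(t)
print(X.shape)                    # (10, 1000)
```

Each column x(t) of X is then the linear combination of the loading vectors a_i weighted by the source values s_i(t), exactly as in Eq. (3).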
The mapping A is called the mixing matrix in the ICA terminology or the loading matrix in the context of PCA. In climate data analysis, the time series s_i(t) usually correspond to the time-varying states of the climate system, and the loading vectors a_i are the spatial maps showing the typical weather patterns corresponding to the components.

The goal of the analysis is to estimate the unknown components s_i(t) and the corresponding loading vectors a_i from the observed data X. With minimum a priori assumptions about the sources, the problem is called blind source separation (BSS).
Independent component analysis (ICA) is a popular method of solving the BSS problem. In ICA, the only assumption is the statistical independence of the sources: each s_i(t) is regarded as a sample from a random variable s_i, and the variables are mutually independent. There is a large variety of algorithms for solving the mixing matrix A and the sources (Cichocki & Amari, 2002; Hyvärinen, Karhunen, et al., 2001). One of the most popular methods is the FastICA algorithm. The independence of the sources is measured by their mutual information, which results in a minimum entropy criterion. In practice the separation is achieved by rotating the observations into directions that are as non-Gaussian as possible (Hyvärinen, Karhunen, et al., 2001).
2.2. Denoising source separation
ICA is a powerful tool for exploratory data analysis when very little is known about the underlying source processes s_i(t): independence is the only assumption. Sometimes, however, such prior information exists, such as the general shape of the time curves or their frequency contents. For example, in the climate data we might be interested in some phenomena that would be cyclic over a certain period, or exhibit slow changes. It would be very useful if such prior knowledge could be incorporated into the separation algorithm directly. Exploiting prior knowledge about the sources may significantly help in finding a good representation of the data, and fully blind algorithms are not the best choice.

This kind of problem setting, with some prior knowledge available, is called semiblind. One of the methods for solving it is a recently introduced method called denoising source separation (Särelä & Valpola, 2005).
DSS is a general algorithmic framework which can identify the model in Eq. (1) exploiting prior knowledge about its unknowns. In DSS, the independence criterion of ICA is replaced by the assumption that the sources should (1) be uncorrelated and (2) maximize some desired property (e.g. non-Gaussianity, slowness, etc.). In this respect, DSS can be seen as an extension of ICA without the strict independence assumption.
The first requirement is assured in DSS by using a preprocessing step called whitening or sphering. The goal of whitening is to make the covariance structure of the data uniform in such a way that any linear projection of the data has unit variance. The positive effect of such a transformation is that any orthogonal basis in the whitened space defines
uncorrelated sources. Therefore, whitening is used as a preprocessing step in many ICA algorithms, which allows restricting the mixing matrix to be orthogonal afterwards. Whitening is usually implemented by PCA. Assuming that the measurements x_j(t) have been normalized to zero mean, the matrix of sphered data Y is calculated as

Y = D^{-1/2} V^T X,   (4)

where D is the diagonal matrix of eigenvalues of the data covariance matrix, defined as (1/T) X X^T. The columns of matrix V are the corresponding eigenvectors. The dimensionality of the data can also be reduced at this stage by retaining only the principal components corresponding to the largest eigenvalues in D. It is easy to show that it now holds that (1/T) Y Y^T = I. Matrix Y is not unique, though; any orthogonal rotation of its columns produces a matrix that also has unit covariance.
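The whitening transform of Eq. (4) can be sketched in a few lines of NumPy. This is a toy illustration under our own choice of data sizes; `eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 2000))          # toy data: 5 sensors, T = 2000 samples
X = X - X.mean(axis=1, keepdims=True)       # normalize each row to zero mean

C = (X @ X.T) / X.shape[1]                  # data covariance (1/T) X X^T
d, V = np.linalg.eigh(C)                    # eigenvalues (diagonal of D) and eigenvectors V

Y = np.diag(d ** -0.5) @ V.T @ X            # Eq. (4): Y = D^{-1/2} V^T X

# The whitened data has unit covariance: (1/T) Y Y^T = I
print(np.allclose((Y @ Y.T) / Y.shape[1], np.eye(5)))   # → True
```

Dimensionality reduction, as mentioned in the text, would amount to keeping only the columns of V (and entries of d) belonging to the largest eigenvalues.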
The rotational ambiguity of the whitened data matrix is fixed using the second DSS requirement, which implements the source separation criterion. This requirement is usually introduced in the algorithm in the form of a denoising function. The purpose of denoising is to emphasize the desired properties in the current source estimates, which assures gradual maximization of these properties.
2.2.1. Linear denoising
In the simplest case, the denoising function can be implemented by a linear temporal filter operating on the rows of matrix Y and giving another matrix f(Y) = YF, with F the filtering matrix. Denoising renders the variances of the sphered components different: the covariance matrix of f(Y) equals (1/T) Y F F^T Y^T, which is no longer equal to the unit matrix. Now PCA can identify the directions which maximize the properties of interest. The eigenvalues obtained from PCA give the ratio of the variances of the sources after and before filtering, which is the objective function of DSS with linear denoising. The components are ranked according to the prominence of the desired properties in the same way as the principal components in PCA are ranked according to the amount of variance they explain. The separation thus consists of three steps: whitening, linear denoising (filtering) and PCA on the denoised data, as shown in Fig. 1.
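The three-step procedure (whitening, filtering, PCA on the filtered data) can be sketched as follows. This is our own toy illustration, not the authors' code: a 50-tap moving average stands in for the paper's bandpass filter, and all sizes and names are arbitrary.

```python
import numpy as np

def whiten(X):
    """PCA whitening, Eq. (4): Y = D^{-1/2} V^T X."""
    X = X - X.mean(axis=1, keepdims=True)
    d, V = np.linalg.eigh((X @ X.T) / X.shape[1])
    return np.diag(d ** -0.5) @ V.T @ X

def linear_dss(X, filt):
    """DSS with linear denoising: whitening, filtering, PCA on the filtered data."""
    Y = whiten(X)
    Yf = np.array([np.convolve(y, filt, mode="same") for y in Y])  # f(Y) = YF
    # PCA on the denoised data; eigenvalues = variance ratio after/before filtering
    lam, W = np.linalg.eigh((Yf @ Yf.T) / Yf.shape[1])
    order = np.argsort(lam)[::-1]          # rank components by this ratio
    return W[:, order].T @ Y, lam[order]

# Toy data: one channel carries a slow sinusoid buried in white noise
rng = np.random.default_rng(2)
T = 4000
slow = np.sin(2 * np.pi * np.arange(T) / 500)
X = rng.standard_normal((4, T))
X[0] += 3 * slow

S, scores = linear_dss(X, np.ones(50) / 50)  # moving average as a crude low-pass
```

The first returned component should closely follow the hidden sinusoid, and `scores` holds the filtered-to-unfiltered variance ratios used for the ranking.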
The DSS algorithms implemented by linear denoising optimize the same type of cost function as the maximum noise fraction (MNF) transform proposed by Green, Berman, Switzer, and Craig (1988). However, in the DSS framework, the computations are structured such that it is easier to generalize the method for nonlinear denoising.
2.2.2. Nonlinear denoising
More complex separation criteria usually require nonlinear denoising (see Särelä & Valpola, 2005; Valpola & Särelä, 2004, for several examples). Then, DSS requires the algorithm presented in Fig. 2. Here, whitening is followed by an iterative procedure with three successive steps:
(1) Source estimation using the current estimate of the demixing matrix W:

    S = WY;

(2) Applying the denoising function to the source estimates:

    \hat{S} = f(S);

(3) Re-estimation of the demixing matrix:

    W^T = orth(Y \hat{S}^T).

The iterations continue until the source estimates do not change. In Step 3, orth(·) is an operator giving the orthogonal projection of the matrix Y \hat{S}^T onto the set of orthogonal matrices. Without denoising, this procedure is equivalent to the power method for computing the principal components of Y, because then Steps 1 and 3 give W^T = orth(Y Y^T W^T). Since Y is white, all the eigenvalues are equal and the solution without denoising becomes degenerate. Therefore, even the slightest changes made by denoising can determine the DSS rotation. Since the denoising procedure emphasizes the desired properties of the sources, DSS can find the rotation where the properties of interest are maximized.
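The iteration of Fig. 2 can be sketched as below. This is a hedged illustration under our own naming: `orth` is implemented via the SVD polar factor, plain `tanh` serves only as an example of a nonlinear denoising function, and no claim is made that this particular choice separates any given data well.

```python
import numpy as np

def orth(B):
    """Project B onto the set of orthogonal matrices (polar factor via SVD)."""
    U, _, Vt = np.linalg.svd(B)
    return U @ Vt

def dss(Y, denoise, n_iter=200, seed=0):
    """Generic iterative DSS on whitened data Y (the three steps of Fig. 2)."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    W = orth(rng.standard_normal((n, n)))      # random orthogonal start
    for _ in range(n_iter):
        S = W @ Y                              # (1) source estimation
        S_hat = denoise(S)                     # (2) denoising
        W = orth(Y @ S_hat.T).T                # (3) W^T = orth(Y S_hat^T)
    return W @ Y, W

# Example run with a tanh denoising function on toy (approximately white) data
rng = np.random.default_rng(1)
Y = orth(rng.standard_normal((3, 3))) @ rng.standard_normal((3, 5000))
Y = np.diag(np.std(Y, axis=1) ** -1) @ (Y - Y.mean(axis=1, keepdims=True))
S, W = dss(Y, np.tanh)

print(np.allclose(W @ W.T, np.eye(3)))         # the demixing matrix stays orthogonal
```

Note that the orthogonalization in Step 3 keeps the source estimates uncorrelated at every iteration, matching the first DSS requirement.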
In the case of nonlinear denoising, the DSS objective function is usually expressed implicitly in the denoising function. Therefore, ranking the components according to the prominence of the desired properties is more difficult and depends on the exact separation criterion used in the denoising procedure.
In the applications, we are interested not only in the sources (rows of matrix S), but also in the matrix A in Eq. (2). The i-th column of A is a spatial map showing how the effect of the i-th source is distributed over the sensor array. Noting that in DSS it holds that S = WY, we obtain from Eqs. (2) and (4)

X = AS = AWY = AW D^{-1/2} V^T X.   (5)

Thus A should be chosen as the (pseudo)inverse of W D^{-1/2} V^T, which is

A = V D^{1/2} W^T.   (6)
Since the extracted components s_i(t) are normalized to unit variances, the columns of A have a meaningful scale.
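Eq. (6) can be checked numerically. In this toy sketch (our own construction), any orthogonal demixing matrix W recovers X = AS exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 1000))
X = X - X.mean(axis=1, keepdims=True)

d, V = np.linalg.eigh((X @ X.T) / X.shape[1])
Y = np.diag(d ** -0.5) @ V.T @ X                   # whitening, Eq. (4)

W, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # some orthogonal demixing matrix
S = W @ Y                                          # sources with unit-variance rows

A = V @ np.diag(d ** 0.5) @ W.T                    # Eq. (6): A = V D^{1/2} W^T

print(np.allclose(A @ S, X))                       # Eq. (2) is recovered → True
```

The check works because A S = V D^{1/2} W^T W D^{-1/2} V^T X = X when W is orthogonal and V is a full orthogonal eigenvector basis.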
Fig. 1. The steps of the DSS algorithm in the case of linear denoising.

Fig. 2. The steps of the DSS algorithm in the general case of nonlinear denoising.
Note that the signs of the extracted components cannot generally be determined by DSS, which is a well-known property of the classical ICA problem. DSS with nonlinear denoising can determine the sign if the signal is not symmetric. More details about the DSS method, including rigorous derivations and analysis, were reported by Särelä & Valpola (2005).
2.3. Extracting sources of climate variability

2.3.1. ICA in climate data analysis
Climate is a very complex system where different phenomena constantly interact with each other. For example, the annual cycle naturally affects other climate processes, the El Niño effect has a great impact on global weather but also tends to phase-lock with the annual cycle, and so on. Therefore, the existence of any truly independent climate phenomena is an implausible assumption.
ICA can still be a useful tool for climate data representation, providing, for example, temporally or spatially localized components. Several researchers have shown that ICA can extract meaningful components from climate and weather data (Aires et al., 2000; Basak, Sudarshan, Trivedi, & Santhanam, 2004; Lotsch et al., 2003). However, due to the great amount of noise in climate data, naive ICA can often produce overfitted solutions (see Hyvärinen, Särelä, & Vigário, 1999; Särelä & Vigário, 2003, for discussion of this problem) and one would require a very long observation period in order to find a meaningful ICA solution.
2.3.2. Tuning DSS for climate data analysis
DSS is a much more flexible tool, as one can choose the separation criterion that gives the most meaningful or interpretable representation of the data. In this work, we search for physically meaningful states of the climate system which would possess slow behavior. The slowness of the components assures their larger long-term effect on global weather and possibly facilitates making predictions of their future development. By physically meaningful states, we mean such components whose dynamics are as weakly coupled as possible. Climate phenomena with distinct time scales of their variability (e.g. the annual cycle, El Niño-Southern Oscillation or slow climate trends) would intuitively be such components.
First, we concentrate on seeking components which exhibit prominent variability in the slow timescale. In Section 4, we show how DSS can be tuned to extract such components based on a criterion that we term clarity. This approach can be useful for identifying the subspace of slow phenomena and sometimes it can also find a meaningful representation within the found subspace. However, it does not generally provide a good separation criterion, which may leave mixtures of different climate phenomena in any one component. Therefore, we propose a more complex DSS-based algorithm which tries to separate slow climate components based on their frequency contents. The exposition of this algorithm is done in Section 5.1.

3. Data and preprocessing method
We apply the proposed DSS tools to measurements of three major atmospheric variables: surface temperature, sea level pressure and precipitation. This set of variables is often used for describing global climate phenomena such as ENSO (Trenberth & Caron, 2000). The datasets are provided by the reanalysis project of the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP/NCAR) (Kalnay et al., 1996; NCEP data, 2004).

The data represent globally gridded measurements over a long period of time. The spatial grid is regularly spaced over the globe with 2.5° × 2.5° resolution. Although the quality of the data is worse for the beginning of the reanalysis period and it varies considerably throughout the globe, we used the whole period of 1948–2004. Thus, the data is very high-dimensional: more than 10,000 spatial locations by more than 20,000 time instances for each of the three data sets.

The main drawback of the reanalysis data is that it is not fully real. The measurements missing in some spatial locations or time instances have been re-estimated based on the available data and approximation models. Yet, the data is as close to the real measurements as possible and its regularity makes it particularly suitable for the source separation methods applied in this work.
To preprocess the data, the long-term mean was removed and the data points were weighted to diminish the effect of a denser sampling grid around the poles: each data point was multiplied by a weight proportional to the square root of the corresponding area of its location. This produced the original data matrix X. The spatial dimensionality of the data was then reduced using the PCA/EOF analysis applied to the weighted data. For each dataset, we retained 100 principal components. This means that in Eq. (4), the columns of Y have dimension 100, while those of the original X are over 10,000-dimensional. Yet the principal components explain more than 90% of the total variance, which is due to the high spatial correlation between nearby points on the global grid. The DSS-based analysis was then applied to the combined data containing the measurements of the three variables.
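A rough sketch of this preprocessing, with toy sizes of our own choosing (72 latitude bands, one year of data, 10 retained components instead of 100), and assuming grid-cell area proportional to the cosine of latitude on a regular grid:

```python
import numpy as np

rng = np.random.default_rng(5)
lats = np.arange(-88.75, 90.0, 2.5)               # centers of a 2.5-degree grid (72 bands)
X = rng.standard_normal((lats.size, 365))          # toy data: one row per latitude band

X = X - X.mean(axis=1, keepdims=True)              # remove the long-term mean
w = np.sqrt(np.cos(np.deg2rad(lats)))              # sqrt of cell area: area ~ cos(latitude)
Xw = w[:, None] * X                                # weighted data matrix

# PCA/EOF reduction: keep only the k leading principal components
d, V = np.linalg.eigh((Xw @ Xw.T) / Xw.shape[1])   # eigenvalues in ascending order
k = 10
Y = np.diag(d[-k:] ** -0.5) @ V[:, -k:].T @ Xw     # reduced, whitened data (k rows)
print(Y.shape)                                     # (10, 365)
```

In the paper's setting each of the three variables would be reduced to 100 components in this way before the reduced datasets are combined.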
4. ENSO as component with most prominent interannual oscillations
Components with prominent slow behavior can be extracted from the data using DSS with low-pass or bandpass filtering as
Fig. 3. The frequency response of the filter used in DSS with linear denoising for finding components with the most prominent interannual oscillations. The abscissa is linear in frequency but is labeled in terms of periods, in years.
Fig. 4. Four components with the most prominent interannual oscillations extracted from the combined data. Top four: the time courses of the leading components found by DSS (black) and their filtered versions (gray). The non-filtered components are normalized to unit variance. Bottom two: the Niño 3 SST index (Niño 3 SST, 2004) and its derivative bear similarities to the first and second components, respectively.

Fig. 5. Top four: the spatial patterns of the four leading interannual components extracted from the combined data. Bottom: the regression coefficients calculated from the combined data using the Niño 3 SST index. The maps are weighted by the square root of the clarity values of the components.
denoising. This type of denoising is linear and therefore the simple algorithm described in Section 2.2.1 is applicable here. The extracted slow components are ranked according to their clarity, which is defined as the ratio of the variance of the component filtered in the desired frequency range and the variance of the non-filtered component:

clarity = var\{s_f\} / var\{s\}.

That is, the first component contains the least relative amount of other frequencies in its power spectrum. The presented algorithm is functionally identical to MNF (Green et al., 1988) with the noise defined as the spectral components lying outside the desired frequency range.
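The clarity index can be sketched as below (our own toy illustration; the moving-average filter is merely a stand-in for the paper's frequency mask):

```python
import numpy as np

def clarity(s, filt):
    """Ratio of filtered to unfiltered variance of a source estimate."""
    s_f = np.convolve(s - s.mean(), filt, mode="same")
    return s_f.var() / s.var()

rng = np.random.default_rng(4)
t = np.arange(5000)
lp = np.ones(100) / 100                    # hypothetical 100-tap low-pass filter

slow = np.sin(2 * np.pi * t / 1000)        # slow oscillation: passes the filter
noise = rng.standard_normal(5000)          # white noise: mostly removed by the filter

print(clarity(slow, lp) > clarity(noise, lp))   # → True
```

A slow component retains most of its variance after low-pass filtering (clarity near 1), while a broadband component loses most of it, which is exactly the ordering used to rank the DSS components.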
In this section, we aim at finding components which exhibit prominent variability in the interannual timescale. Therefore, the bandpass filter whose frequency response is shown in Fig. 3 is used here.
The four most prominent interannual components extracted from the combined data are shown in Fig. 4. The time course of the first component (upper curve in Fig. 4) shows a striking resemblance to the El Niño index calculated from the sea surface temperature (SST) in the Niño 3 region (lower curve). The correlation coefficient between the extracted component and the Niño 3 SST index is 0.9323. Note that the upper components are extracted from climate data consisting of daily measurements from the whole globe, with the only constraint being the emphasis on strong interannual oscillations. Also note that the values of the Niño 3 SST index are monthly averages and consequently appear smoother than the daily averages in the DSS components.
The spatial patterns corresponding to the four leading components are shown¹ in Fig. 5. The first surface temperature map contains many features traditionally associated with El Niño (Trenberth & Caron, 2000): the strongest pattern in the central and eastern tropical Pacific with broader regions along the eastern Pacific coast, a negatively correlated 'boomerang'-shaped region at 20–40° latitude in both hemispheres linked in the far western equatorial Pacific, positive values in the Indian Ocean, and negative values in the North Pacific and around New Zealand.

The corresponding sea level pressure map is similar to the classical Southern Oscillation pattern (Trenberth & Caron, 2000): there is a major seesaw structure in the tropics and subtropics, large pressure departures in the North Pacific, and a quadrupole-like structure in the Australasia–South Pacific region. The precipitation map also contains many features associated with the ENSO phenomenon: the dominant effects are clearly seen throughout the tropical Pacific with maximum values in the Niño 3 region. The clear patterns here are the intertropical convergence zone (ITCZ) and South Pacific convergence zone (SPCZ), a 'boomerang'-shaped negatively correlated area in the mid-latitude Pacific merged over Indonesia, and positive values in the Indian Ocean and the subtropical and tropical Atlantic.
Similar ENSO features are observed in the regression patterns (shown at the bottom of Fig. 5) calculated using the Niño 3 SST index. Note that there are some differences in the extracted maps compared to the regression patterns, for example, a stronger teleconnection pattern in southern Africa in surface temperature and a stronger positive center in the South Pacific in sea level pressure.
The second extracted component also appears to be related to ENSO and roughly corresponds to the time derivative of the first component (see Fig. 4). The corresponding precipitation pattern has an interesting localization in the Niño 3 region and mostly negative loadings in the rest of the tropical and subtropical areas. The third and the fourth components show weaker oscillations in the interannual time scale. Similar components will be discussed in Section 5.1. Table 1 lists the clarity values of the found components. Since this index is used as the objective function for extracting the components, the first component always has the largest value in all conditions. We also applied the same analysis to the three data sets separately (see Ilin et al., 2005) and the first extracted component was always a good ENSO index. Somewhat surprisingly, even the component extracted from sea-level-pressure data resembled the Niño 3 SST index more than the Southern Oscillation Index (SOI), although SOI is defined in terms of sea level pressure.
5. Extracting slow components with distinct power spectra

5.1. Frequency-based separation of sources
The algorithm described in the previous section is useful for extracting components which are dominant in a certain frequency range. This requires some knowledge about the expected power spectrum of the extracted component in order to use a proper frequency mask in the denoising filter. In blinder settings, however, this information does not exist and the frequency masks such as the one presented in Fig. 3 should be estimated automatically.
In this section, we present an algorithm which can be seen as an extension of the previous approach. It assumes that different extracted components are dominant in different frequencies (hence, they have distinct power spectra) and automatically estimates an individual frequency mask for each component. The adaptation of the masks is practically implemented using a competition mechanism involving the smoothed power spectra
Table 1
Clarity values of DSS components and regression components obtained from ENSO indices Niño 3 SST (2004) and SOI (2005)

Component 1    0.7484
Component 2    0.5105
Component 3    0.3376
Component 4    0.3014
Niño 3 SST     0.6853
SOI            0.5354
¹ The maps are plotted using the mapping toolbox developed by Pawlowicz (2000).