Unsupervised Feature Selection for Multi-Cluster Data


Deng Cai, Chiyuan Zhang, Xiaofei He
State Key Lab of CAD&CG, College of Computer Science
Zhejiang University, China
{dengcai,xiaofeihe}@cad.zju.edu,
ABSTRACT
In many data analysis tasks, one is often confronted with very high dimensional data. Feature selection techniques are designed to find the relevant feature subset of the original features which can facilitate clustering, classification and retrieval. In this paper, we consider the feature selection problem in the unsupervised learning scenario, which is particularly difficult due to the absence of class labels that would guide the search for relevant information. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional unsupervised feature selection methods address this issue by selecting the top ranked features based on certain scores computed independently for each feature. These approaches neglect the possible correlation between different features and thus cannot produce an optimal feature subset. Inspired by the recent developments on manifold learning and L1-regularized models for subset selection, we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be best preserved. The corresponding optimization problem can be efficiently solved since it only involves a sparse eigen-problem and an L1-regularized least squares problem. Extensive experimental results over various real-life data sets have demonstrated the superiority of the proposed algorithm.
Categories and Subject Descriptors
I.5.2 [Pattern Recognition]: Design Methodology—Feature evaluation and selection
General Terms
Algorithms, Theory
Keywords
Feature selection, Unsupervised, Clustering
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD'10, July 25–28, 2010, Washington, DC, USA.
Copyright 2010 ACM 978-1-4503-0055-1/$10.00.

1. INTRODUCTION
In many applications in computer vision, pattern recognition and data mining, one is often confronted with very high dimensional data. High dimensionality significantly increases the time and space requirements for processing the data. Moreover, various data mining and machine learning tasks, such as classification and clustering, that are analytically or computationally manageable in low dimensional spaces may become completely intractable in spaces of several hundred or thousand dimensions [12]. To overcome this problem, feature selection techniques [3, 4, 17, 21, 29, 30] are designed to reduce the dimensionality by finding a relevant feature subset. Once a small number of relevant features are selected, conventional data analysis techniques can then be applied.
Based on whether the label information is available, feature selection methods can be classified into supervised and unsupervised methods. Supervised feature selection methods usually evaluate the importance of features by the correlation between features and class label. Typical supervised feature selection methods include Pearson correlation coefficients [23], Fisher score [12], and Information gain [11]. However, in practice, there is usually no shortage of unlabeled data but labels are expensive. Hence, it is of great significance to develop unsupervised feature selection algorithms which can make use of all the data points. In this paper, we consider the problem of selecting features in unsupervised learning scenarios, which is a much harder problem due to the absence of class labels that would guide the search for relevant information.
Feature selection aims at selecting the most relevant feature subset based on certain evaluation criteria. This problem is essentially a combinatorial optimization problem which is computationally expensive. Traditional feature selection methods address this issue by selecting the top ranked features based on some scores computed independently for each feature. These scores are usually defined to reflect the power of each feature in differentiating different classes/clusters. This approach may work well on binary class/cluster problems. However, it is very likely to fail in multi-class/cluster cases. Fig. 1 shows an intuitive example. There are three Gaussians in a three dimensional space. Without the label information, some popular unsupervised feature selection methods (e.g., Maximum variance and LaplacianScore [17]) rank the features as a > b > c. If one is asked to select two features, these methods will select features a and b, which is obviously sub-optimal.
Figure 1: A failed example for binary clusters/classes feature selection methods. Panels (a)-(c) show the projections of the data on the planes of the joint feature pairs a⊗b, a⊗c, and b⊗c, respectively. Without the label information, both Maximum variance and LaplacianScore [17] methods rank the features as a > b > c. If one is asked to select two features, both Maximum variance and LaplacianScore methods will select features a and b, which is obviously sub-optimal.
When dealing with multi-class/cluster data, different features have different powers on differentiating different classes/clusters (e.g., cluster 1 vs. cluster 2 and cluster 1 vs. cluster 3). There are some studies on supervised feature selection [2] trying to solve this issue. However, without label information, it is unclear how to apply similar ideas to unsupervised feature selection methods.
Inspired by the recent developments on spectral analysis of the data (manifold learning) [1, 22] and L1-regularized models for subset selection [14, 16], we propose in this paper a new approach, called Multi-Cluster Feature Selection (MCFS), for unsupervised feature selection. Specifically, we select those features such that the multi-cluster structure of the data can be well preserved. By using spectral analysis techniques, MCFS suggests a principled way to measure the correlations between different features without label information. Thus, MCFS can well handle data with a multiple cluster structure. The corresponding optimization problem only involves a sparse eigen-problem and an L1-regularized least squares problem, and thus can be efficiently solved. It is important to note that our method essentially follows our previous work on spectral regression [5] and sparse subspace learning [6, 7].
The rest of the paper is organized as follows: in Section 2, we provide a brief review of the related work. Our multi-cluster feature selection algorithm is introduced in Section 3. The experimental results are presented in Section 4. Finally, we provide the concluding remarks in Section 5.
2. RELATED WORK
Feature selection methods can be classified into “wrapper” methods and “filter” methods [19, 21]. The wrapper model techniques evaluate the features using the mining algorithm that will ultimately be employed. Thus, they “wrap” the selection process around the mining algorithm. Algorithms based on the filter model examine intrinsic properties of the data to evaluate the features prior to the mining tasks.
For unsupervised “wrapper” methods, clustering is a commonly used mining algorithm [10, 13, 20, 24]. These algorithms consider feature selection and clustering simultaneously and search for features better suited to clustering, aiming to improve clustering performance. However, the “wrapper” methods are usually computationally expensive [19] and may not be applicable to large scale data mining problems. In this paper, we are particularly interested in the filter methods, which are much more efficient.
Most of the existing filter methods are supervised. Maximum variance might be the most simple yet effective unsupervised evaluation criterion for selecting features. This criterion essentially projects the data points along the dimensions of maximum variance. Note that the Principal Component Analysis (PCA) algorithm shares the same principle of maximizing variance, but it involves feature transformation and obtains a set of transformed features rather than a subset of the original features.
Although the maximum variance criterion finds features that are useful for representing the data, there is no reason to assume that these features must be useful for discriminating between data in different classes. Recently, the LaplacianScore algorithm [17] and its extensions [30] have been proposed to select those features which can best reflect the underlying manifold structure. LaplacianScore uses a nearest neighbor graph to model the local geometric structure of the data and selects those features which are smoothest on the graph. It has been proven [17] that with label information LaplacianScore becomes the Fisher criterion score. The latter is a supervised feature selection method (filter method) which seeks features that are efficient for discrimination [12]. The Fisher criterion score assigns the highest score to the feature on which the data points of different classes are far from each other while data points of the same class are close to each other.
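For concreteness, the following is a minimal sketch of how the LaplacianScore of each feature could be computed from a precomputed nearest-neighbor affinity matrix W (constructed as described in Section 3.1). It follows the score definition of [17]; the function and variable names are illustrative and not taken from any particular library.

```python
import numpy as np

def laplacian_score(X, W):
    """LaplacianScore of each feature; smaller scores indicate features
    that vary more smoothly over the nearest-neighbor graph.

    X : (N, M) data matrix, one row per data point.
    W : (N, N) symmetric affinity matrix of the graph.
    """
    d = W.sum(axis=1)                 # vertex degrees
    D = np.diag(d)                    # degree matrix
    L = D - W                         # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        # remove the component along the degree-weighted constant vector
        f_t = f - (f @ d) / d.sum()
        num = f_t @ L @ f_t           # smoothness of the feature on the graph
        den = f_t @ D @ f_t           # degree-weighted variance of the feature
        scores[r] = num / den if den > 1e-12 else np.inf
    return scores                     # features are ranked by ascending score
```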
Wolf et al. proposed a feature selection algorithm called Q-α [29]. The algorithm optimizes over a least-squares criterion function which measures the clusterability of the input data points projected onto the selected coordinates. The optimal coordinates are those for which the cluster coherence, measured by the spectral gap of the corresponding affinity matrix, is maximized [29]. A remarkable property of the algorithm is that it always yields sparse solutions.
3. MULTI-CLUSTER FEATURE SELECTION
The generic problem of unsupervised feature selection is the following. Given a set of points X = [x_1, x_2, ..., x_N], x_i ∈ R^M, find a feature subset of size d which contains the most informative features. In other words, the points {x'_1, x'_2, ..., x'_N} represented in the d-dimensional space R^d can well preserve the geometric structure of the data represented in the original M-dimensional space.
Since naturally occurring data usually have a multiple-cluster structure, a good feature selection algorithm should consider the following two aspects:
• The selected features can best preserve the cluster structure of the data. Previous studies on unsupervised feature selection [13, 20, 24] usually assume Gaussian-shaped clusters. However, recent studies have shown that human generated data are probably sampled from a submanifold of the ambient Euclidean space [1, 25, 28]. The intrinsic manifold structure should be considered while measuring the goodness of the clusters [22].

• The selected features can “cover” all the possible clusters in the data. Since different features have different power on differentiating different clusters, it is certainly undesirable that all the selected features can well differentiate cluster 1 and cluster 2 but fail on differentiating cluster 1 and cluster 3.
In the remaining part of this section, we will introduce our Multi-Cluster Feature Selection (MCFS) algorithm, which considers the above two aspects. We begin with a discussion on spectral embedding for cluster analysis with arbitrary cluster shapes.
3.1 Spectral Embedding for Cluster Analysis

To detect the cluster structure (of arbitrary shape) of the data, spectral clustering techniques [8, 22, 26] have received significant interest recently. Spectral clustering usually clusters the data points using the top eigenvectors of the graph Laplacian [9], which is defined on the affinity matrix of the data points. From the graph partitioning perspective, spectral clustering tries to find the best cut of the graph so that a predefined criterion function can be optimized. Many criterion functions, such as ratio cut [8], average association [26], and normalized cut [26], have been proposed along with the corresponding eigen-problems for finding their optimal solutions. Spectral clustering has a close connection with studies on manifold learning [1, 25, 28], which consider the case where the data are drawn by sampling a probability distribution that has support on or near a submanifold of the ambient space. In order to detect the underlying manifold structure, many manifold learning algorithms have been proposed [1, 25, 28]. These algorithms construct a nearest neighbor graph to model the local geometric structure and perform spectral analysis on the graph weight matrix. This way, the manifold learning algorithms can “unfold” the data manifold and provide a “flat” embedding for the data points. Spectral clustering can be thought of as a two-step approach [1]. The first step is “unfolding” the data manifold using the manifold learning algorithms, and the second step is performing traditional clustering (typically k-means) on the “flat” embedding of the data points [22].
Consider a graph with N vertices where each vertex corresponds to a data point. For each data point x_i, we find its p nearest neighbors and put an edge between x_i and its neighbors. There are many choices to define the weight matrix W on the graph. Three of the most commonly used are as follows:
1. 0-1 weighting. W_ij = 1 if and only if nodes i and j are connected by an edge. This is the simplest weighting method and is very easy to compute.

2. Heat kernel weighting. If nodes i and j are connected, put

   W_ij = exp(−||x_i − x_j||² / σ)

   The heat kernel has an intrinsic connection to the Laplace-Beltrami operator on differentiable functions on a manifold [1].

3. Dot-product weighting. If nodes i and j are connected, put

   W_ij = x_i^T x_j

   Note that if x is normalized to have unit norm, the dot product of two vectors is equivalent to the cosine similarity of the two vectors.
If the heat kernel or dot-product weighting is used, some researchers [22] use a complete graph (i.e., put an edge between any two points) instead of the p-nearest neighbors graph.
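As an illustration of the graph construction described above, here is a minimal sketch that builds a symmetric p-nearest-neighbor affinity matrix with heat kernel weights; the symmetrization step and the default value of σ are implementation choices not fixed by the text.

```python
import numpy as np

def heat_kernel_graph(X, p=5, sigma=1.0):
    """Build the affinity matrix W of the p-nearest-neighbor graph.

    X : (N, M) data matrix, one row per data point.
    Returns W with W[i, j] = exp(-||x_i - x_j||^2 / sigma) when j is among
    the p nearest neighbors of i (and zero otherwise), then symmetrized.
    """
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)                        # exclude self-loops
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist2[i])[:p]                    # p nearest neighbors of x_i
        W[i, nbrs] = np.exp(-dist2[i, nbrs] / sigma)
    return np.maximum(W, W.T)                              # make the graph undirected
```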
Define a diagonal matrix D whose entries are column (or row, since W is symmetric) sums of W, D_ii = Σ_j W_ij. We can then compute the graph Laplacian L = D − W [9]. The “flat” embedding for the data points which “unfolds” the data manifold can be found by solving the following generalized eigen-problem [1]:
Ly = λDy    (1)

Let Y = [y_1, ..., y_K], where the y_k's are the eigenvectors of the above generalized eigen-problem with respect to the smallest eigenvalues. Each row of Y is the “flat” embedding of a data point. K is the intrinsic dimensionality of the data, and each y_k reflects the data distribution along the corresponding dimension (topic, concept, etc.) [1]. When one tries to perform cluster analysis of the data, each y_k can reflect the data distribution on the corresponding cluster. Thus, if the cluster number of the data is known, K is usually set to be equal to the number of clusters [22].
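A minimal sketch of this embedding step: given the affinity matrix W, form D and L = D − W and solve the generalized eigen-problem of Eq. (1) for the eigenvectors with the smallest eigenvalues. A dense solver from SciPy is used here for clarity; a sparse eigensolver would be preferred for large N, and skipping the trivial constant eigenvector (eigenvalue 0) is an assumption about common practice rather than something stated in the text.

```python
import numpy as np
from scipy.linalg import eigh

def spectral_embedding(W, K):
    """Solve L y = lambda D y (Eq. 1) and return the 'flat' embedding Y.

    W : (N, N) symmetric affinity matrix.
    K : number of embedding dimensions, typically the number of clusters.
    Returns Y of shape (N, K); each row is the embedding of a data point.
    """
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # graph Laplacian
    # eigh solves the generalized symmetric problem; eigenvalues come back ascending
    _, eigvecs = eigh(L, D)
    # drop the first (constant) eigenvector associated with eigenvalue 0
    return eigvecs[:, 1:K + 1]
```

Running k-means on the rows of Y would reproduce the two-step spectral clustering view of [1, 22]; MCFS instead uses each column y_k of Y as a regression target, as described in the next subsection.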
3.2 Learning Sparse Coefficient Vectors
After we obtain the “flat” embedding Y of the data points, we can measure the importance of each feature along each intrinsic dimension (each column of Y), and correspondingly, the contribution of each feature to differentiating each cluster.
Given y_k, a column of Y, we can find a relevant subset of features by minimizing the fitting error as follows:

min_{a_k} ||y_k − X^T a_k||² + β|a_k|    (2)

where a_k is an M-dimensional vector and |a_k| = Σ_{j=1}^{M} |a_{k,j}| denotes the L1-norm of a_k. a_k essentially contains the combination coefficients for different features in approximating y_k. Due to the nature of the L1-norm penalty, some coefficients will be shrunk to exactly zero if β is large enough. In this case, we can select a subset containing the most relevant features (corresponding to the non-zero coefficients in a_k) with respect to y_k. Eq. (2) is essentially a regression problem. In statistics, this L1-regularized regression problem is called LASSO [16].
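As a sketch of this step (under the assumption that an off-the-shelf LASSO solver is acceptable; scikit-learn's Lasso is used here, and its alpha parameter corresponds to β only up to the solver's own scaling convention), Eq. (2) can be solved independently for every column of Y:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_coefficients(X, Y, beta=0.01):
    """Solve Eq. (2) for each column y_k of the embedding Y.

    X : (N, M) data matrix with one row per point (so the paper's X^T a_k
        becomes X @ a_k in this layout), Y : (N, K) spectral embedding.
    Returns A of shape (M, K); column k is the sparse coefficient vector a_k.
    """
    M, K = X.shape[1], Y.shape[1]
    A = np.zeros((M, K))
    for k in range(K):
        # scikit-learn's Lasso minimizes (1/2N)||y - Xa||^2 + alpha*||a||_1
        reg = Lasso(alpha=beta, fit_intercept=False, max_iter=10000)
        reg.fit(X, Y[:, k])
        A[:, k] = reg.coef_
    return A
```

Features whose coefficients are non-zero in at least one a_k are the candidates for selection. One natural way to combine the K coefficient vectors into a single ranking, which the surviving text here does not cover, is to score feature j by max_k |a_{k,j}| and keep the d top-scoring features.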
The regression problem in Eq. (2) has the following equivalent formulation:

min_{a_k} ||y_k − X^T a_k||², subject to |a_k| ≤ γ    (3)
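When the goal is to control the number of selected features per eigenvector directly, the constrained form above is convenient because path algorithms such as LARS can trace the whole solution path. The following is a hedged sketch using scikit-learn's lars_path; the choice of solver and the illustrative nonzero parameter are assumptions, not prescribed by the surviving text.

```python
import numpy as np
from sklearn.linear_model import lars_path

def sparse_coefficients_lars(X, Y, nonzero=20):
    """Trace the LASSO path with LARS for each y_k and stop early.

    X : (N, M) data matrix, Y : (N, K) spectral embedding.
    nonzero : illustrative bound on the number of LARS steps, which roughly
              caps the number of non-zero coefficients per a_k.
    """
    M, K = X.shape[1], Y.shape[1]
    A = np.zeros((M, K))
    for k in range(K):
        # method='lasso' makes the path equivalent to the L1-constrained problem
        _, _, coefs = lars_path(X, Y[:, k], method='lasso', max_iter=nonzero)
        A[:, k] = coefs[:, -1]    # coefficients at the last step of the truncated path
    return A
```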
