首页 > 英文翻译

ESOM-Maps tools for clustering, visualization, and classification with Emergent SOM

更新时间:2023-06-22 09:36:05 阅读：评论：0

ESOM-Maps:tools for clustering,visualization, and classiﬁcation with Emergent SOM

Alfred Ultsch,Fabian M¨o rchen

Data Bionics Rearch Group,University of Marburg

D-35032Marburg,Germany

March17,2005

Abstract

An overview on the usage of emergent lf organizing maps is given.

U-Maps visualize the distance structures of high dimensional data ts.

P-Maps show their density structures and U*-Maps combine the advan-

tages of the mentioned maps to a visualization suitable to detect non-

trivial cluster structures.A conci summary on the usage of Emergent

Self-organizing Maps(ESOM)for data mining is given.The tasks of vi-

sualization,clustering,and classiﬁcation as they can be performed with

the Databionics ESOM Tools are described.

1Introduction

The power of lf-organization that allows the emergence of structure in data is often neglected.We think this is in part due to a misusage of Self-organizing Maps that is widely spread in the scientiﬁc literature.The maps ud by most authors are usually very small,consisting of somewhere between a few and some tens of [1,2,3]).Also,the concept of boundless id maps[4])to avoid border eﬀects is rarely ud.Using small SOM is almost identical to a k-Means clustering with k equal to the number of nodes in the map.The topology prervation of the SOM projection is of little u when using small maps.Emergent phenomena involve by deﬁnition a large number of individuums,where large means at least a few thousands.This is why we u large SOMs and called them Emergent Self-Organizing Maps(ESOM)to emphasize the distinction.It has been demonstrated,that using ESOM is a signiﬁcantly diﬀerent process from using k-Means[5].

The purpo of this article is to give a conci summary on the usage of Emergent Self-organizing Maps(ESOM)for data mining.We describe the tasks We thank Mario N¨o cker,Christian Stamm,Michael Thies,and Martin K¨u mmerer for programming most of the Databionics ESOM Tools.

of visualization,clustering,and classiﬁcation as they can be performed with our new software-the Databionics ESOM Tools.

2Map Architecture

If M⊂R2the map has planar topology.In a quadgrid arrangement the number of immediate neighbors of a neuron is4,for hexgrids there are6immediate neighbors.

Note,that the number of immediate neighbors is less at the borders of the grid.In the map spaces border eﬀects occur,enlargening the probability of topology errors[6].To avoid such border eﬀects,grids can be embedded in a ﬁnite but boundless a sphere or the toroid PAC-man space(PMS)(e [4]).In PMS the top row is connected to the bottom row and the left column to the right column within the lattice.

If the number of rows and columns in a2D grid is equal,the map is called square,otherwi rectangular.The ratio of rows to columns is propod to be chon corresponding to the ratio of theﬁrst and cond eigenvalues of the covariance matrix in[7].An experimental analysis of topology errors showed, however,that the ratio of rows and columns should be chon diﬀerent from unity even when no dominant direction of variance exists[6].

We recommend the following ESOM architecture:boundless toroid grids with at least4000neurons and a ratio of rows and columns diﬀerent from unity. This avoids border eﬀects,topology errors,and enables an intuitive undistorted visualization.

3Visualization

wolf

The result of ESOM training[7]is a low dimensional grid of high dimensional prototype vectors.The positions of the bestmatches for the data points alone does often not oﬀer an intuitive visualization of the structures prent in the high dimensional space.Additional methods are needed to visualize the structures, the most common being height values forming a3D landscape on top of the grid.

The example to demonstrate the visualizations is taken from[8].It consists of a2dimensional datat typical for sonar applications.The dimensions are ENGY and TIME,the data t is a mixture of two G

aussians with2048points each(e Figure1(a)).

Distance-bad Visualization:The U-Matrix[9]is the canonical display of ESOM.The local distance structure is displayed at each neuron as a height value creating a3D landscape of the high dimensional data space.The height is calculated as the sum of the distances to all immediate neighbors normalized by the largest occurring height.This value will be large in areas where no or few data points reside,creating mountain ranges for cluster boundaries.The sum will be small in areas of high densities,thus clusters are depicted as val-

leys.Figure1(b)shows the U-Matrix of the EngyTime datat,darker colors correspond to large distances.

Density-bad Visualization:While distance-bad methods usually work well for clearly parated clusters,problems can occur with slowly changing den-sities and overlapping clusters.Density-bad methods more directly measure the density in the data space sampled at the prototype vectors.The P-Matrix[4] displays the local density measures with the Pareto Density Estimation(PDE) [10],an information optimal kernel density estimation.Figure1(c)shows the P-Matrix of the EngyTime datat,

darker colors correspond to larger densities.

Distance-and Density-bad Visualization:In den regions of the data space the local distances depicted in an U-Matrix are presumably distances measured inside a cluster.Such distances may be disregarded for the purpo of clustering.In thin populated regions of the data space,however,the distances do matter.In this ca the U-Matrix heights correspond to cluster boundaries. This leads to the deﬁnition of an U*-Matrix[11]which combines the distance bad U-Matrix and the density bad P-Matrix.The values of the U-Matrix are dampened in highly den regions,unchanged in regions of average density, and emphasized in spar regions.

pvp是什么意思

The advantage of the U*-Matrix over the U-Matrix in datats with clusters that are not clearly parated in the high dimensional space can be e from the example.The U*-Matrix in Figure1(d)shows clearly the two Gaussians of the EngyTime datat.The U-Matrix in Figure1(b),however,may mislead a clustering.

Topology-bad Visualizations:There are two kinds of topographical errors that have to be considered.The forward projection error(FPE)occurs when a pair of similar data points is assigned to a distant pair of positions on the map.The backward projection error(BPE)corresponds to a pair of cl

o grid positions being the image of two distant data points.The latter can be visualized using Zrehen’s Measure[12],for the former the Minimal-U-Path[6] can be ud.

Bestmatch Visualizations:A display of the bestmatches can be placed on top of the images created by the above mentioned methods.Each neuron that is a bestmatch for at least one data point can be marked with a point.The point can be colored indicating a known or chon cluster membership.

4From Matrix to Map大耳朵英语网

For boundless grids the visualizations should be viewed in tiled mode[4],dis-playing four adjacent copies of the grid.Clusters stretching over the edge of the grid image can then be en connected.Unfortunately every data point and cluster is shown in4diﬀerent places.This can be compensated by extracting a map from the tiled view removing the redundancies[4].The obvious rem-blance with geographical landscapes led us to call the displays maps.For an U-Matrix we get an U-Map and similar P-Maps and U*-Maps.The t of all the maps is called ESOM-Maps.The remblance of ESOM-Maps to geo-

gas3

(a)Data (b)U-Matrix

(c)P-Matrix (d)U*-Matrix

Figure 1:ESOM visualizations for 2clusters in the EngyTime datat

graphical maps or landscapes may be enhanced by computer graphical means such as texturing,coloring and lightening.The Figure 2(a)shows a tiled view of an U*-Matrix with 3clusters.Figure 2(b)shows the non-redundant U*-Map display extracted from the center of the tiled view.

5Clustering

The clustering of the ESOM can be performed at two diﬀerent levels.First,the bestmatches and thus the corresponding data points can be manually grouped into veral clusters.Not all points need to be labeled,outliers are usually eas-ily detected and can be removed.The cluster membership can be visualized by a coloring of the bestmatches.Secondly,the neurons can be clustered.This way regions on the map reprenting a cluster can be identiﬁed and ud for classiﬁcation of new data (e Section 6).The regions can be visualized by mi-transparent colored regions in order not to overlay the background visu-alizations completely.The visualization can show whether there actually is a cluster structure.Outlying data points will also be easily detectable.In con-trast,the popular k -Means algorithm will always converge to the given number of clusters no matter whether the data supports this structure.Outliers will be handled like any other data point.

(a)Tiled U*-Matrix(b)Extracted U*-Map

Figure2:ESOM visualizations for3clusters in the Skating data

6Classiﬁcation

Labeling all(or most)neurons instead of only the bestmatches creates a sub-symbolic classiﬁcator similar to a k-nearest neighbor(KNN)classiﬁer with k=1 that can be applied to new data automatically.The main diﬀerence to KNN is,that the ur can u the visualization of the ESOM to create the labeling whereas KNN does not oﬀer this convenience.Further KNN classiﬁcation al-ways classiﬁes a point,no matter how near(or rather far)the neighbors are. In contrast,ESOM classiﬁcation oﬀers a don’t know class by leaving neurons for sparly populated regions parating clusters.

keep upWe have successfully ud ESOM classiﬁcation to detect movement phas in a multivariate time ries from Inline-Speed Skating[13,14].Bad on the results of a single skating speed we tried to extend the analysis to six diﬀerent speeds.Each datat contained30K6-dimensional vectors.We ud a pooled sample from all speeds for training.Three clusters were detected,the corre-sponding regions on the map were labeled.The U*-Matrix and the U*-Map are shown in Figure2(a)and Figure2(b),respectively.The six full datats were then classiﬁed into the three movement phas by

bestmatch projection on the map.The resulting gmentation of the time ries was labeled by an expert into the movement phas glide,push,and swing.

7Databionics ESOM Tools

Most of the described methods for visualization,clustering,and classiﬁcation have been implemented in a software tool called Databionics ESOM Tools1.The main part is a graphical front-end to perform tasks like visualization,outlier removal,clustering of bestmatches and neurons,creation of non-redundant map views.The visualization of the maps is designed in a modular way.The diﬀerent background displays of the map can be applied to subts of the data features and can be combined with foreground displays of the bestmatches.Several color 1databionic-esom.sf

bca

Figure 3:U-Matrix during ESOM training for 7clusters (Epochs 0+k ∗5)gradients with an adjustable number of colors are available as well as contour rendering.Even the training process can be visualized by creating a slide show of the map unfolding in the data space (e Figure 3).The training,rendering,projection,and classiﬁcation tasks can also be run from the command line to allow an automatic analysis of large data ts.

The programs are written in Java and the source code is available under the GPL[15].We welcome other scientists to get involved in the further development of the tool.

8Summary

Emergent SOMs are a powerful tool for clustering and classiﬁcation.To exploit nontrivial emergent phenomena,however,large maps must be ud.Only large maps can adequately project complicated structures onto the two dimensional map space.Distance-bad and density-bad visualization of ESOM can be ud to analyze the data for possible clusters.A (partly)labeled ESOM rves as a sub-symbolic classiﬁcator in the spirit of KNN classiﬁcation.Many ESOM related tasks can be performed with the freely available Databionics ESOM Tools .

rad

References

[1] C.Li,P.S.Yu,and V.Castelli.MALM:a framework for mining quence databa at multiple abstraction levels.In Proc.of the 7th ACM CIKM ,pages 267–272,Bethesda,MD,1998.country road take me home

[2]T.C.Fu,F.L.Chung,V.Ng,and R.Luk.Pattern Discovery from Stock Time Series Using Self-Organizing Maps.Workshop Notes of KDD2001Workshop on Temporal Data Mining,San Francisco ,pages 27–37,2001.

秘密花园2

[3] E.Pampalk,A.Rauber,and D.Merkl.Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps.In Proc.ICANN’02,Madrid,Spain,August 27-302002.Springer.

[4] A.Ultsch.Maps for the Visualization of high dimensional Data Spaces.In Proc.

WSOM’03,Japan ,2003.[5] A.Ultsch.Self Organizing Neural Networks perform diﬀerent from statistical k-means clustering.In Proc.GfKl 1995,Bal,Swiss .

[6] A.Ultsch and L.Herrmann.Architecture of emergent lf-organizing maps to reduce projection errors.In Proc.ESANN,Bruges,Belgium ,2005.

[7]T.Kohonen.Self-Organizing Maps .Springer,1995.

[8]P.M.Baggenstoss.Statistical Modeling Using Gaussian Mixtures and HMMs with Mat-lab,www.npt.nuwc.navy.mil/Csf/htmldoc/pdf/pdf.html ,2002.

神探夏洛克第3季6

本文发布于:2023-06-22 09:36:05，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/90/153552.html

上一篇：TSC TS2576 说明书

下一篇：Plasma Science and Technology (PST) 模板

标签：花园秘密

留言与评论（共有 0 条评论）