
Learning a Classification Model for Segmentation
Xiaofeng Ren and Jitendra Malik
Computer Science Division
University of California at Berkeley, Berkeley, CA 94720
{xren,malik}@cs.berkeley.edu
Abstract
We propose a two-class classification model for grouping. Human segmented natural images are used as positive examples. Negative examples of grouping are constructed by randomly matching human segmentations and images. In a preprocessing stage an image is oversegmented into superpixels. We define a variety of features derived from the classical Gestalt cues, including contour, texture, brightness and good continuation. Information-theoretic analysis is applied to evaluate the power of these grouping cues. We train a linear classifier to combine the features. To demonstrate the power of the classification model, a simple algorithm is used to randomly search for good segmentations. Results are shown on a wide range of images.
1. Introduction
Perceptual grouping can be formulated as an optimization problem in a number of different frameworks, such as graph partitioning [22, 16, 8, 5] or variational approaches [13]. The objective function being optimized is typically driven by the designer's intuition or computational convenience. The theme of this paper is to derive the "right" optimization criterion. This is done in a learning approach, using a database of human segmented images.
We formulate the computational problem of segmentation as a classification between "good" segmentations and "bad" segmentations. Figure 1 illustrates our basic approach. Figure 1(a) is an image from the Corel Imagebase, (b) shows the image superimposed with a human marked segmentation, and (c) is the same image with a "wrong" segmentation. Our intuition tells us that the segmentation in (b) is "good" and the one in (c) is "bad".
How do we distinguish good segmentations from bad segmentations? Classical Gestalt theory has developed various principles of grouping [25, 14] such as proximity, similarity and good continuation. The principle of good continuation states that a good segmentation should have smooth
Figure 1. We formulate segmentation as classification between good segmentations (b) and bad segmentations (c). We use Gestalt grouping cues as features and train a classifier. Human segmented images are used as examples of good segmentations. Bad segmentations are constructed by randomly matching a human segmentation to a different image.
boundaries. The principle of similarity is twofold:

1. intra-region similarity: the elements in a region are similar. This includes similar brightness, similar texture, and low contour energy inside the region;

2. inter-region (dis)similarity: the elements in different regions are dissimilar. This in turn includes dissimilar brightness, dissimilar texture, and high contour energy on region boundaries.
The classical principles of grouping have inspired many previous approaches to segmentation. However, the Gestalt principles are ceteris paribus rules, which means that they distinguish competing segmentations only when everything else is equal. Many of the previous works have made ad-hoc decisions for using and combining the cues.
In this work, we learn a classification model for segmentation from the Gestalt cues. A database of human marked segmentations has been established [10]. We use the human segmented images in this database as positive examples. For negative examples, we randomly match a human segmentation to a different image, an example of which has been given in Figure 1(c).
The outline of the paper is as follows. In Section 2 we introduce a preprocessing stage which organizes an image into "superpixels". In Section 3, we define a set of features for segments, including Gestalt cues of contour, texture, brightness and good continuation. The features are evaluated using information-theoretic measures. From these features we train a logistic regression classifier. Based on this model for segments, in Section 4 we formulate segmentation as an optimization problem of a linear objective function over the space of segmentations. To demonstrate the power of our classification model, we design a simple algorithm to randomly search for good segmentations. The experimental results are shown in Section 5. Section 6 discusses related works and concludes the paper.
2. Oversegmentation as Preprocessing
In this section we present a preprocessing stage to group pixels into "superpixels". The motivations of this preliminary grouping are: (1) pixels are not natural entities; they are merely a consequence of the discrete representation of images; and (2) the number of pixels is high even at moderate resolutions; this makes optimization on the level of pixels intractable. We would like to work with "superpixels" which are local, coherent, and which preserve most of the structure necessary for segmentation at the scale of interest.
We apply the Normalized Cuts algorithm [22, 8] to produce the superpixel map. Both contour and texture cues are used. The affinity matrix has local connections only. Figure 2 shows an example of the oversegmentation. We observe from this example that the superpixels are roughly homogeneous in size and shape; this fact simplifies the computation in later stages. Some structures in the human segmentation are lost, but they are usually minor details, much smaller in scale than the objects we are interested in. The reconstructed segmentation is a good approximation of the original one.
To quantify the quality of this approximation, we use a contour-based measure to verify the superpixel maps against the human segmentations. In particular, we compute the percentage of the human marked boundaries being "covered by" (within a fixed small distance of) the superpixel boundaries. This is the recall rate used in [9]. Figure 3 shows the results. As expected, the recall rates increase with the number of superpixels. In our experiments we have found a moderate number of superpixels to be sufficient.
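The covering measure above can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: boundaries are represented as point lists, and a brute-force nearest-point test replaces whatever distance transform a real implementation would use; the function name and toy boundaries are hypothetical.

```python
import math

def boundary_recall(human_pts, super_pts, tol):
    """Fraction of human-marked boundary points lying within Euclidean
    distance `tol` of some superpixel boundary point."""
    if not human_pts:
        return 1.0
    hits = 0
    for hx, hy in human_pts:
        if any(math.hypot(hx - sx, hy - sy) <= tol for sx, sy in super_pts):
            hits += 1
    return hits / len(human_pts)

# Toy example: a vertical human boundary vs. a superpixel boundary
# shifted one pixel to the right.
human = [(10, y) for y in range(5)]
superpx = [(11, y) for y in range(5)]
print(boundary_recall(human, superpx, tol=1.0))  # every point is 1 pixel away
```

With a tolerance of one pixel every human boundary point is covered; shrinking the tolerance below one pixel drops the recall to zero, which is why the measure is reported at several tolerances.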
3. What is a Good Segment?
A segmentation is a collection of segments. To answer the question "What is a good segmentation?", we need to answer "What is a good segment?" first. In this section, we will define a set of features for segments, evaluate the usefulness of these features, and train a classifier from them.
Figure 2. An example of superpixel maps. (a) is the original image; (b) is a human marked segmentation; (c) is a superpixel map; (d) shows a reconstruction of the human segmentation from the superpixels: we assign each superpixel to the segment in (b) with the maximum overlapping area and extract the superpixel boundaries.
3.1. Features for grouping
For static images, the classical Gestalt principles of grouping include proximity, similarity, good continuation (curvilinear continuity), closure, as well as symmetry and parallelism. In our model, for a segment we define the following features:
1. inter-region texture similarity;
2. intra-region texture similarity;
3. inter-region brightness similarity;
4. intra-region brightness similarity;
5. inter-region contour energy;
6. intra-region contour energy;
7. curvilinear continuity.
Texture Similarity
For texture cues we follow the discriminative framework of textons (e.g., [8]). The image is first convolved with a bank of filters of multiple orientations. Based on a vector quantization of the filter outputs, the pixels are clustered into a number of texton channels. This gives us a descriptor for each region, namely the distribution of textons inside its support. The texture difference of two regions is then measured as a distance between the two histograms. We make one modification here in the filtering stage: when we apply the filterbank to the image, we restrict the support of the filters to be within a single superpixel (normalized convolution, e.g., [6]).
In the next step, we convert this distance into a log likelihood ratio: let same denote the set of all superpixel
Figure 3. The percentage of human marked boundaries covered by the superpixel maps, as the number of superpixels and the distance tolerance (in pixels) vary. With a sufficient number of superpixels and a small tolerance, nearly all of the human marked boundaries are covered.
pairs such that they appear in the same segment of a human segmentation, and let diff denote the set of all superpixel pairs such that they appear in different segments in a human segmentation. We compute the distance for all pairs of superpixels in same, and denote the distribution of these distances as P_same. Similarly we collect P_diff, the distribution of the distance on the set diff (see Figure 4 for the empirical distributions). Let d(q, S) be the distance between the texture histograms of a superpixel q and a segment S. The texture similarity between q and S is defined as:

    T(q, S) = log [ P_same(d(q, S)) / P_diff(d(q, S)) ]
Figure 4. The empirical distributions of the distance of texture histograms. (a) shows the distribution P_same, between pairs of superpixels in the same human marked segment, and P_diff, between pairs of superpixels in different segments. (b) shows the log likelihood ratio.
The log likelihood ratio measures the significance of the distance value. We use this basic texture similarity measure to define two texture features for a segment S: intra-region texture similarity and inter-region texture similarity. Figure 5(a) illustrates the definition. The intra-region texture similarity sums over all the superpixels q in the region:

    T_intra(S) = Σ_{q ∈ S} T(q, S)

and the inter-region similarity sums over all the superpixels on ∂S, the boundary superpixels of S:

    T_inter(S) = Σ_{q ∈ ∂S} T(q, S′)

where S′ is the segment adjacent to q. If there are multiple adjacent segments, we take the average of the similarity values.
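A minimal sketch of the distance-to-log-likelihood-ratio conversion follows. Everything numeric here is an illustrative assumption: the χ²-style histogram distance (a common choice for texton histograms), the bin edges, and the two densities standing in for the paper's empirical P_same and P_diff.

```python
import bisect
import math

def chi2_distance(h1, h2):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

# Hypothetical empirical densities of histogram distances, binned on [0, 1):
# P_same peaks at small distances, P_diff at large ones (values are made up).
bin_edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
p_same = [0.50, 0.25, 0.15, 0.07, 0.03]
p_diff = [0.05, 0.10, 0.20, 0.30, 0.35]

def texture_similarity(d):
    """log P_same(d)/P_diff(d): positive when d looks like a within-segment
    distance, negative when it looks like a between-segment distance."""
    i = min(bisect.bisect_right(bin_edges, d) - 1, len(p_same) - 1)
    return math.log(p_same[i] / p_diff[i])

h_grass = [0.7, 0.2, 0.1]    # toy texton histograms
h_grass2 = [0.6, 0.3, 0.1]
h_sky = [0.1, 0.1, 0.8]
print(texture_similarity(chi2_distance(h_grass, h_grass2)) > 0)  # similar textures
print(texture_similarity(chi2_distance(h_grass, h_sky)) < 0)     # different textures
```

Summing this per-superpixel quantity over the interior gives the intra-region feature, and over the boundary superpixels (against the adjacent segment's histogram) the inter-region feature.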
Figure 5. (a) The intra-region similarity compares the descriptor of a superpixel q to the segment S containing it. The inter-region similarity compares the descriptor of a superpixel q on the boundary of S to the adjacent segment S′. (b) Curvilinear continuity of S is measured by the tangent changes at superpixel junctions along the boundary of S.
Brightness similarity
The intra-region brightness similarity and inter-region brightness similarity are defined in an identical way. The brightness descriptor for each region is a histogram of brightness values. We compute the distance of histograms and use empirical data to convert the distance into a log likelihood ratio. This basic similarity measure is incorporated into the intra- and inter-region similarity cues.
Contour energy
Contour cues are computed at the level of pixels. We first compute the orientation energy [12, 8] at each pixel. The orientation energy is converted to a soft "contourness" measure by a non-linear transform [8, 18]. The inter-region contour energy is the summation of this contourness over all the pixels on the boundary of S, and the intra-region contour energy is the summation over all the pixels on the superpixel boundaries inside S.
Good continuation
Curvilinear continuity is measured as follows: for each adjacent pair of superpixels on the boundary of a segment S, there is a change of tangent angle α at the junction (see Figure 5(b)). This measures the first-order smoothness of the boundary: the larger this angle, the less smooth the boundary of S. From the boundaries in the human segmentations, we collect the distribution P_tangent of tangent changes. Let J(S) be the set of all superpixel junctions on the boundary of S; the curvilinear continuity is defined as

    C(S) = Σ_{α ∈ J(S)} log P_tangent(α)
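The tangent-change measure can be sketched on a polyline boundary. The monotone stand-in for log P_tangent below is an assumption (the paper learns this distribution from human boundaries); the point lists are toy data.

```python
import math

def tangent_changes(boundary):
    """Turning angles (radians, wrapped to [0, pi]) at interior vertices of a
    polyline approximating a segment boundary."""
    angles = []
    for i in range(1, len(boundary) - 1):
        (x0, y0), (x1, y1), (x2, y2) = boundary[i - 1], boundary[i], boundary[i + 1]
        a1 = math.atan2(y1 - y0, x1 - x0)      # incoming tangent direction
        a2 = math.atan2(y2 - y1, x2 - x1)      # outgoing tangent direction
        d = abs(a2 - a1)
        angles.append(min(d, 2 * math.pi - d))
    return angles

def continuity(boundary, log_p_tangent):
    """Sum of log-probabilities of the turning angles; smoother boundaries
    (small angles) score higher under a prior peaked at zero."""
    return sum(log_p_tangent(a) for a in tangent_changes(boundary))

# Hypothetical prior favoring small tangent changes (monotone decreasing
# stand-in for the learned log P_tangent).
log_p = lambda a: -a

straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
jagged = [(0, 0), (1, 0), (1, 1), (2, 1)]
print(continuity(straight, log_p) > continuity(jagged, log_p))
```

The straight boundary has zero turning at every junction and therefore a higher continuity score than the staircase boundary, matching the intuition behind the good-continuation cue.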
Normalizing the features
The features we have defined are unnormalized and cannot be directly compared with one another. To normalize them, we notice that all the features we have defined are summations of basic quantities; for example, the intra-region texture similarity sums a basic similarity term over superpixels. We assume that these terms are random variables with the same mean μ and variance σ² across the data. If there are n superpixels in the segment, we normalize the summed feature F as (F − nμ)/(√n σ). The maximum likelihood estimates of μ and σ are used. Other features are normalized in the same way.
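This normalization is the standard standardization of a sum of n i.i.d. terms; the sketch below assumes that reading (the exact estimator details in the paper may differ), and the pooled term values are made up.

```python
import math
import statistics

def normalize_summed_feature(term_values, feature_sum, n):
    """Standardize a feature that is a sum of n basic terms, assuming the
    terms are i.i.d. with mean mu and std sigma estimated from pooled data:
    z = (F - n*mu) / (sqrt(n)*sigma).  Segments of different sizes then
    yield comparable feature values."""
    mu = statistics.mean(term_values)
    sigma = statistics.pstdev(term_values)   # ML (population) estimate
    return (feature_sum - n * mu) / (math.sqrt(n) * sigma)

# Hypothetical per-superpixel similarity terms pooled over the dataset:
pooled = [0.1, 0.3, 0.2, 0.4, 0.0, 0.2]
# A segment with 4 superpixels whose terms sum to 1.2:
z = normalize_summed_feature(pooled, feature_sum=1.2, n=4)
print(round(z, 3))
```

A positive z means the segment's summed similarity is above what n average superpixels would give; dividing by √n (rather than n) keeps the variance of the normalized feature independent of segment size.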
3.2. Power of the Gestalt cues
Before we train a classifier of segments from the features above, one interesting question to ask is how useful the grouping cues are. We conduct an information-theoretic analysis to measure the power of the cues in a model- and algorithm-independent way.
Each segment is associated with a set of features F and a class label X: if the segment comes from a good segmentation, X = 1; if it comes from a bad segmentation, X = 0. From the datasets we collect the joint distribution of X and the features. For any feature F, we compute the mutual information I(X; F). This is the amount of information contained in F about the classification. The class distributions are normalized and the marginal entropy of X is 1 bit. The first column of Table 1(a) shows the results for individual features. We also combine each pair of inter- and intra-features together to evaluate the overall power of contour, texture, and brightness cues. The results are listed in the first column of Table 1(b).
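The mutual information numbers in Table 1 can be estimated with a simple plug-in estimator over binned feature values; the sketch below and its toy samples are illustrative, not the paper's data pipeline.

```python
import math
from collections import Counter

def mutual_information(samples):
    """Plug-in estimate of I(X;F) in bits from (label, binned_feature) pairs:
    I = sum_{x,f} p(x,f) * log2( p(x,f) / (p(x) p(f)) )."""
    n = len(samples)
    pxy = Counter(samples)
    px = Counter(x for x, _ in samples)
    pf = Counter(f for _, f in samples)
    mi = 0.0
    for (x, f), c in pxy.items():
        pj = c / n
        mi += pj * math.log2(pj / ((px[x] / n) * (pf[f] / n)))
    return mi

# Perfectly informative feature: the bin determines the label -> I = H(X) = 1 bit.
perfect = [(0, 'lo')] * 50 + [(1, 'hi')] * 50
# Uninformative feature: bins independent of the label -> I = 0 bits.
useless = [(0, 'lo'), (0, 'hi'), (1, 'lo'), (1, 'hi')] * 25
print(mutual_information(perfect), mutual_information(useless))
```

With balanced classes the label entropy is 1 bit, so the table's per-feature information values can be read directly as fractions of the total uncertainty a feature removes.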
From this analysis of mutual information we find that the presence of boundary contours is the most informative grouping cue. The texture cues and brightness cues are approximately equally informative. The intra-region cues by themselves are usually non-informative. Combined with inter-region cues, however, they make significant contributions. Curvilinear continuity turns out to be a powerful cue in our analysis. The power of continuity is revealed due to the way we construct the dataset of bad segmentations (see the introduction). Because a randomly assigned segmentation often disagrees with the superpixel map, which is constructed from the image, the resulting boundaries are jagged and poor in continuity.
One interesting question to ask is whether the normalized convolution based on the superpixel map is helpful for grouping. We repeat the texture analysis with standard filtering, not making use of the superpixel masks. Information-theoretic analysis shows that the joint information of inter-region texture and inter-region brightness cues increases when the superpixel masks are used. This result suggests that a reasonable support mask of image regions does help with texture analysis, if texture and brightness are used simultaneously.
3.3. Training the classifier
We have formulated the problem of segmentation as a two-class classification. This is one of the most well-studied problems in statistical learning and we have a variety of techniques at our disposal. We use a simple logistic regression classifier, which linearly combines the features F_i:

    f(S) = Σ_i w_i F_i(S)                (1)
The higher the value of f(S) is, the more likely S is a good segment. The weights w_i are easily learned by maximizing the likelihood on training data with the standard iterative reweighted least squares algorithm [4]. One set of segments is used as training data and a separate set as test data. The initialization is random and the convergence is fast. For intra-region features, the weights are negative.
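A minimal training sketch follows. For brevity it uses batch gradient ascent rather than the IRLS algorithm the paper cites (both maximize the same logistic likelihood), and the two-feature Gaussian toy data, with "good segments" scoring high on an inter-region cue and low on an intra-region cue, is made up.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.5, epochs=2000):
    """Maximum-likelihood logistic regression by batch gradient ascent.
    xs: list of feature vectors, ys: labels in {0,1}. Returns (weights, bias)."""
    d = len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi + lr * gi / len(xs) for wi, gi in zip(w, gw)]
        b += lr * gb / len(xs)
    return w, b

# Toy data: feature 0 ~ inter-region cue, feature 1 ~ intra-region cue.
random.seed(0)
good = [[random.gauss(1.5, 0.5), random.gauss(-1.0, 0.5)] for _ in range(100)]
bad = [[random.gauss(-1.5, 0.5), random.gauss(1.0, 0.5)] for _ in range(100)]
w, b = train_logistic(good + bad, [1] * 100 + [0] * 100)
print(w[0] > 0, w[1] < 0)  # inter-region weighted +, intra-region -
```

On this toy data the learned weights reproduce the sign pattern the paper reports: positive weights on inter-region cues, negative on intra-region cues.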
To gain more insight into this combination of features, we examine the empirical distributions more closely. We collect the joint distributions of pairs of features, both on the positive examples and the negative examples, to see if they are linearly separable. Figure 6 shows contour plots of two examples of the empirical density functions. We have found that the normalized features are roughly Gaussian distributed and a linear classifier fits the data well. (The authors of [9] also reported that for vision data of low dimension and poor separability, logistic regression performs as well as other sophisticated techniques.)
One way to evaluate our model is to look at the final segmentation results, which we present in Section 5. Information-theoretic analysis again provides us an alternative way of evaluation. Ideally, we would like our model to capture all the information contained in the grouping cues.
Figure 6. Iso-probability contour plots of empirical distributions for a pair of features. The plots suggest that: (1) the normalized features are well-behaved; for both classes a Gaussian model would be a reasonable approximation; and (2) a linear classifier would perform well.
That is, the label and the features would be conditionally independent given the output of the model. The residual information is measured by the mutual information of the label and the features conditioned on the model output. The results are listed in the second columns of Table 1. We observe that there is little residual information left in the features, which indicates that the linear classifier fits the data well.
To further evaluate our classification model, we use the precision-recall framework [19, 2]. Precision is the fraction of detections which are true positives. Recall is the fraction of true positives which are detected. Figure 7 shows the precision-recall curves for three cases; the results are almost identical. This suggests that for this problem (1) the logistic regression model generalizes well; and (2) sophisticated classification techniques may not outperform the simple linear model.
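Curves like those in Figure 7 are built by sweeping a threshold over the classifier scores; the sketch below shows the computation on made-up scores and labels.

```python
def precision_recall_curve(scores, labels):
    """Precision and recall at each threshold, sweeping detections from the
    highest score down.  Precision = TP/(TP+FP); recall = TP/P."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    curve = []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        curve.append((tp / (tp + fp), tp / total_pos))
    return curve

# Toy classifier scores with ground-truth labels (1 = good segment):
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
for precision, recall in precision_recall_curve(scores, labels):
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold trades precision for recall; comparing such curves on training and test sets is how Figure 7 checks that the model generalizes.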
Figure 7. Precision-recall analysis for (1) the simple logistic classifier on training data; (2) the simple logistic classifier on test data; and (3) a boosted logistic classifier on test data. The three curves are almost identical.
4. Finding Good Segmentations
"What is a good segmentation?" We make the simplifying assumption that the "goodness" of the segments in a
Feature              Information   Residual Info.
Contour:     inter-     0.387          0.010
             intra-     0.012          0.010
Texture:     inter-     0.137          0.005
             intra-     0.030          0.008
Brightness:  inter-     0.112          0.005
             intra-     0.049          0.007
Continuity:             0.198          0.002
(a)
Combined Feature   Information   Residual Info.
Contour               0.510          0.024
Texture               0.220          0.026
Brightness            0.232          0.025
(b)
Table 1. Information-theoretic analysis of grouping cues. (a) shows the results for individual features. (b) shows the results when pairs of intra- and inter-region cues are combined. The first column is the amount of information the features contain about the class label. The second column is the amount of residual information the features retain when conditioned on the model output. The marginal entropy of the class label is 1 bit.
segmentation are independent. This leads us to the following criterion:

    g(S_1, …, S_n) = Σ_k f(S_k)                (2)

which sums the classifier function f in Eqn (1) over the segments. The problem of finding the best segmentation becomes the optimization of g in the space of all segmentations.
The objective is simple in form but the search space of all segmentations is large. Following the Markov Chain Monte Carlo paradigm [3, 24], we adopt a simple strategy of random search based on simulated annealing.
The dynamics in this random search involves three basic moves: (1) shift: a superpixel is shifted from its segment to an adjacent segment; (2) merge: two adjacent segments are merged into one; and (3) split: a segment is split into two. The first two moves are straightforward. For splitting a segment, we use a simple method, clustering the superpixels in the segment based on location and mean intensity. This clustering is also used to initialize the search.
At each step, the algorithm randomly picks one of the moves above and constructs a new segmentation. If the new segmentation improves the objective, we accept the move. Otherwise, we accept it with probability exp(Δg / T), where Δg is the (negative) change in the objective and T is the temperature, decreasing linearly over time.
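The annealing loop with shift/merge/split moves can be sketched on a 1-D stand-in for the superpixel problem: segmenting a piecewise-constant signal, with a homogeneity score replacing the learned classifier f. The signal, the scoring function, and all constants are illustrative assumptions; initialization from no cuts also simplifies the paper's clustering-based initialization.

```python
import math
import random

random.seed(1)

def segment_score(vals):
    """Stand-in for the learned classifier f(S): rewards homogeneous
    segments, with a fixed per-segment model cost."""
    mu = sum(vals) / len(vals)
    return -sum((v - mu) ** 2 for v in vals) - 0.5

def total_score(signal, cuts):
    """Objective g = sum of f over segments; `cuts` are sorted interior
    boundary positions in (0, len(signal))."""
    edges = [0] + cuts + [len(signal)]
    return sum(segment_score(signal[a:b]) for a, b in zip(edges, edges[1:]))

def anneal(signal, steps=4000, t0=1.0):
    cuts = []                                     # start from one big segment
    score = total_score(signal, cuts)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-6        # temperature, linear decay
        new = list(cuts)
        move = random.choice(["split", "merge", "shift"])
        if move == "split":
            c = random.randrange(1, len(signal))
            if c not in new:
                new = sorted(new + [c])
        elif move == "merge" and new:
            new.pop(random.randrange(len(new)))
        elif move == "shift" and new:
            i = random.randrange(len(new))
            c = new[i] + random.choice([-1, 1])
            if 0 < c < len(signal) and c not in new:
                new[i] = c
        s = total_score(signal, new)
        # Accept improvements; accept worse moves with prob exp((s - score)/t).
        if s > score or random.random() < math.exp((s - score) / t):
            cuts, score = new, s
    return cuts, score

# Piecewise-constant signal; the ideal boundaries are at positions 10 and 20.
signal = [0.0] * 10 + [5.0] * 10 + [1.0] * 10
cuts, score = anneal(signal)
print(cuts)
```

The accept rule is the Metropolis criterion: early on, high temperature lets the search escape local maxima; as T shrinks, only improving moves survive.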
The algorithm is naive; nevertheless it demonstrates the power of our classification model. There exist many other possible ways of exploring the space of segmentations.
5. Experimental Results
Figure 9 shows some results of our algorithm on images from the Corel Imagebase. The images are all gray-scale. In our current implementation, the random search itself takes several minutes on a Pentium III processor.
We have found that the segmentations are biased toward small regions. One reason is that our objective function is a simple sum over individual segments and does not take the number of segments into account. To segment an image into approximately equally sized regions, and also to provide a degree of user control over the scale of the segmentation, we introduce a prior distribution on segment size. Figure 8 shows the empirical distribution of segment size in the human segmented images. We approximate this prior with a log-normal distribution, and the objective function becomes:

    g′(S_1, …, S_n) = Σ_k [ f(S_k) + log P(|S_k|) ]                (3)
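The effect of the log-normal size prior can be sketched directly; the prior parameters below (typical size around 10 superpixels) are hypothetical, chosen only to show the shape.

```python
import math

def lognormal_logpdf(x, mu, sigma):
    """Log-density of a log-normal distribution over segment size x > 0."""
    return (-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)
            - math.log(x * sigma * math.sqrt(2 * math.pi)))

def objective_with_prior(segment_scores, segment_sizes, mu, sigma):
    """Eqn (3)-style criterion: classifier score plus a log-normal log-prior
    on segment size, summed over segments."""
    return sum(f + lognormal_logpdf(s, mu, sigma)
               for f, s in zip(segment_scores, segment_sizes))

# Hypothetical prior: typical segment about 10 superpixels.
mu, sigma = math.log(10), 0.7
mid = lognormal_logpdf(10, mu, sigma)
print(mid > lognormal_logpdf(1, mu, sigma))    # tiny segment penalized
print(mid > lognormal_logpdf(200, mu, sigma))  # huge segment penalized
```

Because the log-prior falls off on both sides of its mode, the modified objective discourages both oversplitting into tiny segments and collapsing everything into one, which is exactly the bias correction described above.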
Figure 8. The empirical distribution of (the logarithm of) segment size. Both extremely small and extremely large segments are uncommon. We approximate this prior on segment size with a log-normal distribution.
Our naive algorithm does occasionally suffer from the problem of local maxima. One reason lies in the asymmetry of the simple dynamics: merging is trivial but splitting is hard. To obtain the final results we present in Figure 10, the algorithm is allowed to run three times and the solution with the best objective value is picked. A weak prior on segment size (in superpixels) is used.

6. Discussions
In this paper we have presented a discriminative framework for segmentation as the classification of "good" segmentations and "bad" segmentations. The Gestalt grouping cues are combined in a principled way and we have empirically measured the power of these cues. A linear classifier and a simple random search algorithm have produced promising results on a variety of natural images.
This work is motivated in part by both the Normalized Cuts [22, 8] and the DDMCMC work [24]. Our approach is in the discriminative paradigm, as is Normalized Cuts. The basic contour and texture cues in our model are similar to those in [8]. However, our approach differs from [22] in two important aspects. First, we have constructed our model on the level of segments. This has enabled us to (1) define relationships between the parts and the whole; (2) easily incorporate mid-level cues such as good continuation; and (3) use the contour cues in a straightforward way, instead of relying on intervening contours [7]. Second, we have formulated grouping as a two-class classification problem. This framework connects segmentation as a computational problem with the ecological statistics of natural images [9, 2] and the rich theory of learning.
The Normalized Cuts criterion is driven by computational convenience, and does not have a clear statistical interpretation. The random walk formulation of Normalized Cuts [11] defines segmentation as a one-class problem and has only been applied to a special class of images. The Normalized Cuts criterion does lead to a computationally tractable problem of spectral clustering. Our framework, on the other hand, has to solve a difficult optimization in the space of all segmentations.
The DDMCMC [24] is a generative approach which builds explicit models of image regions. The DDMCMC framework faces a computational challenge similar to ours. The main difference in philosophy is discriminative vs. generative. Solving an easier problem of discrimination, we are able to succeed with a linear classifier and a naive search algorithm. As we have found out, boundary contour is the most informative grouping cue, and it is in essence discriminative. Such contour cues are used indirectly in [24].

Acknowledgments. This research was supported by NSF through a Digital Library Grant IRI-9411334.
References
[1] E. Borenstein and S. Ullman. Class-specific, top-down segmentation. In ECCV '02, volume 2, pages 109-124, 2002.
[2] C. Fowlkes, D. Martin, and J. Malik. Learning affinity functions for image segmentation: combining patch-based and gradient-based approaches. In CVPR '03, volume 2, pages 54-61, 2003.
