
Learning a Classification Model for Segmentation
Xiaofeng Ren and Jitendra Malik
Computer Science Division
University of California at Berkeley, Berkeley, CA 94720
xren,malik@cs.berkeley.edu
Abstract
We propose a two-class classification model for grouping. Human segmented natural images are used as positive examples. Negative examples of grouping are constructed by randomly matching human segmentations and images. In a preprocessing stage an image is oversegmented into superpixels. We define a variety of features derived from the classical Gestalt cues, including contour, texture, brightness and good continuation. Information-theoretic analysis is applied to evaluate the power of the grouping cues. We train a linear classifier to combine the features. To demonstrate the power of the classification model, a simple algorithm is used to randomly search for good segmentations. Results are shown on a wide range of images.
1. Introduction
Perceptual grouping can be formulated as an optimization problem in a number of different frameworks, such as graph partitioning [22, 16, 8, 5] or variational approaches [13]. The objective function being optimized is typically driven by the designer’s intuition or by computational convenience. The theme of this paper is to derive the “right” optimization criterion. This is done in a learning approach using a database of human segmented images.
We formulate the computational problem of segmentation as a classification between “good” segmentations and “bad” segmentations. Figure 1 illustrates our basic approach. Figure 1(a) is an image from the Corel Imagebase, (b) shows the image superimposed with a human marked segmentation, and (c) is the same image with a “wrong” segmentation. Our intuition tells us that the segmentation in (b) is “good” and the one in (c) is “bad”.
Figure 1. We formulate segmentation as classification between good segmentations (b) and bad segmentations (c). We use Gestalt grouping cues as features and train a classifier. Human segmented images are used as examples of good segmentations. Bad segmentations are constructed by randomly matching a human segmentation to a different image.
How do we distinguish good segmentations from bad segmentations? Classical Gestalt theory has developed various principles of grouping [25, 14] such as proximity, similarity and good continuation. The principle of good continuation states that a good segmentation should have smooth boundaries. The principle of similarity is twofold:
1. intra-region similarity: the elements in a region are similar. This includes similar brightness, similar texture, and low contour energy inside the region;
2. inter-region (dis)similarity: the elements in different regions are dissimilar. This in turn includes dissimilar brightness, dissimilar texture, and high contour energy on region boundaries.
The classical principles of grouping have inspired many previous approaches to segmentation. However, the Gestalt principles are ceteris paribus rules, which means that they distinguish competing segmentations only when everything else is equal. Many of the previous works have made ad-hoc decisions for using and combining the cues.
In this work, we learn a classification model for segmentation from the Gestalt cues. A database of human marked segmentations has been established [10]. We use the human segmented images in this database as positive examples. For negative examples, we randomly match a human segmentation to a different image, an example of which is given in Figure 1(c).
The outline of the paper is as follows. In Section 2 we introduce a preprocessing stage which organizes an image into “superpixels”. In Section 3, we define a set of features for segments, including Gestalt cues of contour, texture, brightness and good continuation. The features are evaluated using information-theoretic measures. From the features we train a logistic regression classifier. Based on this model for segments, in Section 4 we formulate segmentation as an optimization problem of a linear objective function over the space of segmentations. To demonstrate the power of our classification model, we design a simple algorithm to randomly search for good segmentations. The experimental results are shown in Section 5. Section 6 discusses related work and concludes the paper.
2. Oversegmentation as Preprocessing
In this section we present a preprocessing stage to group pixels into “superpixels”. The motivations of this preliminary grouping are: (1) pixels are not natural entities; they are merely a consequence of the discrete representation of images; and (2) the number of pixels is high even at moderate resolutions, which makes optimization on the level of pixels intractable. We would like to work with “superpixels” which are local, coherent, and which preserve most of the structure necessary for segmentation at the scale of interest.
We apply the Normalized Cuts algorithm [22, 8] to produce the superpixel map. Both contour and texture cues are used. The affinity matrix has local connections only. Figure 2 shows an example of the oversegmentation with the number of superpixels k = …. We observe from this example that the superpixels are roughly homogeneous in size and shape; this fact simplifies the computation in later stages. Some structures in the human segmentation are lost, but they are usually minor details, much smaller in scale than the objects we are interested in. The reconstructed segmentation is a good approximation of the original one.
To quantify the quality of this approximation, we use a contour-based measure to verify the superpixel maps against the human segmentations. In particular, we compute the percentage of the human marked boundaries being “covered by” (within a fixed small distance of) the superpixel boundaries. This is the recall rate used in [9]. Figure 3 shows the results for images of size …-by-…. As expected, the recall rates increase with the number of superpixels k. In our experiments we have found that k = … is sufficient.
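A minimal sketch of this coverage (recall) measure is given below. It is an illustration under stated assumptions, not the evaluation code of [9]: both boundary maps are assumed to be boolean grids of the same size, “within a fixed small distance” is taken to mean a chessboard (8-neighbour) distance of at most `tol` pixels, and the function name `boundary_recall` is hypothetical.

```python
from collections import deque


def boundary_recall(human, superpixel, tol):
    """Fraction of human boundary pixels lying within `tol` pixels (chessboard
    distance) of some superpixel boundary pixel. Inputs are lists of lists of bools."""
    h, w = len(human), len(human[0])
    inf = float("inf")
    dist = [[inf] * w for _ in range(h)]
    queue = deque()
    # Multi-source BFS from every superpixel boundary pixel: with unit steps to
    # the 8 neighbours, dist[r][c] becomes the chessboard distance to the nearest one.
    for r in range(h):
        for c in range(w):
            if superpixel[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] > dist[r][c] + 1:
                    dist[nr][nc] = dist[r][c] + 1
                    queue.append((nr, nc))
    total = covered = 0
    for r in range(h):
        for c in range(w):
            if human[r][c]:
                total += 1
                covered += dist[r][c] <= tol  # a human boundary pixel counts if covered
    return covered / total if total else 1.0
```

Evaluating such a function for several tolerances and several superpixel counts k yields curves of the kind summarized in Figure 3.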
3. What is a Good Segment?
A segmentation is a collection of segments. To answer the question “What is a good segmentation?”, we need to answer “What is a good segment?” first. In this section, we will define a set of features for segments, evaluate the usefulness of these features, and train a classifier from them.
Figure 2. An example of superpixel maps. (a) is the original image; (b) is a human marked segmentation; (c) is a superpixel map with k = …; (d) shows a reconstruction of the human segmentation from the superpixels: we assign each superpixel to the segment in (b) with the maximum overlapping area and extract the superpixel boundaries.
3.1. Features for grouping
For static images, the classical Gestalt principles of grouping include proximity, similarity, good continuation (curvilinear continuity), closure, as well as symmetry and parallelism. In our model, for a segment S we define the following features:
1. inter-region texture similarity;
2. intra-region texture similarity;
3. inter-region brightness similarity;
4. intra-region brightness similarity;
5. inter-region contour energy;
6. intra-region contour energy;
7. curvilinear continuity.
Texture similarity
For texture cues we follow the discriminative framework of texton analysis (e.g., [8]). The image is first convolved with a bank of filters of multiple orientations. Based on a vector quantization of the filter outputs, the pixels are clustered into a number of texton channels. This gives us a descriptor for each region, namely the distribution of textons inside its support. The texture difference of two regions is then measured as the distance between their two histograms. We make one modification here in the filtering stage: when we apply the filterbank to the image, we restrict the support of the filters to be within a single superpixel (normalized convolution, e.g., [6]).
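The region descriptor and the histogram comparison can be sketched as follows. We assume each pixel already carries a texton label (the filter bank, the vector quantization, and the restriction of the filter support to a superpixel are not shown), and we assume the χ² distance, a common choice for comparing texton histograms; both function names are hypothetical.

```python
def texton_histogram(labels, region, num_textons):
    """Normalized histogram of the texton labels of the pixels (r, c) in `region`."""
    counts = [0.0] * num_textons
    for r, c in region:
        counts[labels[r][c]] += 1.0
    total = sum(counts)
    return [x / total for x in counts] if total > 0 else counts


def chi2_distance(h1, h2, eps=1e-12):
    """Chi-squared histogram distance: 0.5 * sum_i (h1_i - h2_i)^2 / (h1_i + h2_i).
    The small eps avoids 0 / 0 for bins that are empty in both histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```

The distance between a superpixel and a segment is then `chi2_distance` applied to their two histograms; it is 0 for identical histograms and 1 for histograms with disjoint support.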
Figure 3. The percentage of human marked boundaries covered by the superpixel maps. The number of superpixels k varies from … to …. The distance tolerance is set at …, … and … pixels respectively, for images of size …-by-…. For k = … and a tolerance of … pixels, approximately … of the human marked boundaries are covered.
In the next step, we convert the distance into a log likelihood ratio: let $S_{\text{same}}$ denote the set of all superpixel pairs $(q_1, q_2)$ such that they appear in the same segment of a human segmentation, and let $S_{\text{diff}}$ denote the set of all superpixel pairs such that they appear in different segments in a human segmentation. We compute the distance for all pairs of superpixels in $S_{\text{same}}$, and denote the distribution of these distances as $P_{\text{same}}$. Similarly we collect $P_{\text{diff}}$, the distribution of the distance on the set $S_{\text{diff}}$ (see Figure 4 for the empirical distributions). Let $d_T(q, S)$ be the distance between the texture histogram of a superpixel $q$ and that of a segment $S$. The texture similarity between $q$ and $S$ is defined as:
$$T(q, S) = \log \frac{P_{\text{same}}\big(d_T(q, S)\big)}{P_{\text{diff}}\big(d_T(q, S)\big)}$$
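The conversion of a histogram distance into a log likelihood ratio between the same-segment and different-segment distance distributions can be sketched as below. Here the two distributions are estimated by simple histograms with add-one smoothing; the density estimator, its bin count and range, and the helper names are our assumptions, not details given in the text.

```python
import math


def empirical_density(samples, bins, lo, hi):
    """Histogram estimate of a 1-D density on [lo, hi), with add-one smoothing."""
    width = (hi - lo) / bins
    counts = [1.0] * bins  # add-one (Laplace) smoothing keeps every bin non-zero

    def bin_of(x):
        # Clamp so that out-of-range values fall into the first or last bin.
        return min(max(int((x - lo) / width), 0), bins - 1)

    for x in samples:
        counts[bin_of(x)] += 1.0
    norm = sum(counts) * width
    return lambda x: counts[bin_of(x)] / norm


def log_likelihood_ratio(d, p_same, p_diff):
    """log P_same(d) / P_diff(d): positive when the distance d is more typical of
    superpixel pairs inside one human segment than of pairs across segments."""
    return math.log(p_same(d) / p_diff(d))
```

With `p_same` built from distances of same-segment pairs and `p_diff` from different-segment pairs, a small distance yields a positive similarity and a large one a negative similarity.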
Figure 4. The empirical distributions of the distance between texture histograms. (a) $P_{\text{same}}$ is between a pair of superpixels in the same human marked segment; $P_{\text{diff}}$ is between a pair of superpixels in different segments. (b) shows the log likelihood ratio $\log(P_{\text{same}}/P_{\text{diff}})$.
The log likelihood ratio measures the significance of the distance value. We use this basic texture similarity measure $T(q, S)$ to define two texture features for a segment $S$: intra-region texture similarity and inter-region texture similarity. Figure 5(a) illustrates the definition. The intra-region texture similarity sums over all the superpixels in the region:
$$\mathrm{TX}_{\text{intra}}(S) = \sum_{q \in S} T(q, S)$$
and the inter-region similarity sums over all the superpixels on $\partial S$, the boundary superpixels of $S$:
$$\mathrm{TX}_{\text{inter}}(S) = \sum_{q \in \partial S} T(q, S'_q)$$
where $S'_q$ is the segment adjacent to $q$. If there are multiple segments adjacent to $q$, we take the average of the similarity values.
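Given any basic similarity between a superpixel and a segment, the two region features are plain sums. The sketch below assumes segments are represented as sets of superpixel ids, that the boundary superpixels of the segment and the other segments touching each of them (`adjacent`, a hypothetical precomputed lookup) are available, and that `sim(q, segment)` implements the basic similarity; all names are illustrative.

```python
def intra_similarity(segment, sim):
    """Intra-region feature: sum over the superpixels q in the segment of sim(q, segment)."""
    return sum(sim(q, segment) for q in segment)


def inter_similarity(segment, boundary, adjacent, sim):
    """Inter-region feature: sum over boundary superpixels q of the similarity between
    q and the adjacent segment, averaged when several segments touch q."""
    total = 0.0
    for q in boundary:
        neighbours = adjacent[q]  # segments other than `segment` that touch q
        total += sum(sim(q, other) for other in neighbours) / len(neighbours)
    return total
```

For example, with a toy similarity that returns 1 for a superpixel inside the compared segment and -1 otherwise, a three-superpixel segment gets an intra-region value of 3.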
Figure 5. (a) The intra-region similarity compares the descriptor of a superpixel $q$ to the segment $S$ containing it. The inter-region similarity compares the descriptor of a superpixel $q$ on the boundary of $S$ to the adjacent segment $S'_q$. (b) Curvilinear continuity of $S$ is measured by the tangent changes at superpixel junctions along the boundary of $S$.
Brightness similarity
The intra-region brightness similarity $\mathrm{BR}_{\text{intra}}(S)$ and the inter-region brightness similarity $\mathrm{BR}_{\text{inter}}(S)$ are defined in an identical way. The brightness descriptor for each region is a histogram of brightness values. We compute the distance between histograms and use empirical data to convert the distance into a log likelihood ratio. This basic similarity measure is incorporated into the intra- and inter-region similarity cues.
Contour energy
Contour cues are computed at the level of pixels. We first compute the orientation energy [12, 8] at each pixel. The orientation energy is converted to a soft “contourness”, $p_{\text{con}}$, by a non-linear transform [8, 18]. The inter-region contour energy $\mathrm{CON}_{\text{inter}}(S)$ is the summation of $p_{\text{con}}$ over all the pixels on the boundary of $S$, and the intra-region contour energy $\mathrm{CON}_{\text{intra}}(S)$ is the summation of $p_{\text{con}}$ over all the pixels on the superpixel boundaries inside $S$.
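A sketch of the two contour features on a pixel grid follows. It assumes a superpixel label map `sp`, a hypothetical mapping `seg_of` from superpixel id to segment id, and 4-connectivity. A pixel of the segment counts as lying on the segment boundary if a 4-neighbour belongs to another segment, and on an internal superpixel boundary if a 4-neighbour belongs to a different superpixel of the same segment; when both hold, the segment boundary takes precedence. These conventions are our assumptions.

```python
def contour_energies(p_con, sp, seg_of, seg):
    """Return (inter-region, intra-region) contour energy for segment `seg`.

    p_con:  2-D grid of soft contourness values
    sp:     2-D grid of superpixel ids
    seg_of: dict mapping superpixel id to segment id
    """
    h, w = len(sp), len(sp[0])
    inter = intra = 0.0
    for r in range(h):
        for c in range(w):
            if seg_of[sp[r][c]] != seg:
                continue  # only pixels inside the segment contribute
            on_segment_boundary = on_internal_boundary = False
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w:
                    other = sp[nr][nc]
                    if seg_of[other] != seg:
                        on_segment_boundary = True   # touches another segment
                    elif other != sp[r][c]:
                        on_internal_boundary = True  # touches another superpixel of the segment
            if on_segment_boundary:
                inter += p_con[r][c]
            elif on_internal_boundary:
                intra += p_con[r][c]
    return inter, intra
```

A good segment should collect high contourness on its boundary and little on the superpixel boundaries inside it.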
Good continuation
Curvilinear continuity is measured as follows: for each adjacent pair of superpixels $q$ and $q'$ on the boundary of a segment $S$, there is a change of tangent $\alpha$ at their junction (see Figure 5(b)). This measures the first-order smoothness of the boundary: the larger this angle $\alpha$, the less smooth the boundary of $S$. From the boundaries in the human segmentations, we collect the distribution $P_{\text{tangent}}(\alpha)$ of tangent changes. Let $J(S)$ be the set of all superpixel junctions on the boundary of $S$; the curvilinear continuity $\mathrm{CONT}(S)$ is defined as
$$\mathrm{CONT}(S) = \sum_{j \in J(S)} \log P_{\text{tangent}}(\alpha_j)$$
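The continuity feature can be sketched as follows, under assumptions of our own: the boundary of the segment is given as the ordered, closed list of junction points, the tangent of each boundary fragment is approximated by the chord between consecutive junctions, and `log_p_tangent` is any callable returning the log probability of a tangent change (in practice it would be read off the histogram of tangent changes on human boundaries). The function names are hypothetical.

```python
import math


def tangent_change(p0, p1, p2):
    """Absolute turning angle (radians, in [0, pi]) at p1 between chords p0->p1 and p1->p2."""
    a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    d = abs(a2 - a1) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def continuity(junctions, log_p_tangent):
    """Sum of log P_tangent(alpha) over all junctions of a closed boundary polygon."""
    closed = junctions + junctions[:2]  # wrap around so every junction gets an angle
    return sum(
        log_p_tangent(tangent_change(closed[i], closed[i + 1], closed[i + 2]))
        for i in range(len(junctions))
    )
```

Under a tangent-change distribution concentrated near zero, a smooth boundary scores higher (less negative) than the jagged boundaries typical of randomly matched segmentations.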
Normalizing the features
The features we have defined are unnormalized and cannot be directly compared with one another. To normalize them, we notice that all the features we have defined are summations of basic quantities. For example, consider the intra-region texture similarity $\mathrm{TX}_{\text{intra}}(S) = \sum_{q \in S} T(q, S)$. We assume that the $T(q, S)$'s are random variables with the same mean $\mu$ and variance $\sigma^2$ for all pairs $(q, S)$ such that $q \in S$. If there are $m$ superpixels in $S$, we normalize $\mathrm{TX}_{\text{intra}}(S)$ as $\big(\mathrm{TX}_{\text{intra}}(S) - m\mu\big) / \big(\sqrt{m}\,\sigma\big)$. The maximum likelihood estimates of $\mu$ and $\sigma$ are used. Other features are normalized in the same way.
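The normalization amounts to standardizing a sum of m terms that share a common mean and standard deviation. A minimal sketch, with hypothetical helper names, using the maximum likelihood estimates (dividing by n rather than n - 1) for the mean and standard deviation:

```python
import math


def fit_mean_std(samples):
    """Maximum likelihood estimates of the mean and standard deviation (divides by n)."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)


def normalize_sum(total, m, mu, sigma):
    """Standardize a sum of m terms, each with mean mu and standard deviation sigma."""
    return (total - m * mu) / (math.sqrt(m) * sigma)
```

The mean and standard deviation would be fitted once on the basic quantities pooled from the training data and then applied to every segment, whatever its number of superpixels m.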
3.2. Power of the Gestalt cues
Before we train a classifier of segments from the features above, one interesting question is how useful the grouping cues are. We conduct an information-theoretic analysis to measure the power of the cues in a model- and algorithm-independent way.
Each segment $S$ is associated with a set of features $\{f_j\}$ and a class label $y$: if $S$ is a segment from a good segmentation, $y = 1$; if $S$ is from a bad segmentation, $y = 0$. From the datasets we collect the joint distribution of $y$ and the features. For any feature $f_j$, we compute the mutual information $I(f_j; y)$. This is the amount of information contained in $f_j$ about the classification $y$. The distributions are normalized and the marginal entropy of $y$ is 1 (bits). The first column of Table 1(a) shows the results for individual features. We also combine each pair of inter- and intra-region features to evaluate the overall power of the contour, texture, and brightness cues. The results are listed in the first column of Table 1(b).
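For discrete (or discretized) features, the mutual information between a feature and the class label can be estimated with a plug-in estimate, as sketched below. Continuous features such as the normalized cues would first have to be binned; the binning and the function name are assumptions.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2[p(x,y) / (p(x) p(y))], written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

With balanced labels the result is bounded by the 1-bit marginal entropy of the label: a feature that determines the label exactly reaches 1 bit, and an independent one gives 0.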
From this analysis of mutual information we find that the presence of boundary contours is the most informative grouping cue. The texture cues and brightness cues are approximately equally informative. The intra-region cues by themselves are usually non-informative. Combined with inter-region cues, however, they make significant contributions. Curvilinear continuity turns out to be a powerful cue in our analysis. The power of continuity is revealed by the way we construct the dataset of bad segmentations (see the introduction). Because a randomly assigned segmentation often disagrees with the superpixel map, which is constructed from the image, the resulting boundaries are jagged and poor in continuity.
One interesting question to ask is whether the normalized convolution based on the superpixel map is helpful for grouping. We repeat the texture analysis with standard filtering, not making use of the superpixel masks. Information-theoretic analysis shows that the joint information of the inter-region texture and inter-region brightness cues, $I(\mathrm{TX}_{\text{inter}}, \mathrm{BR}_{\text{inter}}; y)$, increases from … with standard filtering to … with the superpixel masks. This result suggests that a reasonable support mask of image regions does help with texture analysis, if texture and brightness are used simultaneously.
3.3. Training the classifier
We have formulated the problem of segmentation as a two-class classification. This is one of the most well-studied problems in statistical learning, and we have a variety of techniques at our disposal. We use a simple logistic regression classifier, which linearly combines the features:
$$G(S) = \sum_j a_j f_j(S) \qquad (1)$$
The higher the value of $G(S)$, the more likely $S$ is a good segment. The weights $\{a_j\}$ are easily learned by maximizing the likelihood on training data with the standard iteratively reweighted least squares algorithm [4]. We use … segments as training data and … segments as test data. The initialization is random and the convergence is fast. For intra-region features, the weights are negative.
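A generic sketch of fitting linear weights by iteratively reweighted least squares (Newton's method on the logistic log-likelihood) follows. It is not the authors' implementation: it starts from zero weights rather than a random initialization, and the constant bias feature, the tiny ridge term added to the Hessian for numerical stability, the fixed iteration count, and all names are our assumptions. Plain Python is used to keep it dependency-free.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic_irls(X, y, iters=25, ridge=1e-6):
    """Weights a (bias first) of P(y=1 | f) = 1 / (1 + exp(-a . [1, f])), fitted by
    Newton steps, i.e. iteratively reweighted least squares."""
    rows = [[1.0] + list(f) for f in X]  # constant feature for the bias term
    d = len(rows[0])
    a = [0.0] * d
    for _ in range(iters):
        hess = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
        grad = [0.0] * d
        for row, t in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(a, row))))
            weight = p * (1.0 - p)  # IRLS weight of this example
            for i in range(d):
                grad[i] += (t - p) * row[i]
                for j in range(d):
                    hess[i][j] += weight * row[i] * row[j]
        step = solve(hess, grad)
        a = [w + delta for w, delta in zip(a, step)]
    return a
```

The ridge term only enters the Hessian, so a converged solution still satisfies the unregularized likelihood equations; on linearly separable data the weights would grow without bound, which the fixed iteration count merely truncates.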
To gain more insight into this combination of features, we examine the empirical distributions more closely. We collect the joint distributions of a pair of features, both on the positive examples and on the negative examples, to see whether they are linearly separable. Figure 6 shows contour plots of two examples of the empirical density functions. We have found that the normalized features are roughly Gaussian distributed and that a linear classifier fits the data well. (The authors of [9] also reported that for vision data of low dimension and poor separability, logistic regression performs as well as other, more sophisticated techniques.)
One way to evaluate our model is to look at the final segmentation results, which we present in Section 5. Information-theoretic analysis again provides us with an alternative way of evaluation. Ideally, we would like our model to capture all the information contained in the grouping cues. That is, the label $y$ and the features $\{f_j\}$ would be conditionally independent given the output $G$ of the model. The residual information is measured by the mutual information of $y$ and $\{f_j\}$ conditioned on $G$, $I(\{f_j\}; y \mid G)$. The results are listed in the second columns of Table 1. We observe that there is little residual information left in the features, which indicates that the linear classifier fits the data well.
Figure 6. Iso-probability contour plots of empirical distributions for a pair of features. The plots suggest that: (1) the normalized features are well-behaved, and for both classes a Gaussian model would be a reasonable approximation; and (2) a linear classifier would perform well.
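The residual information of the features about the label, given the classifier output, can be estimated in the same plug-in fashion once the features and the output are discretized. The sketch below (hypothetical name, discrete inputs assumed) computes the conditional mutual information I(X; Y | Z) in bits.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits from discrete samples."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pz = Counter(zs)
    cmi = 0.0
    for (x, y, z), c in pxyz.items():
        # p(x,y,z) * log2[p(z) p(x,y,z) / (p(x,z) p(y,z))], written with raw counts.
        cmi += (c / n) * math.log2(pz[z] * c / (pxz[(x, z)] * pyz[(y, z)]))
    return cmi
```

If the conditioning variable already determines the label, the residual information is 0; if it is constant, the estimate reduces to the unconditional mutual information.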
To further evaluate our classification model, we use the precision-recall framework [19, 2]. Precision is the fraction of detections which are true positives. Recall is the fraction of true positives which are detected. Figure 7 shows the precision-recall curves for three cases; the results are almost identical. This suggests that for this problem (1) the logistic regression model generalizes well; and (2) sophisticated classification techniques may not outperform the simple linear model.
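A precision-recall curve for a scoring classifier can be sketched by sweeping a threshold over the distinct classifier outputs (here, `scores`) and counting every example at or above the threshold as a detection. The function name and this tie-handling convention are assumptions.

```python
def precision_recall_curve(scores, labels):
    """(threshold, precision, recall) for each distinct score used as a threshold,
    from the strictest threshold to the loosest; labels are 1 (good) or 0 (bad)."""
    positives = sum(labels)
    curve = []
    for t in sorted(set(scores), reverse=True):
        detected = [lab for s, lab in zip(scores, labels) if s >= t]
        tp = sum(detected)  # true positives among the detections
        curve.append((t, tp / len(detected), tp / positives))
    return curve
```

Computing this curve once on training data and once on test data, and comparing the two, is the kind of check summarized in Figure 7.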
Figure 7. Precision-recall analysis for (1) simple logistic classifier on training data; (2) simple logistic classifier on test data; and (3) boosted logistic classifier on test data. The three curves are almost identical.
4. Finding Good Segmentations
“What is a good segmentation?” We make the simplifying assumption that the “goodness” values of the segments in a segmentation are independent. This leads us to the following criterion:
$$F(\mathcal{S}) = \sum_{S \in \mathcal{S}} G(S) \qquad (2)$$
which sums the classifier function in Eqn (1) over the segments $S$ of a segmentation $\mathcal{S}$. The problem of finding the best segmentation becomes the optimization of $F$ in the space of all segmentations.
(a)
Feature                    Information   Residual Info.
Contour, inter-region      0.387         0.010
Contour, intra-region      0.012         0.010
Texture, inter-region      0.137         0.005
Texture, intra-region      0.030         0.008
Brightness, inter-region   0.112         0.005
Brightness, intra-region   0.049         0.007
Continuity                 0.198         0.002
(b)
Combined feature           Information   Residual Info.
Contour                    0.510         0.024
Texture                    0.220         0.026
Brightness                 0.232         0.025
Table 1. Information-theoretic analysis of grouping cues. (a) shows the results for individual features. (b) shows the results when pairs of intra- and inter-region cues are combined. The first column is the amount of information the features contain about the class label. The second column is the amount of residual information the features retain when conditioned on the model output. The marginal entropy of the class label is 1 (bits).
The objective is simple in form but the search space of all segmentations is large. Following the Markov Chain Monte Carlo paradigm [3, 24], we adopt a simple strategy of random search based on simulated annealing.
The dynamics in this random search involves three basic moves: (1) shift: a superpixel is shifted from its segment to an adjacent segment; (2) merge: two adjacent segments are merged into one; and (3) split: a segment is split into two. The first two moves are straightforward. For splitting a segment, we use a simple method that clusters the superpixels in the segment based on location and mean intensity. This clustering is also used to initialize the search.
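At each step, the algorithm randomly picks one of the moves above and constructs a new segmentation $\mathcal{S}'$. If the objective increases, $F(\mathcal{S}') > F(\mathcal{S})$, we accept the move. Otherwise, we accept it with probability $\exp\big((F(\mathcal{S}') - F(\mathcal{S}))/T\big)$, where $T$ is the temperature, decreasing linearly over time.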
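The acceptance rule and the linear cooling schedule can be sketched as a generic annealing loop. In this sketch `propose` stands in for the shift, merge and split moves and `objective` for the segmentation objective; the floor `t_min` on the temperature (which avoids dividing by zero at the last steps), the seed, and all names are our assumptions.

```python
import math
import random


def accept(f_new, f_old, temperature, rng):
    """Metropolis rule for maximization: always take an improvement, otherwise
    accept with probability exp((f_new - f_old) / T)."""
    if f_new > f_old:
        return True
    return rng.random() < math.exp((f_new - f_old) / temperature)


def linear_temperature(step, total_steps, t0, t_min=1e-3):
    """Temperature decreasing linearly from t0, floored at t_min."""
    return max(t0 * (1.0 - step / total_steps), t_min)


def anneal(state, propose, objective, total_steps, t0, seed=0):
    """Generic simulated-annealing search that returns the final state and its value."""
    rng = random.Random(seed)
    f_cur = objective(state)
    for step in range(total_steps):
        candidate = propose(state, rng)
        f_new = objective(candidate)
        if accept(f_new, f_cur, linear_temperature(step, total_steps, t0), rng):
            state, f_cur = candidate, f_new
    return state, f_cur
```

On a toy problem, maximizing -(x - 3)² over the integers with ±1 moves from x = 0, the loop ends at the maximum x = 3; a real segmentation search would replace the integer state with a superpixel-to-segment assignment.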
The algorithm is naive; nevertheless it demonstrates the power of our classification model. There exist many other possible ways of exploring the space of segmentations.
5. Experimental Results
Figure 9 shows some results of our algorithm on images from the Corel Imagebase. The images are all gray-scale and of size …-by-…. In our current implementation, the random search itself takes about … to … minutes on a Pentium III … Hz processor.
We have found that the segmentations are biased toward small regions. One reason is that our objective function is a simple sum over individual segments and does not take the number of segments into account. To segment an image into approximately equally sized regions, and also to provide a degree of user control over the scale of the segmentation, we introduce a prior distribution on segment size $|S|$. Figure 8 shows the empirical distribution of $\log |S|$ in the human segmented images. We approximate this prior with a log-normal distribution $P_{\text{size}}$, and the objective function becomes:
$$F(\mathcal{S}) = \sum_{S \in \mathcal{S}} \big[ G(S) + c \cdot \log P_{\text{size}}(|S|) \big] \qquad (3)$$
where $c$ weights the size prior against the classifier output.
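A sketch of a log-normal size prior and the resulting objective follows, assuming the objective takes the form of a sum over segments of the classifier output G(S) plus c times the log prior of the segment size, with c a weight on the prior. The weight, the measurement of |S| in superpixels, and the helper names are our assumptions.

```python
import math


def log_lognormal(size, mu, sigma):
    """Log-density of a log-normal distribution at `size` (e.g. a count of superpixels)."""
    z = (math.log(size) - mu) / sigma
    return -0.5 * z * z - math.log(size * sigma * math.sqrt(2.0 * math.pi))


def objective_with_prior(segments, G, c, mu, sigma):
    """Sum over segments S of G(S) + c * log P_size(|S|)."""
    return sum(G(S) + c * log_lognormal(len(S), mu, sigma) for S in segments)
```

The log prior peaks near sizes of about exp(mu) and penalizes both very small and very large segments; a small c corresponds to a weak prior.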
Figure 8. The empirical distribution of (the logarithm of) segment size $|S|$. Both extremely small and extremely large segments are uncommon. We approximate this prior on $|S|$ with a log-normal distribution.
Our naive algorithm does occasionally suffer from the problem of local maxima. One reason lies in the asymmetry of the simple dynamics: merging is trivial but splitting is hard. To obtain the final results we present in Figure 10, the algorithm is run three times and the solution with the best objective value is picked. A weak prior with … (superpixels) and … is used.
6. Discussions
In this paper we have presented a discriminative framework for segmentation as the classification of “good” segmentations and “bad” segmentations. The Gestalt grouping cues are combined in a principled way and we have empirically measured the power of these cues. A linear classifier and a simple random search algorithm have produced promising results on a variety of natural images.
This work is motivated in part by both the Normalized Cuts [22, 8] and the DDMCMC work [24]. Like Normalized Cuts, our approach is in the discriminative paradigm. The basic contour and texture cues in our model are similar to those in [8]. However, our approach differs from [22] in two important aspects. First, we have constructed our model on the level of segments. This has enabled us to (1) define relationships between the parts and the whole; (2) easily incorporate mid-level cues such as good continuation; and (3) use the contour cues in a straightforward way instead of relying on intervening contours [7]. Second, we have formulated grouping as a two-class classification problem. This framework connects segmentation as a computational problem with the ecological statistics of natural images [9, 2] and with the rich theory of learning.
The Normalized Cuts criterion is driven by computational convenience, and does not have a clear statistical interpretation. The random walk formulation of Normalized Cuts [11] defines segmentation as a one-class problem and has only been applied to special classes of images. The Normalized Cuts criterion does lead to a computationally tractable problem of spectral clustering. Our framework, on the other hand, has to solve a difficult optimization in the space of all segmentations.
The DDMCMC [24] is a generative approach which builds explicit models of image regions. The DDMCMC framework faces a computational challenge similar to ours. The main difference in philosophy is discriminative versus generative. Solving the easier problem of discrimination, we are able to succeed with a linear classifier and a naive search algorithm. As we have found, boundary contour is the most informative grouping cue, and it is in essence discriminative. Such contour cues are used only indirectly in [24].
Acknowledgments. This research was supported by NSF through a Digital Library Grant IRI-9411334.
References
[1]  E. Borenstein and S. Ullman. Class-specific, top-down segmentation. In ECCV’02, volume 2, pages 109–124, 2002.
[2]  C. Fowlkes, D. Martin, and J. Malik. Learning affinity functions for image segmentation: combining patch-based and gradient-based approaches. In CVPR’03, volume 2, pages 54–61, 2003.
