Learning Visual Representations using Images with Captions
Ariadna Quattoni Michael Collins Trevor Darrell
MIT Computer Science and Artificial Intelligence Laboratory
Cambridge, MA 02139
{ariadna,mcollins,trevor}@csail.mit.edu
January 22, 2007
1 Overview
Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional, but nevertheless captures the information required to discriminate between image categories. We describe a method for learning representations from large quantities of unlabeled images which have associated captions; the aim is to learn a representation that aids learning in image classification problems. Experiments show that the method significantly outperforms a fully-supervised baseline model as well as a model that ignores the captions and learns a visual representation by performing PCA on the unlabeled images alone. Our current work concentrates on captions as the source of meta-data, but more generally other types of meta-data could be used (e.g., video sequences with accompanying speech).
2 Background
When few labeled examples are available, most current supervised learning methods [9, 3, 4, 7, 5] for image classification may work poorly, for example when a user defines a new category and provides only a few labeled examples. To reach human performance, it is clear that knowledge beyond the supervised training data needs to be leveraged.
There is a large literature on semi-supervised learning approaches, where unlabeled data is used in addition to labeled data. Our work is related to work in multi-task learning, where training data in related tasks is used to aid learning in the problem of interest. Multi-task learning has a relatively long history in machine learning [8, 2, 6, 1], but has only recently been addressed in machine vision. We build on the structure learning approach of Ando and Zhang [1], who describe an algorithm for transfer learning, and suggest the use of auxiliary problems on unlabeled data as a method for constructing related tasks. In some cases unlabeled data may contain useful meta-data that can be used to learn a low-dimensional representation that reflects the semantic content of an image. As one example, large quantities of images with associated natural language captions can be found on the web.
3 Approach
We propose to use the meta-data to induce a representation that reflects an underlying part structure in an existing, high-dimensional visual representation. The new representation groups together synonymous visual features, that is, features that consistently play a similar role across different image classification tasks. Our approach exploits learning from auxiliary problems which can be created from images with associated captions. Each auxiliary problem involves taking an image as input, and predicting whether or not a particular content word (e.g., man, official, or celebrates) is in the caption associated with that image. In structural learning, a separate linear classifier is trained for each of the auxiliary problems; manifold learning (e.g., SVD) is then applied to the resulting set of parameter vectors, in essence finding a low-dimensional space which is a good approximation to the space of possible parameter vectors. If features in the high-dimensional space correspond to the same semantic part, their associated classifier parameters (weights) across different auxiliary problems may be correlated in such a way that the basis functions learned by the SVD step collapse the features to a single feature in a new, low-dimensional feature-vector representation.
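
To make the procedure concrete, the following Python sketch (not the authors' implementation) illustrates the structural-learning step just described, under simplifying assumptions: each auxiliary caption-word problem is solved with an off-the-shelf logistic regression classifier, the resulting weight vectors are stacked into a matrix, and an SVD of that matrix gives a projection onto a low-dimensional representation. The function names (learn_projection, project), the dictionary aux_labels, and the choice of 50 dimensions are illustrative assumptions, not details taken from the paper.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def learn_projection(X_unlabeled, aux_labels, n_components=50):
        """Learn a low-dimensional projection from auxiliary caption-word problems.

        X_unlabeled: (n_images, n_visual_features) bag-of-words image vectors.
        aux_labels:  dict mapping a content word to a binary array indicating
                     whether that word appears in each image's caption.
        """
        weight_vectors = []
        for word, y in aux_labels.items():
            # Auxiliary problem: does `word` occur in the caption of each image?
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X_unlabeled, y)
            weight_vectors.append(clf.coef_.ravel())
        W = np.vstack(weight_vectors)            # (n_aux_problems, n_visual_features)
        # SVD of the parameter matrix; the leading right-singular vectors span a
        # low-dimensional subspace approximating the space of parameter vectors.
        _, _, Vt = np.linalg.svd(W, full_matrices=False)
        return Vt[:n_components]                 # projection matrix theta

    def project(X, theta):
        # Map bag-of-words image vectors into the learned low-dimensional representation.
        return X @ theta.T

Features that receive correlated weights across the auxiliary problems are mapped close together by theta, which is the sense in which the SVD step collapses synonymous visual features.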
Topic: visual processing and pattern recognition.
Preference: oral/poster. (Ariadna Quattoni)
Figure 1: Equal error rates (averaged across topics, with standard deviations calculated over ten runs for each topic) as a function of the number of positive training examples (left). Example images from the Figure Skating, Ice Hockey, and Golden Globes topics (right).
4 Experiments
In a first set of experiments, we use synthetic data examples to illustrate how the method can uncover latent part structures.
A second set of experiments involves classification of news images into different topics. Images on the Reuters website are partitioned into stories which correspond to different topics in the news; each image has a topic label as well as associated caption meta-data. For both experiments we compare a baseline model that uses a bag-of-words SIFT representation of image data to our method, which replaces the SIFT representation with a new representation that is learned from images with associated captions. In addition, we compare our method to a baseline model that ignores the meta-data and learns a new visual representation by performing PCA on the unlabeled images. Note that our goal is to build classifiers that work on images alone (i.e., images which do not have captions), and our experimental set-up reflects this, in that training and test examples for the topic classification tasks include image data only. The experiments show that our method significantly outperforms both baseline models. See people.csail.mit.edu/ariadna/TransferLearning for further details on the method and the experiments.
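
The comparison can be outlined schematically as follows; this is an illustrative sketch, not the authors' experimental code. It assumes bag-of-words SIFT feature matrices X_tr/X_te for the labeled topic-classification data and X_unlabeled for the captioned pool, a linear SVM as the topic classifier, and the projection matrix theta from the sketch in Section 3; all of these names, and the choice of 50 PCA dimensions, are assumptions.

    from sklearn.decomposition import PCA
    from sklearn.svm import LinearSVC

    def train_and_score(X_train, y_train, X_test, y_test, transform=None):
        # Train a linear topic classifier on image features only and report test accuracy.
        if transform is not None:
            X_train, X_test = transform(X_train), transform(X_test)
        clf = LinearSVC()
        clf.fit(X_train, y_train)
        return clf.score(X_test, y_test)

    # Usage (placeholder data assumed to be loaded elsewhere):
    # acc_bow = train_and_score(X_tr, y_tr, X_te, y_te)                  # raw bag-of-words SIFT
    # pca = PCA(n_components=50).fit(X_unlabeled)                        # PCA baseline, ignores captions
    # acc_pca = train_and_score(X_tr, y_tr, X_te, y_te, transform=pca.transform)
    # acc_cap = train_and_score(X_tr, y_tr, X_te, y_te,                  # caption-derived representation
    #                           transform=lambda X: X @ theta.T)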
5 Summary
We have described a method for learning visual representations from large quantities of unlabeled images which have associated captions. The method makes use of auxiliary training sets corresponding to different words in the captions, and structural learning, which learns a manifold in parameter space. The induced representations significantly speed up learning of image classifiers applied to topic classification. Our results show that when meta-data labels are suitably related to a target (core) task, the structure learning method can discover feature groupings that speed learning of the target task. Future work includes exploration of automatic determination of relevance between target and auxiliary tasks, and experimental evaluation of the effectiveness of structure learning from more weakly related auxiliary domains.
References
[1] R. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Journal of Machine Learning Research, 6:1817–1853, 2005.
[2] J. Baxter. A Bayesian/information theoretic model of learning to learn via multiple task sampling. Machine Learning, 28:7–39, 1997.
[3] K. Grauman and T. Darrell. The pyramid match kernel: discriminative classification with sets of image features. In Proceedings of the International Conference on Computer Vision (ICCV), 2005.
[4] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of CVPR-2006, 2006.
[5] J. Mutch and D. G. Lowe. Multiclass object recognition with sparse, localized features. In Proceedings of CVPR-2006, 2006.
[6] R. Raina, A. Y. Ng, and D. Koller. Constructing informative priors using transfer learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 713–720, 2006.
[7] T. Serre, L. Wolf, and T. Poggio. Object recognition with features inspired by visual cortex. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005.
[8] S. Thrun. Is learning the n-th thing any easier than learning the first? In Advances in Neural Information Processing Systems, 1996.
[9] H. Zhang, A. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In Proceedings of CVPR-2006, 2006.