A Survey on Transfer Learning
Sinno Jialin Pan and Qiang Yang, Fellow, IEEE
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong. Emails: {sinnopan, qyang}@cse.ust.hk
Abstract—A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multi-task learning and sample selection bias, as well as co-variate shift. We also explore some potential future issues in transfer learning research.
Index Terms—Transfer Learning, Survey, Machine Learning, Data Mining.
1 INTRODUCTION
Data mining and machine learning technologies have already achieved significant success in many knowledge engineering areas, including classification, regression, and clustering (e.g., [1], [2]). However, many machine learning methods work well only under a common assumption: the training and test data are drawn from the same feature space and the same distribution. When the distribution changes, most statistical models need to be rebuilt from scratch using newly collected training data. In many real-world applications, it is expensive or impossible to re-collect the needed training data and rebuild the models. It would be nice to reduce the need and effort to re-collect the training data. In such cases, knowledge transfer or transfer learning between task domains would be desirable.
Many examples in knowledge engineering can be found where transfer learning can truly be beneficial. One example is Web document classification [3], [4], [5], where our goal is to classify a given Web document into several predefined categories. As an example in the area of Web-document classification (e.g., [6]), the labeled examples may be the university Web pages that are associated with category information obtained through previous manual-labeling efforts. For a classification task on a newly created Web site where the data features or data distributions may be different, there may be a lack of labeled training data. As a result, we may not be able to directly apply the Web-page classifiers learned on the university Web site to the new Web site. In such cases, it would be helpful if we could transfer the classification knowledge into the new domain.
The need for transfer learning may also arise when the data can easily become outdated. In this case, the labeled data obtained in one time period may not follow the same distribution in a later time period. For example, in indoor WiFi localization problems, which aim to detect a user’s current location based on previously collected WiFi data, it is very expensive to calibrate WiFi data for building localization models in a large-scale environment, because a user needs to label a large collection of WiFi signal data at each location. However, the WiFi signal-strength values may be a function of time, device, or other dynamic factors. A model trained in one time period or on one device may cause the performance for location estimation in another time period or on another device to be reduced. To reduce the re-calibration effort, we might wish to adapt the localization model trained in one time period (the source domain) for a new time period (the target domain), or to adapt the localization model trained on a mobile device (the source domain) for a new mobile device (the target domain), as done in [7].
As a third example, consider the problem of sentiment classification, where our task is to automatically classify the reviews on a product, such as a brand of camera, into positive and negative views. For this classification task, we need to first collect many reviews of the product and annotate them. We would then train a classifier on the reviews with their corresponding labels. Since the distribution of review data among different types of products can be very different, to maintain good classification performance, we need to collect a large amount of labeled data in order to train the review-classification models for each product. However, this data-labeling process can be very expensive. To reduce the effort of annotating reviews for various products, we may want to adapt a classification model that is trained on some products to help learn classification models for some other products. In such cases, transfer learning can save a significant amount of labeling effort [8].
In this survey article, we give a comprehensive overview of transfer learning for classification, regression, and clustering developed in machine learning and data mining areas. There has been a large amount of work on transfer learning for reinforcement learning in the machine learning literature (e.g., [9], [10]). However, in this paper, we only focus on transfer learning for classification, regression, and clustering problems that are more closely related to data mining tasks. By doing the survey, we hope to provide a useful resource for the data mining and machine learning community.
The rest of the survey is organized as follows. In the next four sections, we first give a general overview and define some notations we will use later. We then briefly survey the history of transfer learning, give a unified definition of transfer learning, and categorize transfer learning into three different settings (given in Table 2 and Figure 2). For each setting, we review different approaches, given in Table 3, in detail. After that, in Section 6, we review some current research on the topic of “negative transfer”, which happens when knowledge transfer has a negative impact on target learning. In Section 7, we introduce some successful applications of transfer learning and list some published data sets and software toolkits for transfer learning research. Finally, we conclude the article with a discussion of future work in Section 8.
2 OVERVIEW
2.1 A Brief History of Transfer Learning
Traditional data mining and machine learning algorithms make predictions on the future data using statistical models that are trained on previously collected labeled or unlabeled training data [11], [12], [13]. Semi-supervised classification [14], [15], [16], [17] addresses the problem that the labeled data may be too few to build a good classifier, by making use of a large amount of unlabeled data and a small amount of labeled data. Variations of supervised and semi-supervised learning for imperfect datasets have been studied; for example, Zhu and Wu [18] have studied how to deal with the noisy class-label problem. Yang et al. considered cost-sensitive learning [19] when additional tests can be made on future samples. Nevertheless, most of them assume that the distributions of the labeled and unlabeled data are the same. Transfer learning, in contrast, allows the domains, tasks, and distributions used in training and testing to be different. In the real world, we observe many examples of transfer learning. For example, we may find that learning to recognize apples might help to recognize pears. Similarly, learning to play the electronic organ may help facilitate learning the piano. The study of transfer learning is motivated by the fact that people can intelligently apply knowledge learned previously to solve new problems faster or with better solutions. The fundamental motivation for transfer learning in the field of machine learning was discussed in a NIPS-95 workshop on “Learning to Learn”¹, which focused on the need for lifelong machine-learning methods that retain and reuse previously learned knowledge.

Research on transfer learning has attracted more and more attention since 1995 under different names: learning to learn, life-long learning, knowledge transfer, inductive transfer, multi-task learning, knowledge consolidation, context-sensitive learning, knowledge-based inductive bias, meta learning, and incremental/cumulative learning [20]. Among these, a closely related learning technique to transfer learning is the multi-task learning framework [21], which tries to learn multiple tasks simultaneously even when they are different.

1. http://socrates.acadiau.ca/courses/comp/dsilver/NIPS95_LTL/transfer.workshop.1995.html
A typical approach for multi-task learning is to uncover the common (latent) features that can benefit each individual task. In 2005, the Broad Agency Announcement (BAA) 05-29 of Defense Advanced Research Projects Agency (DARPA)’s Information Processing Technology Office (IPTO)² gave a new mission of transfer learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks. In this definition, transfer learning aims to extract the knowledge from one or more source tasks and apply the knowledge to a target task. In contrast to multi-task learning, rather than learning all of the source and target tasks simultaneously, transfer learning cares most about the target task. The roles of the source and target tasks are no longer symmetric in transfer learning.
Figure 1 shows the difference between the learning processes of traditional and transfer learning techniques. As we can see, traditional machine learning techniques try to learn each task from scratch, while transfer learning techniques try to transfer knowledge from some previous tasks to a target task when the latter has fewer high-quality training data.
Fig. 1. Different learning processes of (a) traditional machine learning and (b) transfer learning.
Today, transfer learning methods appear in several top venues, most notably in data mining (ACM KDD, IEEE ICDM, and PKDD, for example), machine learning (ICML, NIPS, ECML, AAAI, and IJCAI, for example) and applications of machine learning and data mining (ACM SIGIR, WWW, and ACL, for example)³. Before we give different categorizations of transfer learning, we first describe the notations used in this article.
2.2 Notations and Definitions
In this section, we introduce some notations and definitions that are used in this survey. First of all, we give the definitions of a “domain” and a “task”, respectively.
In this survey, a domain D consists of two components: a feature space 𝒳 and a marginal probability distribution P(X), where X = {x_1, ..., x_n} ∈ 𝒳. For example, if our learning task is document classification, and each term is taken as a binary feature, then 𝒳 is the space of all term vectors, x_i is the i-th term vector corresponding to some documents, and X is a particular learning sample. In general, if two domains are different, then they may have different feature spaces or different marginal probability distributions.

Given a specific domain, D = {𝒳, P(X)}, a task consists of two components: a label space 𝒴 and an objective predictive function f(·) (denoted by T = {𝒴, f(·)}), which is not observed but can be learned from the training data, which consist of pairs {x_i, y_i}, where x_i ∈ 𝒳 and y_i ∈ 𝒴. The function f(·) can be used to predict the corresponding label, f(x), of a new instance x. From a probabilistic viewpoint, f(x) can be written as P(y|x). In our document classification example, 𝒴 is the set of all labels, which is {True, False} for a binary classification task, and y_i is “True” or “False”.

2. http://www.darpa.mil/ipto/programs/tl/tl.asp
3. We summarize a list of conferences and workshops where transfer learning papers appear in these few years in the following webpage: http://www.cse.ust.hk/~sinnopan/conferenceTL.htm
For simplicity, in this survey, we only consider the case where there is one source domain D_S and one target domain D_T, as this is by far the most popular setting among the research works in the literature. More specifically, we denote the source domain data as D_S = {(x_{S_1}, y_{S_1}), ..., (x_{S_{n_S}}, y_{S_{n_S}})}, where x_{S_i} ∈ 𝒳_S is the data instance and y_{S_i} ∈ 𝒴_S is the corresponding class label. In our document classification example, D_S can be a set of term vectors together with their associated true or false class labels. Similarly, we denote the target domain data as D_T = {(x_{T_1}, y_{T_1}), ..., (x_{T_{n_T}}, y_{T_{n_T}})}, where the input x_{T_i} is in 𝒳_T and y_{T_i} ∈ 𝒴_T is the corresponding output. In most cases, 0 ≤ n_T ≪ n_S.
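To make the notation above concrete, here is a minimal Python sketch (ours, not part of the original survey) of how a domain, a task, and the source/target data could be represented in code; all type and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Instance = Tuple[float, ...]  # one data instance x_i

@dataclass
class Domain:
    """A domain D = {X, P(X)}: a feature space plus a marginal
    distribution, the latter represented only implicitly by a sample."""
    feature_dim: int
    sample: List[Instance]  # instances drawn from P(X)

@dataclass
class Task:
    """A task T = {Y, f(.)}: a label space plus a predictive function
    f(x), which from a probabilistic viewpoint models P(y|x)."""
    label_space: List[str]
    predict: Callable[[Instance], str]

# Source and target data in the survey's notation:
# D_S = {(x_S1, y_S1), ..., (x_Sn_S, y_Sn_S)}, and similarly for D_T.
source_data: List[Tuple[Instance, str]] = [((1.0, 0.0), "True"),
                                           ((0.0, 1.0), "False"),
                                           ((1.0, 1.0), "True")]
target_data: List[Tuple[Instance, str]] = [((0.5, 0.5), "False")]

# In most cases, 0 <= n_T << n_S: far fewer labeled target examples.
assert 0 <= len(target_data) < len(source_data)
```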
We now give a unified definition of transfer learning.
Definition 1 (Transfer Learning): Given a source domain D_S and learning task T_S, a target domain D_T and learning task T_T, transfer learning aims to help improve the learning of the target predictive function f_T(·) in D_T using the knowledge in D_S and T_S, where D_S ≠ D_T, or T_S ≠ T_T.

In the above definition, a domain is a pair D = {𝒳, P(X)}. Thus the condition D_S ≠ D_T implies that either 𝒳_S ≠ 𝒳_T or P_S(X) ≠ P_T(X). For example, in our document classification example, this means that between a source document set and a target document set, either the term features are different between the two sets (e.g., they use different languages), or their marginal distributions are different.

Similarly, a task is defined as a pair T = {𝒴, P(Y|X)}. Thus the condition T_S ≠ T_T implies that either 𝒴_S ≠ 𝒴_T or P(Y_S|X_S) ≠ P(Y_T|X_T). When the target and source domains are the same, i.e., D_S = D_T, and their learning tasks are the same, i.e., T_S = T_T, the learning problem becomes a traditional machine learning problem. When the domains are different, then either (1) the feature spaces between the domains are different, i.e., 𝒳_S ≠ 𝒳_T, or (2) the feature spaces between the domains are the same but the marginal probability distributions between domain data are different, i.e., P(X_S) ≠ P(X_T), where X_{S_i} ∈ 𝒳_S and X_{T_i} ∈ 𝒳_T. As an example, in our document classification example, case (1) corresponds to when the two sets of documents are described in different languages, and case (2) may correspond to when the source domain documents and the target domain documents focus on different topics.
Given specific domains D_S and D_T, when the learning tasks T_S and T_T are different, then either (1) the label spaces between the domains are different, i.e., 𝒴_S ≠ 𝒴_T, or (2) the conditional probability distributions between the domains are different, i.e., P(Y_S|X_S) ≠ P(Y_T|X_T), where Y_{S_i} ∈ 𝒴_S and Y_{T_i} ∈ 𝒴_T. In our document classification example, case (1) corresponds to the situation where the source domain has binary document classes, whereas the target domain has ten classes to classify the documents to. Case (2) corresponds to the situation where the source and target documents are very unbalanced in terms of the user-defined classes.
In addition, when there exists some relationship, explicit or implicit, between the feature spaces of the two domains, we say that the source and target domains are related.
2.3 A Categorization of Transfer Learning Techniques
In transfer learning, we have the following three main research issues: (1) what to transfer; (2) how to transfer; (3) when to transfer.
“What to transfer” asks which part of knowledge can be transferred across domains or tasks. Some knowledge is specific to individual domains or tasks, and some knowledge may be common between different domains such that it may help improve performance for the target domain or task. After discovering which knowledge can be transferred, learning algorithms need to be developed to transfer the knowledge, which corresponds to the “how to transfer” issue.
“When to transfer” asks in which situations transferring skills should be done. Likewise, we are interested in knowing in which situations knowledge should not be transferred. In some situations, when the source domain and target domain are not related to each other, brute-force transfer may be unsuccessful. In the worst case, it may even hurt the performance of learning in the target domain, a situation which is often referred to as negative transfer. Most current work on transfer learning focuses on “what to transfer” and “how to transfer”, by implicitly assuming that the source and target domains are related to each other. However, how to avoid negative transfer is an important open issue that is attracting more and more attention.
Based on the definition of transfer learning, we summarize the relationship between traditional machine learning and various transfer learning settings in Table 1, where we categorize transfer learning under three sub-settings, inductive transfer learning, transductive transfer learning, and unsupervised transfer learning, based on different situations between the source and target domains and tasks.
1) In the inductive transfer learning setting, the target task is different from the source task, no matter whether the source and target domains are the same or not. In this case, some labeled data in the target domain are required to induce an objective predictive model f_T(·) for use in the target domain. In addition, according to different situations of labeled and unlabeled data in the source domain, we can further categorize the inductive transfer learning setting into two cases:
TABLE 1
Relationship between Traditional Machine Learning and Various Transfer Learning Settings

Learning Settings              | Source and Target Domains | Source and Target Tasks
Traditional Machine Learning   | the same                  | the same
Inductive Transfer Learning    | the same                  | different but related
Unsupervised Transfer Learning | different but related     | different but related
Transductive Transfer Learning | different but related     | the same
(1.1) A lot of labeled data in the source domain are available. In this case, the inductive transfer learning setting is similar to the multi-task learning setting. However, the inductive transfer learning setting only aims at achieving high performance in the target task by transferring knowledge from the source task, while multi-task learning tries to learn the target and source tasks simultaneously.

(1.2) No labeled data in the source domain are available. In this case, the inductive transfer learning setting is similar to the self-taught learning setting, which was first proposed by Raina et al. [22]. In the self-taught learning setting, the label spaces between the source and target domains may be different, which implies the side information of the source domain cannot be used directly. Thus, it is similar to the inductive transfer learning setting where the labeled data in the source domain are unavailable.
2) In the transductive transfer learning setting, the source and target tasks are the same, while the source and target domains are different. In this situation, no labeled data in the target domain are available, while a lot of labeled data in the source domain are available. In addition, according to different situations between the source and target domains, we can further categorize the transductive transfer learning setting into two cases.

(2.1) The feature spaces between the source and target domains are different, 𝒳_S ≠ 𝒳_T.

(2.2) The feature spaces between domains are the same, 𝒳_S = 𝒳_T, but the marginal probability distributions of the input data are different, P(X_S) ≠ P(X_T).

The latter case of the transductive transfer learning setting is related to domain adaptation for knowledge transfer in text classification [23] and sample selection bias [24] or co-variate shift [25], whose assumptions are similar.
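For the co-variate shift case (2.2), a standard corrective idea is to re-weight source instances by the density ratio w(x) = P_T(x)/P_S(x). The sketch below is one hedged way to approximate this ratio by training a probabilistic classifier to separate target from source instances; it is our illustration under these assumptions, not a method prescribed by this survey, and all function and variable names are ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_source, X_target):
    """Approximate w(x) = P_T(x) / P_S(x) for each source instance.

    A classifier is trained to separate target (label 1) from source
    (label 0) instances; by Bayes' rule, the odds p(1|x)/p(0|x) equal
    (n_T/n_S) * P_T(x)/P_S(x), so the density ratio is the odds times
    n_S/n_T.
    """
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_source)[:, 1]         # p(target | x)
    odds = p / np.clip(1.0 - p, 1e-12, None)      # (n_T/n_S) * P_T(x)/P_S(x)
    return odds * len(X_source) / len(X_target)   # correct the sample-size factor

# The weights can then be passed to any learner that accepts per-instance
# weights, e.g. model.fit(X_source, y_source, sample_weight=weights).
```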
3) Finally, in the unsupervised transfer learning setting, similar to the inductive transfer learning setting, the target task is different from but related to the source task. However, unsupervised transfer learning focuses on solving unsupervised learning tasks in the target domain, such as clustering, dimensionality reduction, and density estimation [26], [27]. In this case, no labeled data are available in either the source or the target domain in training.
The relationship between the different settings of transfer learning and the related areas is summarized in Table 2 and Figure 2.
Approaches to transfer learning in the above three different settings can be summarized into four cases based on “what to transfer”. Table 3 shows these four cases with brief descriptions. The first case can be referred to as the instance-based transfer learning (or instance-transfer) approach [6], [28], [29], [30], [31], [24], [32], [33], [34], [35], which assumes that certain parts of the data in the source domain can be reused for learning in the target domain by re-weighting. Instance re-weighting and importance sampling are two major techniques in this context; a hedged sketch of the re-weighting idea follows.
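The following rough sketch iteratively re-weights a pooled source-plus-target training set in the spirit of boosting-based instance transfer (e.g., [6]); it is our simplified caricature, not a faithful reproduction of any cited algorithm, and the weak learner, round count, and names are our assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def instance_transfer(X_src, y_src, X_tgt, y_tgt, n_rounds=10):
    """Simplified boosting-style instance re-weighting.

    Source instances misclassified by the current hypothesis are treated
    as poorly matched to the target concept and have their weights
    decayed; target instances are up-weighted when misclassified, as in
    AdaBoost. Returns the sequence of weak learners.
    """
    n_src = len(X_src)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    w = np.ones(len(X)) / len(X)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(max(n_src, 2)) / n_rounds))
    learners = []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (h.predict(X) != y).astype(float)
        # weighted error measured on the target portion only
        eps = np.sum(w[n_src:] * miss[n_src:]) / np.sum(w[n_src:])
        eps = min(max(eps, 1e-10), 0.499)
        beta_tgt = eps / (1.0 - eps)
        w[:n_src] *= beta_src ** miss[:n_src]     # decay bad source instances
        w[n_src:] *= beta_tgt ** (-miss[n_src:])  # boost missed target instances
        w /= w.sum()
        learners.append(h)
    return learners
```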
A second case can be referred to as the feature-representation-transfer approach [22], [36], [37], [38], [39], [8], [40], [41], [42], [43], [44]. The intuitive idea behind this case is to learn a “good” feature representation for the target domain. In this case, the knowledge used to transfer across domains is encoded into the learned feature representation. With the new feature representation, the performance of the target task is expected to improve significantly.
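As a deliberately simple stand-in for the cited methods, the sketch below learns one low-dimensional representation from the pooled inputs of both domains (assuming a shared feature space) and trains on source labels in that space; the choice of PCA and all names are our illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def shared_representation(X_src, y_src, X_tgt, n_components=2):
    """Learn a single low-dimensional representation from the pooled
    (unlabeled) inputs of both domains, then fit a classifier on the
    source labels in that space. This only illustrates the idea of
    encoding transferred knowledge in the learned features; the cited
    methods are far more sophisticated.
    """
    pca = PCA(n_components=n_components).fit(np.vstack([X_src, X_tgt]))
    clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_src), y_src)
    # The returned function can be applied directly to target-domain data.
    return lambda X: clf.predict(pca.transform(X))
```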
A third case can be referred to as the parameter-transfer approach [45], [46], [47], [48], [49], which assumes that the source tasks and the target tasks share some parameters or prior distributions of the hyper-parameters of the models. The transferred knowledge is encoded into the shared parameters or priors. Thus, by discovering the shared parameters or priors, knowledge can be transferred across tasks.
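A minimal sketch of the shared-parameter idea, assuming a logistic model: the target model is fit on scarce target data while being regularized toward source-trained weights w_src, which acts like a source-centered Gaussian prior. The specific penalty, learning rate, and names are our illustrative choices, not the formulation of any cited work.

```python
import numpy as np

def parameter_transfer_fit(X_tgt, y_tgt, w_src, lam=1.0, lr=0.1, epochs=200):
    """Gradient descent on: logistic_loss(w) + lam * ||w - w_src||^2.

    X_tgt is an (n, d) array, y_tgt has 0/1 labels, and w_src is a
    d-dimensional weight vector trained on the source domain.
    """
    w = w_src.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X_tgt @ w))       # sigmoid predictions
        grad = X_tgt.T @ (p - y_tgt) / len(y_tgt)  # logistic-loss gradient
        grad += 2.0 * lam * (w - w_src)            # pull toward source weights
        w -= lr * grad
    return w
```

Setting lam = 0 recovers ordinary target-only training; a larger lam keeps the target model closer to the source model when target data are scarce.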
Finally, the last case can be referred to as the relational-knowledge-transfer problem [50], which deals with transfer learning for relational domains. The basic assumption behind this context is that some relationships among the data in the source and target domains are similar. Thus, the knowledge to be transferred is the relationship among the data. Recently, statistical relational learning techniques dominate this context [51], [52].
Table 4 shows the cases where the different approaches are used for each transfer learning setting. We can see that the inductive transfer learning setting has been studied in many research works, while the unsupervised transfer learning setting is a relatively new research topic and only studied in the context of the feature-representation-transfer case. In addition, the feature-representation-transfer approach has been proposed for all three settings of transfer learning. However, the parameter-transfer and the relational-knowledge-transfer approaches are only studied in the inductive transfer learning setting, which we discuss in detail below.
3 INDUCTIVE TRANSFER LEARNING
Definition 2 (Inductive Transfer Learning): Given a source domain D_S and a learning task T_S, a target domain D_T and
TABLE 2
Different Settings of Transfer Learning

Transfer Learning Settings     | Related Areas                                              | Source Domain Labels | Target Domain Labels | Tasks
Inductive Transfer Learning    | Multi-task Learning                                        | Available            | Available            | Regression, Classification
                               | Self-taught Learning                                       | Unavailable          | Available            | Regression, Classification
Transductive Transfer Learning | Domain Adaptation, Sample Selection Bias, Co-variate Shift | Available            | Unavailable          | Regression, Classification
Unsupervised Transfer Learning |                                                            | Unavailable          | Unavailable          | Clustering, Dimensionality Reduction
Fig. 2. An overview of different settings of transfer learning.
TABLE 3
Different Approaches to Transfer Learning

Transfer Learning Approaches    | Brief Description
Instance-transfer               | To re-weight some labeled data in the source domain for use in the target domain [6], [28], [29], [30], [31], [24], [32], [33], [34], [35].
Feature-representation-transfer | Find a “good” feature representation that reduces the difference between the source and the target domains and the error of classification and regression models [22], [36], [37], [38], [39], [8], [40], [41], [42], [43], [44].
Parameter-transfer              | Discover shared parameters or priors between the source domain and target domain models, which can benefit transfer learning [45], [46], [47], [48], [49].
Relational-knowledge-transfer   | Build mapping of relational knowledge between the source domain and the target domains. Both domains are relational domains and the i.i.d. assumption is relaxed in each domain [50], [51], [52].
TABLE 4
Different Approaches Used in Different Settings

                                | Inductive Transfer Learning | Transductive Transfer Learning | Unsupervised Transfer Learning
Instance-transfer               | √                           | √                              |
Feature-representation-transfer | √                           | √                              | √
Parameter-transfer              | √                           |                                |
Relational-knowledge-transfer   | √                           |                                |
