A Survey on Transfer Learning
Sinno Jialin Pan and Qiang Yang, Fellow, IEEE
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong. Emails: {sinnopan, qyang}@cse.ust.hk
Abstract—A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling effort. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multi-task learning and sample selection bias, as well as co-variate shift. We also explore some potential future issues in transfer learning research.
Index Terms—Transfer Learning, Survey, Machine Learning, Data Mining.
1 INTRODUCTION
Data mining and machine learning technologies have already achieved significant success in many knowledge engineering areas, including classification, regression, and clustering (e.g., [1], [2]). However, many machine learning methods work well only under a common assumption: the training and test data are drawn from the same feature space and the same distribution. When the distribution changes, most statistical models need to be rebuilt from scratch using newly collected training data. In many real-world applications, it is expensive or impossible to re-collect the needed training data and rebuild the models. It would be nice to reduce the need and effort to re-collect the training data. In such cases, knowledge transfer or transfer learning between task domains would be desirable.
Many examples in knowledge engineering can be found where transfer learning can truly be beneficial. One example is Web document classification [3], [4], [5], where our goal is to classify a given Web document into several predefined categories. As an example in the area of Web-document classification (e.g., [6]), the labeled examples may be the university Web pages that are associated with category information obtained through previous manual-labeling efforts. For a classification task on a newly created Web site where the data features or data distributions may be different, there may be a lack of labeled training data. As a result, we may not be able to directly apply the Web-page classifiers learned on the university Web site to the new Web site. In such cases, it would be helpful if we could transfer the classification knowledge into the new domain.
The need for transfer learning may also arise when the data can easily become outdated. In this case, the labeled data obtained in one time period may not follow the same distribution in a later time period. For example, in indoor WiFi localization problems, which aim to detect a user’s current location based on previously collected WiFi data, it is very expensive to calibrate WiFi data for building localization models in a large-scale environment, because a user needs to label a large collection of WiFi signal data at each location. However, the WiFi signal-strength values may be a function of time, device, or other dynamic factors. A model trained in one time period or on one device may cause the performance for location estimation in another time period or on another device to be reduced. To reduce the re-calibration effort, we might wish to adapt the localization model trained in one time period (the source domain) for a new time period (the target domain), or to adapt the localization model trained on a mobile device (the source domain) for a new mobile device (the target domain), as done in [7].
As a third example, consider the problem of sentiment classification, where our task is to automatically classify the reviews on a product, such as a brand of camera, into positive and negative views. For this classification task, we need to first collect many reviews of the product and annotate them. We would then train a classifier on the reviews with their corresponding labels. Since the distribution of review data among different types of products can be very different, to maintain good classification performance, we need to collect a large amount of labeled data in order to train the review-classification models for each product. However, this data-labeling process can be very expensive. To reduce the effort of annotating reviews for various products, we may want to adapt a classification model that is trained on some products to help learn classification models for some other products. In such cases, transfer learning can save a significant amount of labeling effort [8].
In this survey article, we give a comprehensive overview of transfer learning for classification, regression, and clustering developed in machine learning and data mining areas. There has been a large amount of work on transfer learning for reinforcement learning in the machine learning literature (e.g., [9], [10]). However, in this paper, we only focus on transfer learning for classification, regression, and clustering problems that are more closely related to data mining tasks. By doing the survey, we hope to provide a useful resource for the data mining and machine learning community.
The rest of the survey is organized as follows. In the next four sections, we first give a general overview and define some notations we will use later. We then briefly survey the history of transfer learning, give a unified definition of transfer learning, and categorize transfer learning into three different settings (given in Table 2 and Figure 2). For each setting, we review different approaches, given in Table 3, in detail. After that, in Section 6, we review some current research on the topic of “negative transfer”, which happens when knowledge transfer has a negative impact on target learning. In Section 7, we introduce some successful applications of transfer learning and list some published data sets and software toolkits for transfer learning research. Finally, we conclude the article with a discussion of future work in Section 8.
2 OVERVIEW
2.1 A Brief History of Transfer Learning
Traditional data mining and machine learning algorithms make predictions on the future data using statistical models that are trained on previously collected labeled or unlabeled training data [11], [12], [13]. Semi-supervised classification [14], [15], [16], [17] addresses the problem that the labeled data may be too few to build a good classifier, by making use of a large amount of unlabeled data and a small amount of labeled data. Variations of supervised and semi-supervised learning for imperfect datasets have been studied; for example, Zhu and Wu [18] have studied how to deal with the noisy class-label problem. Yang et al. considered cost-sensitive learning [19] when additional tests can be made on future samples. Nevertheless, most of them assume that the distributions of the labeled and unlabeled data are the same. Transfer learning, in contrast, allows the domains, tasks, and distributions used in training and testing to be different. In the real world, we observe many examples of transfer learning. For example, we may find that learning to recognize apples might help to recognize pears. Similarly, learning to play the electronic organ may help facilitate learning the piano. The study of transfer learning is motivated by the fact that people can intelligently apply knowledge learned previously to solve new problems faster or with better solutions. The fundamental motivation for transfer learning in the field of machine learning was discussed in a NIPS-95 workshop on “Learning to Learn”¹, which focused on the need for lifelong machine-learning methods that retain and reuse previously learned knowledge.

Research on transfer learning has attracted more and more attention since 1995 under different names: learning to learn, life-long learning, knowledge transfer, inductive transfer, multi-task learning, knowledge consolidation, context-sensitive learning, knowledge-based inductive bias, meta learning, and incremental/cumulative learning [20]. Among these, a closely related learning technique to transfer learning is the multi-task learning framework [21], which tries to learn multiple tasks simultaneously even when they are different.

1. http://socrates.acadiau.ca/courses/comp/dsilver/NIPS95_LTL/transfer.workshop.1995.html
A typical approach for multi-task learning is to uncover the common (latent) features that can benefit each individual task. In 2005, the Broad Agency Announcement (BAA) 05-29 of Defense Advanced Research Projects Agency (DARPA)’s Information Processing Technology Office (IPTO)² gave a new mission of transfer learning: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks. In this definition, transfer learning aims to extract the knowledge from one or more source tasks and apply the knowledge to a target task. In contrast to multi-task learning, rather than learning all of the source and target tasks simultaneously, transfer learning cares most about the target task. The roles of the source and target tasks are no longer symmetric in transfer learning.
Figure 1 shows the difference between the learning processes of traditional and transfer learning techniques. As we can see, traditional machine learning techniques try to learn each task from scratch, while transfer learning techniques try to transfer knowledge from some previous tasks to a target task when the latter has fewer high-quality training data.
Fig. 1. Different learning processes of (a) traditional machine learning and (b) transfer learning.
Today, transfer learning methods appear in several top venues, most notably in data mining (ACM KDD, IEEE ICDM, and PKDD, for example), machine learning (ICML, NIPS, ECML, AAAI, and IJCAI, for example) and applications of machine learning and data mining (ACM SIGIR, WWW, and ACL, for example)³. Before we give different categorizations of transfer learning, we first describe the notations used in this article.
2.2 Notations and Definitions
In this section, we introduce some notations and definitions that are used in this survey. First of all, we give the definitions of a “domain” and a “task”, respectively.
In this survey, a domain D consists of two components: a feature space 𝒳 and a marginal probability distribution P(X), where X = {x_1, ..., x_n} ∈ 𝒳. For example, if our learning task is document classification, and each term is taken as a binary feature, then 𝒳 is the space of all term vectors, x_i is the i-th term vector corresponding to some documents, and X is a particular learning sample. In general, if two domains are different, then they may have different feature spaces or different marginal probability distributions.

Given a specific domain, D = {𝒳, P(X)}, a task consists of two components: a label space 𝒴 and an objective predictive function f(·) (denoted by T = {𝒴, f(·)}), which is not observed but can be learned from the training data, which consist of pairs {x_i, y_i}, where x_i ∈ 𝒳 and y_i ∈ 𝒴. The function f(·) can be used to predict the corresponding label, f(x), of a new instance x. From a probabilistic viewpoint, f(x) can be written as P(y|x). In our document classification example, 𝒴 is the set of all labels, which is {True, False} for a binary classification task, and y_i is “True” or “False”.

2. http://www.darpa.mil/ipto/programs/tl/tl.asp
3. We summarize a list of conferences and workshops where transfer learning papers appear in these few years in the following webpage: http://www.cse.ust.hk/~sinnopan/conferenceTL.htm
For simplicity, in this survey, we only consider the case where there is one source domain D_S and one target domain D_T, as this is by far the most popular setting among the research works in the literature. More specifically, we denote the source domain data as D_S = {(x_{S_1}, y_{S_1}), ..., (x_{S_{n_S}}, y_{S_{n_S}})}, where x_{S_i} ∈ 𝒳_S is the data instance and y_{S_i} ∈ 𝒴_S is the corresponding class label. In our document classification example, D_S can be a set of term vectors together with their associated true or false class labels. Similarly, we denote the target domain data as D_T = {(x_{T_1}, y_{T_1}), ..., (x_{T_{n_T}}, y_{T_{n_T}})}, where the input x_{T_i} is in 𝒳_T and y_{T_i} ∈ 𝒴_T is the corresponding output. In most cases, 0 ≤ n_T ≪ n_S.
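To make the notation above concrete, here is a minimal Python sketch (ours, not part of the original survey) of how a domain, a task, and the source/target data could be represented in code; all type and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Instance = Tuple[float, ...]  # one data instance x_i

@dataclass
class Domain:
    """A domain D = {X, P(X)}: a feature space plus a marginal
    distribution, the latter represented only implicitly by a sample."""
    feature_dim: int
    sample: List[Instance]  # instances drawn from P(X)

@dataclass
class Task:
    """A task T = {Y, f(.)}: a label space plus a predictive function
    f(x), which from a probabilistic viewpoint models P(y|x)."""
    label_space: List[str]
    predict: Callable[[Instance], str]

# Source and target data in the survey's notation:
# D_S = {(x_S1, y_S1), ..., (x_Sn_S, y_Sn_S)}, and similarly for D_T.
source_data: List[Tuple[Instance, str]] = [((1.0, 0.0), "True"),
                                           ((0.0, 1.0), "False"),
                                           ((1.0, 1.0), "True")]
target_data: List[Tuple[Instance, str]] = [((0.5, 0.5), "False")]

# In most cases, 0 <= n_T << n_S: far fewer labeled target examples.
assert 0 <= len(target_data) < len(source_data)
```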
We now give a unified definition of transfer learning.
Definition 1 (Transfer Learning): Given a source domain D_S and learning task T_S, a target domain D_T and learning task T_T, transfer learning aims to help improve the learning of the target predictive function f_T(·) in D_T using the knowledge in D_S and T_S, where D_S ≠ D_T, or T_S ≠ T_T.

In the above definition, a domain is a pair D = {𝒳, P(X)}. Thus the condition D_S ≠ D_T implies that either 𝒳_S ≠ 𝒳_T or P_S(X) ≠ P_T(X). For example, in our document classification example, this means that between a source document set and a target document set, either the term features are different between the two sets (e.g., they use different languages), or their marginal distributions are different.

Similarly, a task is defined as a pair T = {𝒴, P(Y|X)}. Thus the condition T_S ≠ T_T implies that either 𝒴_S ≠ 𝒴_T or P(Y_S|X_S) ≠ P(Y_T|X_T). When the target and source domains are the same, i.e., D_S = D_T, and their learning tasks are the same, i.e., T_S = T_T, the learning problem becomes a traditional machine learning problem. When the domains are different, then either (1) the feature spaces between the domains are different, i.e., 𝒳_S ≠ 𝒳_T, or (2) the feature spaces between the domains are the same but the marginal probability distributions between domain data are different, i.e., P(X_S) ≠ P(X_T), where X_{S_i} ∈ 𝒳_S and X_{T_i} ∈ 𝒳_T. As an example, in our document classification example, case (1) corresponds to when the two sets of documents are described in different languages, and case (2) may correspond to when the source domain documents and the target domain documents focus on different topics.
Given specific domains D_S and D_T, when the learning tasks T_S and T_T are different, then either (1) the label spaces between the domains are different, i.e., 𝒴_S ≠ 𝒴_T, or (2) the conditional probability distributions between the domains are different, i.e., P(Y_S|X_S) ≠ P(Y_T|X_T), where Y_{S_i} ∈ 𝒴_S and Y_{T_i} ∈ 𝒴_T. In our document classification example, case (1) corresponds to the situation where the source domain has binary document classes, whereas the target domain has ten classes to classify the documents to. Case (2) corresponds to the situation where the source and target documents are very unbalanced in terms of the user-defined classes.
In addition, when there exists some relationship, explicit or implicit, between the feature spaces of the two domains, we say that the source and target domains are related.
2.3 A Categorization of Transfer Learning Techniques
In transfer learning, we have the following three main research issues: (1) what to transfer; (2) how to transfer; (3) when to transfer.
“What to transfer” asks which part of knowledge can be transferred across domains or tasks. Some knowledge is specific to individual domains or tasks, and some knowledge may be common between different domains such that it may help improve performance for the target domain or task. After discovering which knowledge can be transferred, learning algorithms need to be developed to transfer the knowledge, which corresponds to the “how to transfer” issue.
“When to transfer” asks in which situations transferring skills should be done. Likewise, we are interested in knowing in which situations knowledge should not be transferred. In some situations, when the source domain and target domain are not related to each other, brute-force transfer may be unsuccessful. In the worst case, it may even hurt the performance of learning in the target domain, a situation which is often referred to as negative transfer. Most current work on transfer learning focuses on “what to transfer” and “how to transfer”, by implicitly assuming that the source and target domains are related to each other. However, how to avoid negative transfer is an important open issue that is attracting more and more attention.
Based on the definition of transfer learning, we summarize the relationship between traditional machine learning and various transfer learning settings in Table 1, where we categorize transfer learning under three sub-settings, inductive transfer learning, transductive transfer learning, and unsupervised transfer learning, based on different situations between the source and target domains and tasks.
1) In the inductive transfer learning setting, the target task is different from the source task, no matter whether the source and target domains are the same or not. In this case, some labeled data in the target domain are required to induce an objective predictive model f_T(·) for use in the target domain. In addition, according to different situations of labeled and unlabeled data in the source domain, we can further categorize the inductive transfer learning setting into two cases:
TABLE 1
Relationship between Traditional Machine Learning and Various Transfer Learning Settings

Learning Settings              | Source and Target Domains | Source and Target Tasks
Traditional Machine Learning   | the same                  | the same
Inductive Transfer Learning    | the same                  | different but related
Unsupervised Transfer Learning | different but related     | different but related
Transductive Transfer Learning | different but related     | the same
(1.1) A lot of labeled data in the source domain are available. In this case, the inductive transfer learning setting is similar to the multi-task learning setting. However, the inductive transfer learning setting only aims at achieving high performance in the target task by transferring knowledge from the source task, while multi-task learning tries to learn the target and source tasks simultaneously.

(1.2) No labeled data in the source domain are available. In this case, the inductive transfer learning setting is similar to the self-taught learning setting, which was first proposed by Raina et al. [22]. In the self-taught learning setting, the label spaces between the source and target domains may be different, which implies the side information of the source domain cannot be used directly. Thus, it is similar to the inductive transfer learning setting where the labeled data in the source domain are unavailable.
2) In the transductive transfer learning setting, the source and target tasks are the same, while the source and target domains are different. In this situation, no labeled data in the target domain are available, while a lot of labeled data in the source domain are available. In addition, according to different situations between the source and target domains, we can further categorize the transductive transfer learning setting into two cases.

(2.1) The feature spaces between the source and target domains are different, 𝒳_S ≠ 𝒳_T.

(2.2) The feature spaces between domains are the same, 𝒳_S = 𝒳_T, but the marginal probability distributions of the input data are different, P(X_S) ≠ P(X_T).

The latter case of the transductive transfer learning setting is related to domain adaptation for knowledge transfer in text classification [23] and sample selection bias [24] or co-variate shift [25], whose assumptions are similar.
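For the co-variate shift case (2.2), a standard corrective idea is to re-weight source instances by the density ratio w(x) = P_T(x)/P_S(x). The sketch below is one hedged way to approximate this ratio by training a probabilistic classifier to separate target from source instances; it is our illustration under these assumptions, not a method prescribed by this survey, and all function and variable names are ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_source, X_target):
    """Approximate w(x) = P_T(x) / P_S(x) for each source instance.

    A classifier is trained to separate target (label 1) from source
    (label 0) instances; by Bayes' rule, the odds p(1|x)/p(0|x) equal
    (n_T/n_S) * P_T(x)/P_S(x), so the density ratio is the odds times
    n_S/n_T.
    """
    X = np.vstack([X_source, X_target])
    d = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    clf = LogisticRegression(max_iter=1000).fit(X, d)
    p = clf.predict_proba(X_source)[:, 1]         # p(target | x)
    odds = p / np.clip(1.0 - p, 1e-12, None)      # (n_T/n_S) * P_T(x)/P_S(x)
    return odds * len(X_source) / len(X_target)   # correct the sample-size factor

# The weights can then be passed to any learner that accepts per-instance
# weights, e.g. model.fit(X_source, y_source, sample_weight=weights).
```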
3) Finally, in the unsupervised transfer learning setting, similar to the inductive transfer learning setting, the target task is different from but related to the source task. However, unsupervised transfer learning focuses on solving unsupervised learning tasks in the target domain, such as clustering, dimensionality reduction, and density estimation [26], [27]. In this case, no labeled data are available in either the source or the target domain in training.
The relationship between the different settings of transfer learning and the related areas is summarized in Table 2 and Figure 2.
Approaches to transfer learning in the above three different settings can be summarized into four cases based on “what to transfer”. Table 3 shows these four cases with brief descriptions. The first case can be referred to as the instance-based transfer learning (or instance-transfer) approach [6], [28], [29], [30], [31], [24], [32], [33], [34], [35], which assumes that certain parts of the data in the source domain can be reused for learning in the target domain by re-weighting. Instance re-weighting and importance sampling are two major techniques in this context; a hedged sketch of the re-weighting idea follows.
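The following rough sketch iteratively re-weights a pooled source-plus-target training set in the spirit of boosting-based instance transfer (e.g., [6]); it is our simplified caricature, not a faithful reproduction of any cited algorithm, and the weak learner, round count, and names are our assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def instance_transfer(X_src, y_src, X_tgt, y_tgt, n_rounds=10):
    """Simplified boosting-style instance re-weighting.

    Source instances misclassified by the current hypothesis are treated
    as poorly matched to the target concept and have their weights
    decayed; target instances are up-weighted when misclassified, as in
    AdaBoost. Returns the sequence of weak learners.
    """
    n_src = len(X_src)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    w = np.ones(len(X)) / len(X)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(max(n_src, 2)) / n_rounds))
    learners = []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = (h.predict(X) != y).astype(float)
        # weighted error measured on the target portion only
        eps = np.sum(w[n_src:] * miss[n_src:]) / np.sum(w[n_src:])
        eps = min(max(eps, 1e-10), 0.499)
        beta_tgt = eps / (1.0 - eps)
        w[:n_src] *= beta_src ** miss[:n_src]     # decay bad source instances
        w[n_src:] *= beta_tgt ** (-miss[n_src:])  # boost missed target instances
        w /= w.sum()
        learners.append(h)
    return learners
```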
A second case can be referred to as the feature-representation-transfer approach [22], [36], [37], [38], [39], [8], [40], [41], [42], [43], [44]. The intuitive idea behind this case is to learn a “good” feature representation for the target domain. In this case, the knowledge used to transfer across domains is encoded into the learned feature representation. With the new feature representation, the performance of the target task is expected to improve significantly.
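As a deliberately simple stand-in for the cited methods, the sketch below learns one low-dimensional representation from the pooled inputs of both domains (assuming a shared feature space) and trains on source labels in that space; the choice of PCA and all names are our illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def shared_representation(X_src, y_src, X_tgt, n_components=2):
    """Learn a single low-dimensional representation from the pooled
    (unlabeled) inputs of both domains, then fit a classifier on the
    source labels in that space. This only illustrates the idea of
    encoding transferred knowledge in the learned features; the cited
    methods are far more sophisticated.
    """
    pca = PCA(n_components=n_components).fit(np.vstack([X_src, X_tgt]))
    clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_src), y_src)
    # The returned function can be applied directly to target-domain data.
    return lambda X: clf.predict(pca.transform(X))
```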
A third case can be referred to as the parameter-transfer approach [45], [46], [47], [48], [49], which assumes that the source tasks and the target tasks share some parameters or prior distributions of the hyper-parameters of the models. The transferred knowledge is encoded into the shared parameters or priors. Thus, by discovering the shared parameters or priors, knowledge can be transferred across tasks.
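A minimal sketch of the shared-parameter idea, assuming a logistic model: the target model is fit on scarce target data while being regularized toward source-trained weights w_src, which acts like a source-centered Gaussian prior. The specific penalty, learning rate, and names are our illustrative choices, not the formulation of any cited work.

```python
import numpy as np

def parameter_transfer_fit(X_tgt, y_tgt, w_src, lam=1.0, lr=0.1, epochs=200):
    """Gradient descent on: logistic_loss(w) + lam * ||w - w_src||^2.

    X_tgt is an (n, d) array, y_tgt has 0/1 labels, and w_src is a
    d-dimensional weight vector trained on the source domain.
    """
    w = w_src.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X_tgt @ w))       # sigmoid predictions
        grad = X_tgt.T @ (p - y_tgt) / len(y_tgt)  # logistic-loss gradient
        grad += 2.0 * lam * (w - w_src)            # pull toward source weights
        w -= lr * grad
    return w
```

Setting lam = 0 recovers ordinary target-only training; a larger lam keeps the target model closer to the source model when target data are scarce.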
Finally, the last case can be referred to as the relational-knowledge-transfer problem [50], which deals with transfer learning for relational domains. The basic assumption behind this context is that some relationships among the data in the source and target domains are similar. Thus, the knowledge to be transferred is the relationship among the data. Recently, statistical relational learning techniques dominate this context [51], [52].
Table 4 shows the cases where the different approaches are used for each transfer learning setting. We can see that the inductive transfer learning setting has been studied in many research works, while the unsupervised transfer learning setting is a relatively new research topic and only studied in the context of the feature-representation-transfer case. In addition, the feature-representation-transfer approach has been proposed for all three settings of transfer learning. However, the parameter-transfer and the relational-knowledge-transfer approaches are only studied in the inductive transfer learning setting, which we discuss in detail below.
3 INDUCTIVE TRANSFER LEARNING
Definition 2 (Inductive Transfer Learning): Given a source domain D_S and a learning task T_S, a target domain D_T and
TABLE 2
Different Settings of Transfer Learning

Transfer Learning Settings     | Related Areas                                              | Source Domain Labels | Target Domain Labels | Tasks
Inductive Transfer Learning    | Multi-task Learning                                        | Available            | Available            | Regression, Classification
                               | Self-taught Learning                                       | Unavailable          | Available            | Regression, Classification
Transductive Transfer Learning | Domain Adaptation, Sample Selection Bias, Co-variate Shift | Available            | Unavailable          | Regression, Classification
Unsupervised Transfer Learning |                                                            | Unavailable          | Unavailable          | Clustering, Dimensionality Reduction
Fig. 2. An overview of different settings of transfer learning.
TABLE 3
Different Approaches to Transfer Learning

Transfer Learning Approaches    | Brief Description
Instance-transfer               | To re-weight some labeled data in the source domain for use in the target domain [6], [28], [29], [30], [31], [24], [32], [33], [34], [35].
Feature-representation-transfer | Find a “good” feature representation that reduces the difference between the source and the target domains and the error of classification and regression models [22], [36], [37], [38], [39], [8], [40], [41], [42], [43], [44].
Parameter-transfer              | Discover shared parameters or priors between the source domain and target domain models, which can benefit transfer learning [45], [46], [47], [48], [49].
Relational-knowledge-transfer   | Build mapping of relational knowledge between the source domain and the target domains. Both domains are relational domains and the i.i.d. assumption is relaxed in each domain [50], [51], [52].
TABLE 4
Different Approaches Used in Different Settings

                                | Inductive Transfer Learning | Transductive Transfer Learning | Unsupervised Transfer Learning
Instance-transfer               | √                           | √                              |
Feature-representation-transfer | √                           | √                              | √
Parameter-transfer              | √                           |                                |
Relational-knowledge-transfer   | √                           |                                |
