Multi-Instance Learning:A Survey
Zhi-Hua Zhou
National Laboratory for Novel Software Technology,
Nanjing University,Nanjing210093,China
Abstract In multi-instance learning,the training t compris labeled bags that are compod of unlabeled instances,and the task is to predict the labels of unen bags.This paper provides a survey on this topic.Atfirst,it introduces the origin of multi-instance learning.Then,developments on the study of learnability,learning algorithms,applications and extensions of multi-instance learning are reviewed.In particular,this paper employs a unified view to look into multi-instance learning algorithms.Some important issues to be addresd are also discusd.
Keywords Learning from Examples;Multi-Instance Learning;Supervid Learn-ing企业管理制度汇编
幼儿园安全工作总结Corresponding author:
Zhi-Hua Zhou
Mail:National Laboratory for Novel Software Technology,
Nanjing University,Mailbox419,
Hankou Road22,Nanjing210093,China
Tel.:+86-25-8368-6268
Fax:+86-25-8368-6268
E-mail:zhouzh@nju.edu
Multi-Instance Learning:A Survey
Abstract
In multi-instance learning,the training t compris labeled bags that are compod of unlabeled instances,and the task is to predict the labels of unen bags.This paper provides a survey on this topic.Atfirst,it introduces the origin of multi-instance learning. Then,developments on the study of learnability,learning algorithms,applications and extensions of multi-instance learning are reviewed.In
particular,this paper employs a unified view to look into multi-instance learning algorithms.Some important issues to be addresd are also discusd.
Key words:Learning from Examples;Multi-Instance Learning;Supervid Learning
1Introduction
During the past years,learning from examples is one of the mostflourishing ar-eas in machine learning.According to the ambiguity of training data,rearch in this area can be roughly categorized into three learning super-vid learning,unsupervid learning,and reinforcement learning[16].Supervid learning attempts to learn a concept for correctly labeling unen instances,where the training instances are with known labels and therefore the ambiguity is the minimum.Unsupervid learning attempts to learn the structure of the underly-ing sources of instances,where the training instances are without known labels and therefore the ambiguity is the maximum.Reinforcement learning attempts to learn a mapping from states to actions,where the instances are with no labels but with delayed rewards that could be viewed as delayed labels,therefore the ambiguity is between that of supervid learning and unsupervid learning.
The term multi-instance learning was coined by Dietterich et al.[10]when they were investigating the
problem of drug activity prediction.In multi-instance learning,
金秋野the training t is compod of many bags each contains many instances.A bag is positively labeled if it contains at least one positive instance;otherwi it is labeled as a negative bag.The task is to learn some concept from the training t for correctly labeling unen bags.
贝多芬生平简介
Different to supervid learning where all training instances are with known labels, in multi-instance learning the labels of the training instances are unknown;different to unsupervid learning where all training instances are without known labels, in multi-instance learning the labels of the training bags are known;different to reinforcement learning where the labels of the training instances are delayed,in multi-instance learning there is no any delay.It has been shown that learning algorithms ignoring the characteristics of multi-instance problems,such as popular decision trees and neural networks,could not work well in this scenario[10].陈学冬身高
Since multi-instance problems extensively exist but are unique to tho addresd by previous learning frameworks,multi-instance learning was regarded as a new learning framework[16],and has attracted much attention of the machine learning community.Since many developments on multi-instance learning have been made, it ems uful to have a survey on this topic,which is the purpo of this paper.
The rest of this paper is organized as follows.Section2introduces the origin of multi-instance learning.Section3prents the analys on the learnability of multi-instance learning.Section4looks into multi-instance learning algorithms with a supervid view.Section5and Section6prents applications and extensions of multi-instance learning,respectively.Finally,Section7discuss on veral issues to be addresd in thisfield.
2Drug Activity Prediction
惹事生非In the middle of1990s,Dietterich et al.[10]investigated the problem of drug activity prediction.The goal was to endow learning systems with the ability of predicting that whether a new molecule was qualified to make some drug,through
analyzing a collection of known molecules.
Most drugs are small molecules working by binding to larger protein molecules such as enzymes and cell-surface receptors.For molecules qualified to make a drug,one of its low-energy shapes could tightly bind to the target area;while for molecules unqualified to make a drug,none of its low-energy shapes could tightly bind to the target area.The main difficulty of drug activity prediction lies in that each molecule could have many alternative low-energy shapes,as illustrated in Fig.1, but curr
ently biochemists only know that whether a molecule is qualified to make a drug or not,instead of knowing that which of its alternative low-energy shapes respons for the qualification.
Fig.1.The shape of a molecule changes as it rotates an internal bond
An intuitive solution is to utilize supervid learning algorithms by regarding all the low-energy shapes of the‘good’molecules as positive training instances,while regarding all the low-energy shapes of the‘bad’molecules as negative training instances.However,as shown by Dietterich et al.[10],such a method can hardly work due to high fal positive noi,which is caud by that a‘good’molecule may have hundreds of low-energy shapes but maybe only one of them is really a ‘good’shape.
In order to solve this problem,Dietterich et al.[10]regarded each molecule as a bag, and regarded the
alternative low-energy shapes of the molecule as the instances in the bag,thereby formulated multi-instance learning.In order to reprent the shapes,the molecule was placed in a standard position and orientation and then a t of162rays emanating from the origin was constructed so that the molecular surface was sampled approximately uniformly,as illustrated in Fig.2.There were also four features that reprented the position of an oxygen atom on the molecular
北京大学医学部研究生招生网
surface.Therefore each instance in the bags was reprented by a166-dimensional numerical feature vector.
贬义词成语
Fig.2.The ray-bad reprentation of the molecular shape
Dietterich et al.[10]propod three axis-parallel rectangle(abbreviated as APR)al-gorithms to solve the drug activity prediction problem,which attempts to arch for appropriate axis-parallel rectangles constructed by the conjunction of the features. The GFS elim-count APR algorithm identifies an APR covering all the instances from positive bags atfirst.Then,it gradually shrinks the APR through greedily eliminating instances from negative bags.Afigure from Dietterich et al.’s paper [10]well illustrated this process,as shown in Fig.3,where the white and dark points reprented instances from positive and negative bags,respectively,differ-ent shapes reprented different bags,and the initial APR was indicated with solid line.For each instance from native bags,the algorithm counts the minimum num-ber of instances from positive bags that must be excluded from the APR in order to exclude the concerned instance from negative bag.In Fig.3,the counts were shown next to small lines that indicated which side of the APR should be shrunk in order to exclude the instance from negative bag.The algorithm iteratively choos to eliminate the instance from negative bag that is easiest to eliminate until all such instances are eliminated.The resulting APR was indicated with the dash line in Fig.3.Then,the algorithm determines the bounds of relevant features with greedy feature lection,and therefore obtains thefinal APR.
The difference between the GFS kde APR algorithm and the GFS elim-count APR algorithm mainly lies in the fact that the former does not merely counting the
Fig.3.An illustration of the GFS elim-count APR algorithm
number of instances from positive bags that must be excluded in order to exclude an instance from negative bag.Instead,GFS kde APR considers the number of instances from different positive bags that are covered by the initial APR,and us a cost function to control the process of eliminating instances from negative bags so that each positive bag rerves at least one instance in the APR.The Iterated-discrim APR algorithm employs greedy backfitting algorithm to identify an APR that covers at least one instance from each positive bag.Then,it utilizes this APR to lect the most discriminating features.Finally,kernel density estimation is exploited to help improve the generalizatio
n through expanding the bounds of the APR so that with high probability,new instances from positive bags will fall inside the APR.
Dietterich et al.’s experiments[10]showed that the Iterated-discrim APR algorithm achieved the best result on the Musk data[6],which is a concrete test data of drug activity prediction and the most popularly ud benchmark in multi-instance learning.It is worth mentioning that Dietterich et al.[10]indicated that since the Iterated-discrim APR algorithm had been geared to the Musk data,its performance might be the upper bound on this data t.
Note that multi-instance problems do not emerge suddenly from drug activity prediction.In fact,they extensively exist in real-world applications[14][32].But unfortunately,the uniqueness of the problems have not been particularly distin-guished until Dietterich et al.[10].