An Improved Branch and Bound Algorithm for Feature Selection
Xue-wen Chen*
Department of Electrical and Computer Engineering, California State University,
18111 Nordhoff Street, Northridge, CA 91330-8346
Abstract
Feature selection plays an important role in pattern classification. In this paper, we present an improved Branch and Bound algorithm for optimal feature subset selection. This algorithm searches for an optimal solution in a large solution tree in an efficient manner by cutting unnecessary paths which are guaranteed not to contain the optimal solution. Our experimental results demonstrate the effectiveness of the new algorithm.
Keywords – Branch and Bound algorithm, solution tree, feature selection, classification
I. INTRODUCTION
Feature selection plays an important role in pattern recognition applications. For example, in medical diagnosis, we need to evaluate the effectiveness of various feature combinations and select effective ones for classification. By using a subset of features, the processing time required by the classification process is reduced.
Feature selection is to select a subset of m features from a larger set of n features so as to optimize the value of a criterion function J over all subsets of size m. Many different feature selection algorithms have been used in the literature. Sequential forward selection (SFS) and sequential backward selection (SBS) (e.g., Fukunaga (1992)) are two widely used sequential feature selection methods. The SFS method first selects the best single feature and then adds one feature at a time which, in combination with the selected features, maximizes the criterion function J; the SBS method starts with all input features and successively deletes one feature at a time. Both SFS and SBS methods are computationally attractive. By dynamically controlling the number of forward and backtracking steps, floating sequential search methods (e.g., Pudil et al. (1994)) have been shown to perform better than both the SFS and SBS algorithms. However, none of these sequential methods is guaranteed to produce a feature subset which yields the global maximum value of the criterion among all possible subsets, since "the best pair of features need not contain the best single feature" (e.g., Jain et al., 2000).
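To make the greedy nature of these methods concrete, the following is a minimal Python sketch of sequential forward selection; the function name and interface (a user-supplied criterion J evaluated on a list of feature indices) are our own illustration and are not taken from the cited references.

def sfs(n_features, m, J):
    """Sequential forward selection sketch: greedily add, one at a time,
    the feature that maximizes the criterion J in combination with the
    features already selected.  Fast, but not guaranteed to be optimal."""
    selected = []
    remaining = list(range(n_features))
    while len(selected) < m:
        # Pick the candidate that gives the largest criterion value.
        best_f = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

SBS proceeds symmetrically, starting from all n features and repeatedly deleting the feature whose removal reduces J the least.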
In this paper, we are interested in selecting a feature subset that is the best in terms of a given criterion function. The only optimal feature selection algorithms are exhaustive search and branch and bound (BB) (e.g., Narendra and Fukunaga (1977)). In an exhaustive search, the best solution is found by evaluating the criterion function J over all possible combinations of features, which is often impractical or impossible, since the number of combinations of features increases exponentially as the dimensionality increases. The other optimal feature selection method, which avoids an exhaustive search, is the BB algorithm. This algorithm is efficient because it avoids an exhaustive search of the whole search space by rejecting many subsets which are guaranteed to be suboptimal, and it guarantees that the selected subset is the global optimal solution for a given criterion function. The BB algorithm works for many practical problems for which an exhaustive search would be impractical or impossible. A more efficient BB (BB+) algorithm (e.g., Yu and Yuan (1993)) outperforms the original BB algorithm by reducing redundant J evaluations. Most recently, Somol et al. (2000) proposed a fast branch and bound algorithm (FBB) with a prediction mechanism for evaluating J values.
In this paper, an improved BB algorithm for optimal feature selection is proposed that further reduces redundant J evaluations and searches for the optimal solution in an efficient way. Results show that the proposed algorithm is faster than the BB, BB+, and FBB algorithms.
The paper is organized into five sections. Section II describes the BB and BB+ algorithms. The proposed feature selection algorithm is presented in Section III. In Section IV, we give the experimental results. Finally, conclusions are drawn in Section V.
II. THE BRANCH AND BOUND ALGORITHM
A brief description of the basic BB algorithm follows, as our method for feature selection uses modifications to the basic BB algorithm (e.g., Narendra and Fukunaga (1977)). We consider the problem of selecting the best set of m features out of n original features that maximizes some criterion function J, where J satisfies the monotonicity condition, i.e., for two feature sets S1 and S2, if S1 is a subset of S2, then J(S1) ≤ J(S2). A large number of criterion functions, such as discriminant functions and the Bhattacharyya distance, satisfy the monotonicity condition.
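As a concrete illustration, the squared Mahalanobis distance between two class means, restricted to the candidate feature subset, is one such monotone criterion; the sketch below (the function name and interface are our own, for illustration only) computes it.

import numpy as np

def mahalanobis_criterion(mu1, mu2, cov, subset):
    """Squared Mahalanobis distance between two class means, computed on the
    features in `subset`.  Enlarging the subset can never decrease the value,
    so the criterion satisfies the monotonicity condition."""
    idx = np.array(sorted(subset))
    d = (mu1 - mu2)[idx]
    c = cov[np.ix_(idx, idx)]            # covariance restricted to the subset
    return float(d @ np.linalg.solve(c, d))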
The BB algorithm is best discussed for a specific example. Consider the problem of selecting m = 2 out of n = 5 features. The BB algorithm actually selects the r = n - m features to be discarded, i.e., it selects the r = 3 features to be omitted. A solution tree (Figure 1) is produced; the different paths from the top to the bottom levels include all combinations of r = 3 omitted features; numbers in the tree refer to features omitted (node Y refers to the feature set (2, 4, 5) obtained by omitting features (1, 3)). Note that the order of features is not of concern; omitting features (1, 2, 3, 4) is the same as omitting features (2, 1, 4, 3). For r = n - m = 3 features to be omitted, the tree has r levels. The problem is to select the best path in the tree, that is, the path that yields the largest J for the m = 2 retained features (or, equivalently, for the r = 3 omitted features). A node at some level has a number of successor nodes (nodes that follow it in the tree), each corresponding to more omitted features. By monotonicity, successor nodes have a J value no larger than that of the parent or original node. The several paths from a given node are referred to as branches. The bottom nodes are referred to as terminal nodes, and all feature choices or branches under a node at a given level are referred to as subtrees (including the original node).
Figure 1: Branch and bound search tree (for n = 5, m = 2, and r = 3).
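The correspondence between complete paths and omitted feature sets can be made explicit; the short Python sketch below (for illustration only) lists the C(5, 3) = 10 terminal nodes of the tree in Figure 1.

from itertools import combinations

n, m = 5, 2
r = n - m                                # number of features to omit
# Each root-to-terminal path corresponds to one combination of r omitted
# features; the order in which they are omitted does not matter.
for omitted in combinations(range(1, n + 1), r):
    retained = tuple(f for f in range(1, n + 1) if f not in omitted)
    print("omit", omitted, "-> retain", retained)   # 10 paths in total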
Let B be the maximum value of J obtained so far in the search for a complete path (at a terminal node). In the BB algorithm, the search is from upper nodes to bottom nodes with backtracking. If J at some node N at some level L is larger than the current bound B, then paths from that node to the bottom of the tree are explored (as long as their J remains larger than B). If a new complete path with a J > B is found, the bound B is updated with the new larger value. This need not be the path with the best J, however. Thus, backtracking is needed until all successor nodes and paths with J > B have been searched. As we go down the tree (to higher levels), J decreases because of the monotonicity condition. Thus, once J falls below B at a node, it remains below B throughout that node's subtree. Omitting the evaluation of J for a set of successor nodes (when J < B at some parent node) is the key to an efficient BB algorithm. The computational savings in the BB algorithm occur when J at some node N at some level L (L < r) in the tree (e.g., node Y at level L = 2) is less than the present bound B: then all successor nodes of N (e.g., the two successor nodes of Y, nodes 4 and 5 at level 3) need not be analyzed (J for their terminal nodes need not be calculated), since, by monotonicity, no complete path below node N can give a J larger than the present B.
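A minimal recursive sketch of this search is given below; it is our own illustration under the stated assumptions (a monotone criterion J defined on frozensets of retained feature indices), not the implementation of Narendra and Fukunaga.

def bb_select(n_features, m, J):
    """Basic branch-and-bound sketch: discard r = n - m features, pruning any
    subtree whose node already falls below the current bound B."""
    r = n_features - m
    best = [-float("inf"), None]          # the bound B and the best subset

    def search(retained, next_feature, depth):
        value = J(retained)
        if value <= best[0]:
            return                        # prune: no descendant can exceed B
        if depth == r:                    # terminal node: r features omitted
            best[0], best[1] = value, set(retained)
            return
        # Omit one more feature; indices are taken in increasing order so that
        # every combination of omitted features is generated exactly once.
        for f in range(next_feature, n_features):
            if n_features - f < r - depth:
                break                     # too few features left to omit
            search(retained - {f}, f + 1, depth + 1)

    search(frozenset(range(n_features)), 0, 0)
    return best[0], best[1]

For example, bb_select(5, 2, J) explores the tree of Figure 1 (with features indexed from 0) and returns the best criterion value together with the corresponding pair of retained features.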
In most cases, the basic BB algorithm is faster than an exhaustive search. However, there are still many redundant searches in the BB algorithm. For example, for the nodes with only one branch (e.g., node W in Figure 1), there is no need to evaluate the J values for each successor node. Instead, J is calculated only once at the last level (terminal node) in any path with only one branch (i.e., J is only calculated once, by removing the feature set (3, 4, 5)). Based on this observation, Yu and Yuan (1993) proposed the BB+ algorithm, which has been shown to be faster than the basic BB algorithm.
In the next section, we present our new optimal feature selection algorithm, which further reduces the search space by cutting unnecessary nodes and hence is faster than BB+.
III. THE IMPROVED BRANCH AND BOUND ALGORITHM
Both the BB and BB+ algorithms perform a "top-down" search with backtracking. We now introduce our improved branch and bound (IBB) algorithm, which employs top-down and right-left search strategies together with backtracking. IBB fully utilizes the information gained from previous searches, which is ignored by both the BB and BB+ algorithms.
We consider an example. Assume that the J value at node X in Figure 1 (i.e., removing features (2, 3), or retaining the feature set (1, 4, 5) for the computation of J) is less than the current bound B. In both the BB and BB+ algorithms, node Z in Figure 1 may still be evaluated. By monotonicity, however, we conclude immediately that there is no need to evaluate node Z, since the removed feature set (2, 3) (node X) is a subset of the feature set (1, 2, 3) (node Z), i.e., the J value for node Z is guaranteed to be smaller than the current bound B.
The proposed IBB algorithm fully utilizes information from previous searches with a right-left search strategy. We first define a partial path as a path consisting of nodes (the number of nodes is less than r) whose corresponding J value is less than the bound B (e.g., path (2, 3)). Once a partial path is found, the IBB algorithm ignores all paths containing the partial path as a subset (we call them parent paths, e.g., (1, 2, 3)). The computational savings in the IBB algorithm occur when J at some node N at some level L (L < r) in the tree is less than the current bound B (i.e., at some partial path): then all successor nodes of N and all parent paths need not be analyzed (J for their terminal nodes need not be calculated), since, by monotonicity, no complete path below node N and no parent path can give a J larger than the present B (both the BB and BB+ algorithms only ignore the successor nodes of N). A right-left search is needed to remove all parent paths, since parent paths exist only on the left side of partial paths.
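The following sketch illustrates the idea under our own simplifying assumptions (it is not the authors' code): whenever a node's J value falls below the bound, its set of omitted features is recorded as a partial path; any later node whose omitted set contains a recorded partial path is skipped without evaluating J; and siblings are expanded right to left so that the containing ("parent") paths are still unvisited when a partial path is recorded.

def ibb_select(n_features, m, J):
    """Improved branch-and-bound sketch with partial-path pruning and a
    right-to-left expansion order."""
    r = n_features - m
    all_features = frozenset(range(n_features))
    best = [-float("inf"), None]          # the bound B and the best subset
    partial_paths = []                    # omitted sets already shown to fall below B

    def search(omitted, candidates):
        # Skip any node that contains a recorded partial path: by monotonicity
        # its J value cannot exceed the bound B.
        if any(p <= omitted for p in partial_paths):
            return
        value = J(all_features - omitted)
        if value <= best[0]:
            if len(omitted) < r:
                partial_paths.append(omitted)   # record the partial path
            return
        if len(omitted) == r:             # terminal node: update the bound
            best[0], best[1] = value, set(all_features - omitted)
            return
        # Right-to-left: try the branch omitting the largest feature index first.
        for i in range(len(candidates) - 1, -1, -1):
            child_candidates = candidates[i + 1:]
            if len(omitted) + 1 + len(child_candidates) < r:
                continue                  # too few features left to omit
            search(omitted | {candidates[i]}, child_candidates)

    search(frozenset(), tuple(range(n_features)))
    return best[0], best[1]

In the example above, once the partial path (2, 3) is recorded, the node that omits (1, 2, 3) is skipped by the subset test without any evaluation of J.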