Python: Bayesian hyperparameter tuning with Hyperopt
Hyperopt provides an optimization interface that takes an evaluation function and a parameter space and computes the loss at points within that space. The user also specifies the distribution of each parameter in the space.
Hyperopt has four key components: the function to minimize, the search space, the trials database (optional), and the search algorithm (optional). First, define an objective function that takes the variables and returns the loss. For example, to minimize q(x, y) = x**2 + y**2:

```python
def q(args):
    x, y = args
    return x ** 2 + y ** 2
```
Next, define a parameter space. For example, x takes values in [0, 1] and y is a real number:

```python
from hyperopt import hp

space = [hp.uniform('x', 0, 1), hp.normal('y', 0, 1)]
```
Third, choose the search algorithm, which is the value passed to the algo argument of hyperopt's fmin function. Currently supported algorithms include random search (hyperopt.rand.suggest), simulated annealing (hyperopt.anneal.suggest), and TPE (hyperopt.tpe.suggest). For example:

```python
from hyperopt import hp, fmin, rand, tpe, space_eval

best = fmin(q, space, algo=rand.suggest, max_evals=100)  # max_evals sets the search budget
print(space_eval(space, best))
```
The search algorithms themselves have internal parameters that control how they optimize the objective, and these can be configured as well. For TPE, for example, you can set n_startup_jobs:

```python
from functools import partial
from hyperopt import hp, fmin, tpe, space_eval

algo = partial(tpe.suggest, n_startup_jobs=10)
best = fmin(q, space, algo=algo, max_evals=100)
print(space_eval(space, best))
To set up the parameter space, pass an expression built from the hp functions, e.g. fmin(q, space=hp.uniform('a', 0, 1)). The first argument of each hp function is a label, and every hyperparameter must have a unique label within the space. The function also determines the parameter's distribution:

- hp.choice(label, options): returns one of the options, which may be a list or a tuple. Options can themselves be nested expressions, which is how conditional parameters are composed (see the sketch after this list).
- hp.pchoice(label, p_options): returns one of p_options with the given probability, so the search treats the options as unequally likely.
- hp.uniform(label, low, high): the parameter is uniformly distributed between low and high.
- hp.quniform(label, low, high, q): the value is round(uniform(low, high) / q) * q, suitable for discrete values.
- hp.loguniform(label, low, high): draws exp(uniform(low, high)), so the value ranges over [exp(low), exp(high)].
- hp.randint(label, upper): returns a random integer from the half-open interval [0, upper).
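Because hp.choice options can be nested expressions, conditional spaces are easy to express. Here is a minimal sketch; the classifier types and their sub-parameters are hypothetical, not from the post:

```python
from hyperopt import hp

# Hypothetical conditional space: which sub-parameters get sampled
# depends on the option hp.choice picks.
clf_space = hp.choice('classifier', [
    {'type': 'svm',
     'C': hp.loguniform('svm_C', -3, 3)},             # only present for svm
    {'type': 'knn',
     'n_neighbors': hp.quniform('knn_k', 1, 30, 1)},  # only present for knn
])
```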
A search space can also contain lists and dictionaries:

```python
from hyperopt import hp

list_space = [hp.uniform('a', 0, 1),
              hp.loguniform('b', 0, 1)]
tuple_space = (hp.uniform('a', 0, 1),
               hp.loguniform('b', 0, 1))
dict_space = {'a': hp.uniform('a', 0, 1),
              'b': hp.loguniform('b', 0, 1)}
```

The Hyperopt library provides algorithms and a parallelization scheme for model selection and hyperparameter optimization in Python; common machine learning models such as KNN, SVM, PCA, decision trees, and GBDT all come with hyperparameters that have to be tuned in practice.
Use the sample function to draw a sample from a parameter space:

```python
from hyperopt.pyll.stochastic import sample

print(sample(list_space))
# => [0.13, 0.235]

# nested_space is not defined in the post; a definition consistent with
# the printed sample below would be:
nested_space = [
    [{'case': 1, 'a': hp.uniform('a', 0, 1)},
     {'case': 2, 'b': hp.loguniform('b', 0, 1)}],
    'extra_literal_string',
    3,
]
print(sample(nested_space))
# => [[{'case': 1, 'a': 0.12}, {'case': 2, 'b': 2.3}],
#     'extra_literal_string',
#     3]
```
Functions can also be used inside the parameter space:

```python
from hyperopt import hp
from hyperopt.pyll import scope

def foo(x):
    return str(x) * 3

expr_space = {
    'a': 1 + hp.uniform('a', 0, 1),
    'b': scope.minimum(hp.loguniform('b', 0, 1), 10),
    'c': scope.call(foo, args=(hp.randint('c', 5),)),
}
```
----------------- a somewhat short divider -----------------
I found some code on a blog that classifies the iris dataset with a perceptron: with a learning rate of 0.1 and 40 iterations it reached 82% accuracy on the test set. Optimizing the parameters with hyperopt raised the accuracy to 91%.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from functools import partial
from hyperopt import fmin, tpe, hp

iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# standardize the features
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# baseline: fixed learning rate and iteration count
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0)  # n_iter in older scikit-learn
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)
print(accuracy_score(y_test, y_pred))

def percept(args):
    global X_train_std, y_train, y_test
    ppn = Perceptron(max_iter=int(args["n_iter"]), eta0=args["eta"] * 0.01, random_state=0)
    ppn.fit(X_train_std, y_train)
    y_pred = ppn.predict(X_test_std)
    return -accuracy_score(y_test, y_pred)

space = {"n_iter": hp.choice("n_iter", list(range(30, 50))),
         "eta": hp.uniform("eta", 0.05, 0.5)}
algo = partial(tpe.suggest, n_startup_jobs=10)
best = fmin(percept, space, algo=algo, max_evals=100)
print(best)
print(percept(best))
# 0.822222222222
# {'n_iter': 14, 'eta': 0.12877033763511717}
# -0.911111111111
```
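Note that for an hp.choice parameter, the dict returned by fmin holds the index of the chosen option rather than the value itself, which is why the printed best shows 'n_iter': 14 even though the space starts at 30. hyperopt's space_eval maps the result back to actual values; a small sketch:

```python
from hyperopt import space_eval

# best["n_iter"] is an index into list(range(30, 50)); space_eval resolves
# it to the actual value (here index 14 -> 44) before re-evaluating.
print(space_eval(space, best))
print(percept(space_eval(space, best)))
```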
xgboost has a large number of parameters. Wrap the xgboost training code in a function, pass it to fmin, and use the cross-validated AUC as the optimization objective. A larger AUC is better, but fmin minimizes, so we minimize -auc instead. The dataset has 202 columns: the first column is the sample id, the last column is the label, and the 200 columns in between are features.
```python
# coding: utf-8
import numpy as np
import pandas as pd
import xgboost as xgb
from random import shuffle
from functools import partial
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_val_score
from hyperopt import fmin, tpe, hp, space_eval, rand, Trials, STATUS_OK

def loadFile(fileName="E://zalei//browtop200Pca.csv"):
    data = pd.read_csv(fileName, header=None)
    return data.values

data = loadFile()
label = data[:, -1]
attrs = data[:, :-1]
labels = label.reshape((1, -1))
label = labels.tolist()[0]

# scale features to [0, 1]
minmaxscaler = MinMaxScaler()
attrs = minmaxscaler.fit_transform(attrs)

# shuffle and split 70/30 into train and test
index = list(range(0, len(label)))
shuffle(index)
trainIndex = index[:int(len(label) * 0.7)]
print(len(trainIndex))
testIndex = index[int(len(label) * 0.7):]
print(len(testIndex))
attr_train = attrs[trainIndex, :]
print(attr_train.shape)
attr_test = attrs[testIndex, :]
print(attr_test.shape)
label_train = labels[:, trainIndex].tolist()[0]
print(len(label_train))
label_test = labels[:, testIndex].tolist()[0]
print(len(label_test))
print(np.mat(label_train).reshape((-1, 1)).shape)

def GBM(argsDict):
    # map the integer draws from hp.randint onto useful ranges
    max_depth = argsDict["max_depth"] + 5
    n_estimators = argsDict["n_estimators"] * 5 + 50
    learning_rate = argsDict["learning_rate"] * 0.02 + 0.05
    subsample = argsDict["subsample"] * 0.1 + 0.7
    min_child_weight = argsDict["min_child_weight"] + 1
    print("max_depth:" + str(max_depth))
    print("n_estimators:" + str(n_estimators))
    print("learning_rate:" + str(learning_rate))
    print("subsample:" + str(subsample))
    print("min_child_weight:" + str(min_child_weight))
    global attr_train, label_train
    gbm = xgb.XGBClassifier(nthread=4,                    # number of threads
                            max_depth=max_depth,          # maximum tree depth
                            n_estimators=n_estimators,    # number of trees
                            learning_rate=learning_rate,  # learning rate
                            subsample=subsample,          # row subsampling ratio
                            min_child_weight=min_child_weight,  # minimum child weight
                            max_delta_step=10,            # cap on each weight update step
                            objective="binary:logistic")
    metric = cross_val_score(gbm, attr_train, label_train, cv=5, scoring="roc_auc").mean()
    print(metric)
    return -metric  # fmin minimizes, so return the negative AUC

space = {"max_depth": hp.randint("max_depth", 15),
         "n_estimators": hp.randint("n_estimators", 10),   # [0, 9] -> [50, 95]
         "learning_rate": hp.randint("learning_rate", 6),  # [0, 5] -> [0.05, 0.15]
         "subsample": hp.randint("subsample", 4),          # [0, 3] -> [0.7, 1.0]
         "min_child_weight": hp.randint("min_child_weight", 5)}  # [0, 4] -> [1, 5]
algo = partial(tpe.suggest, n_startup_jobs=1)
best = fmin(GBM, space, algo=algo, max_evals=4)
print(best)
print(GBM(best))
```
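The code above imports Trials but never uses it. A Trials object records every evaluation, which makes the search inspectable after the fact; here is a minimal sketch of how it would plug into the same fmin call, assuming the GBM function and space above:

```python
from hyperopt import Trials

trials = Trials()
best = fmin(GBM, space, algo=algo, max_evals=4, trials=trials)

# each trial document records the sampled point and its loss (-auc here)
for t in trials.trials:
    print(t['misc']['vals'], t['result']['loss'])
```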