Python: Bayesian hyperparameter tuning with Hyperopt
Hyperopt provides an optimization interface that takes an evaluation function and a parameter space and computes the loss at points within that space. The user also specifies the distribution of each parameter in the space.
Hyperopt has four key components: the function to minimize, the search space, the trials database (optional), and the search algorithm (optional). First, define an objective function that takes the variables and returns the loss. For example, to minimize q(x, y) = x**2 + y**2:

```python
def q(args):
    x, y = args
    return x ** 2 + y ** 2
```
Next, define a parameter space. For example, x takes values in [0, 1] and y is a real number:

```python
from hyperopt import hp

space = [hp.uniform('x', 0, 1), hp.normal('y', 0, 1)]
```
Third, choose the search algorithm, which is the value passed to the algo argument of hyperopt's fmin function. Currently supported algorithms include random search (hyperopt.rand.suggest), simulated annealing (hyperopt.anneal.suggest), and TPE (hyperopt.tpe.suggest). For example:

```python
from hyperopt import hp, fmin, rand, tpe, space_eval

best = fmin(q, space, algo=rand.suggest, max_evals=100)  # max_evals sets the search budget
print(space_eval(space, best))
```
The search algorithms themselves have internal parameters that control how they optimize the objective, and these can be configured as well. For TPE, for example, you can set n_startup_jobs:

```python
from functools import partial
from hyperopt import hp, fmin, tpe, space_eval

algo = partial(tpe.suggest, n_startup_jobs=10)
best = fmin(q, space, algo=algo, max_evals=100)
print(space_eval(space, best))
To set up the parameter space, pass an expression built from the hp functions, e.g. fmin(q, space=hp.uniform('a', 0, 1)). The first argument of each hp function is a label, and every hyperparameter must have a unique label within the space. The function also determines the parameter's distribution:

- hp.choice(label, options): returns one of the options, which may be a list or a tuple. Options can themselves be nested expressions, which is how conditional parameters are composed (see the sketch after this list).
- hp.pchoice(label, p_options): returns one of p_options with the given probability, so the search treats the options as unequally likely.
- hp.uniform(label, low, high): the parameter is uniformly distributed between low and high.
- hp.quniform(label, low, high, q): the value is round(uniform(low, high) / q) * q, suitable for discrete values.
- hp.loguniform(label, low, high): draws exp(uniform(low, high)), so the value ranges over [exp(low), exp(high)].
- hp.randint(label, upper): returns a random integer from the half-open interval [0, upper).
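Because hp.choice options can be nested expressions, conditional spaces are easy to express. Here is a minimal sketch; the classifier types and their sub-parameters are hypothetical, not from the post:

```python
from hyperopt import hp

# Hypothetical conditional space: which sub-parameters get sampled
# depends on the option hp.choice picks.
clf_space = hp.choice('classifier', [
    {'type': 'svm',
     'C': hp.loguniform('svm_C', -3, 3)},             # only present for svm
    {'type': 'knn',
     'n_neighbors': hp.quniform('knn_k', 1, 30, 1)},  # only present for knn
])
```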
A search space can also contain lists and dictionaries:

```python
from hyperopt import hp

list_space = [hp.uniform('a', 0, 1),
              hp.loguniform('b', 0, 1)]
tuple_space = (hp.uniform('a', 0, 1),
               hp.loguniform('b', 0, 1))
dict_space = {'a': hp.uniform('a', 0, 1),
              'b': hp.loguniform('b', 0, 1)}
```

The Hyperopt library provides algorithms and a parallelization scheme for model selection and hyperparameter optimization in Python; common machine learning models such as KNN, SVM, PCA, decision trees, and GBDT all come with hyperparameters that have to be tuned in practice.
Use the sample function to draw a sample from a parameter space:

```python
from hyperopt.pyll.stochastic import sample

print(sample(list_space))
# => [0.13, 0.235]

# nested_space is not defined in the post; a definition consistent with
# the printed sample below would be:
nested_space = [
    [{'case': 1, 'a': hp.uniform('a', 0, 1)},
     {'case': 2, 'b': hp.loguniform('b', 0, 1)}],
    'extra_literal_string',
    3,
]
print(sample(nested_space))
# => [[{'case': 1, 'a': 0.12}, {'case': 2, 'b': 2.3}],
#     'extra_literal_string',
#     3]
```
Functions can also be used inside the parameter space:

```python
from hyperopt import hp
from hyperopt.pyll import scope

def foo(x):
    return str(x) * 3

expr_space = {
    'a': 1 + hp.uniform('a', 0, 1),
    'b': scope.minimum(hp.loguniform('b', 0, 1), 10),
    'c': scope.call(foo, args=(hp.randint('c', 5),)),
}
```
----------------- a somewhat short divider -----------------
I found some code on a blog that classifies the iris dataset with a perceptron: with a learning rate of 0.1 and 40 iterations it reached 82% accuracy on the test set. Optimizing the parameters with hyperopt raised the accuracy to 91%.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from functools import partial
from hyperopt import fmin, tpe, hp

iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# standardize the features
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# baseline: fixed learning rate and iteration count
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0)  # n_iter in older scikit-learn
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)
print(accuracy_score(y_test, y_pred))

def percept(args):
    global X_train_std, y_train, y_test
    ppn = Perceptron(max_iter=int(args["n_iter"]), eta0=args["eta"] * 0.01, random_state=0)
    ppn.fit(X_train_std, y_train)
    y_pred = ppn.predict(X_test_std)
    return -accuracy_score(y_test, y_pred)

space = {"n_iter": hp.choice("n_iter", list(range(30, 50))),
         "eta": hp.uniform("eta", 0.05, 0.5)}
algo = partial(tpe.suggest, n_startup_jobs=10)
best = fmin(percept, space, algo=algo, max_evals=100)
print(best)
print(percept(best))
# 0.822222222222
# {'n_iter': 14, 'eta': 0.12877033763511717}
# -0.911111111111
```
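Note that for an hp.choice parameter, the dict returned by fmin holds the index of the chosen option rather than the value itself, which is why the printed best shows 'n_iter': 14 even though the space starts at 30. hyperopt's space_eval maps the result back to actual values; a small sketch:

```python
from hyperopt import space_eval

# best["n_iter"] is an index into list(range(30, 50)); space_eval resolves
# it to the actual value (here index 14 -> 44) before re-evaluating.
print(space_eval(space, best))
print(percept(space_eval(space, best)))
```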
xgboost has a large number of parameters. Wrap the xgboost training code in a function, pass it to fmin, and use the cross-validated AUC as the optimization objective. A larger AUC is better, but fmin minimizes, so we minimize -auc instead. The dataset has 202 columns: the first column is the sample id, the last column is the label, and the 200 columns in between are features.
```python
# coding: utf-8
import numpy as np
import pandas as pd
import xgboost as xgb
from random import shuffle
from functools import partial
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_val_score
from hyperopt import fmin, tpe, hp, space_eval, rand, Trials, STATUS_OK

def loadFile(fileName="E://zalei//browtop200Pca.csv"):
    data = pd.read_csv(fileName, header=None)
    return data.values

data = loadFile()
label = data[:, -1]
attrs = data[:, :-1]
labels = label.reshape((1, -1))
label = labels.tolist()[0]

# scale features to [0, 1]
minmaxscaler = MinMaxScaler()
attrs = minmaxscaler.fit_transform(attrs)

# shuffle and split 70/30 into train and test
index = list(range(0, len(label)))
shuffle(index)
trainIndex = index[:int(len(label) * 0.7)]
print(len(trainIndex))
testIndex = index[int(len(label) * 0.7):]
print(len(testIndex))
attr_train = attrs[trainIndex, :]
print(attr_train.shape)
attr_test = attrs[testIndex, :]
print(attr_test.shape)
label_train = labels[:, trainIndex].tolist()[0]
print(len(label_train))
label_test = labels[:, testIndex].tolist()[0]
print(len(label_test))
print(np.mat(label_train).reshape((-1, 1)).shape)

def GBM(argsDict):
    # map the integer draws from hp.randint onto useful ranges
    max_depth = argsDict["max_depth"] + 5
    n_estimators = argsDict["n_estimators"] * 5 + 50
    learning_rate = argsDict["learning_rate"] * 0.02 + 0.05
    subsample = argsDict["subsample"] * 0.1 + 0.7
    min_child_weight = argsDict["min_child_weight"] + 1
    print("max_depth:" + str(max_depth))
    print("n_estimators:" + str(n_estimators))
    print("learning_rate:" + str(learning_rate))
    print("subsample:" + str(subsample))
    print("min_child_weight:" + str(min_child_weight))
    global attr_train, label_train
    gbm = xgb.XGBClassifier(nthread=4,                    # number of threads
                            max_depth=max_depth,          # maximum tree depth
                            n_estimators=n_estimators,    # number of trees
                            learning_rate=learning_rate,  # learning rate
                            subsample=subsample,          # row subsampling ratio
                            min_child_weight=min_child_weight,  # minimum child weight
                            max_delta_step=10,            # cap on each weight update step
                            objective="binary:logistic")
    metric = cross_val_score(gbm, attr_train, label_train, cv=5, scoring="roc_auc").mean()
    print(metric)
    return -metric  # fmin minimizes, so return the negative AUC

space = {"max_depth": hp.randint("max_depth", 15),
         "n_estimators": hp.randint("n_estimators", 10),   # [0, 9] -> [50, 95]
         "learning_rate": hp.randint("learning_rate", 6),  # [0, 5] -> [0.05, 0.15]
         "subsample": hp.randint("subsample", 4),          # [0, 3] -> [0.7, 1.0]
         "min_child_weight": hp.randint("min_child_weight", 5)}  # [0, 4] -> [1, 5]
algo = partial(tpe.suggest, n_startup_jobs=1)
best = fmin(GBM, space, algo=algo, max_evals=4)
print(best)
print(GBM(best))
```
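The code above imports Trials but never uses it. A Trials object records every evaluation, which makes the search inspectable after the fact; here is a minimal sketch of how it would plug into the same fmin call, assuming the GBM function and space above:

```python
from hyperopt import Trials

trials = Trials()
best = fmin(GBM, space, algo=algo, max_evals=4, trials=trials)

# each trial document records the sampled point and its loss (-auc here)
for t in trials.trials:
    print(t['misc']['vals'], t['result']['loss'])
```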