
Python scale standardization function: standardizing pandas columns for regression

Updated: 2023-07-30 13:23:30


I have the following df:

Date      Event_Counts  Category_A  Category_B
20170401  982457        0           1
20170402  982754        1           0
20170402  875786        0           1

I am preparing the data for a regression analysis and want to standardize the Event_Counts column so that it is on a scale similar to the categories.

I use the following code:

from sklearn import preprocessing

df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'])

Although I get this warning:

DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function.

warnings.warn(msg, _DataConversionWarning)

it seems to have worked: there is a new column. However, the column contains negative values such as -1.3.

I thought the scale function subtracts the mean from each value and divides it by the standard deviation, and then adds the minimum of the result to every row.
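For comparison, here is a minimal sketch of what preprocessing.scale actually computes on the example Event_Counts values: a plain z-score, (x - mean) / std, using the population standard deviation (ddof=0), with no extra step that adds the minimum. Values below the mean therefore come out negative. The comparison against a hand-written formula is illustrative and assumes a recent scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Event_Counts values taken from the example table
counts = pd.Series([982457, 982754, 875786], name="Event_Counts")

# preprocessing.scale standardizes each column: (x - mean) / std, std with ddof=0.
# Casting to float up front avoids the int64 -> float64 conversion warning.
scaled = preprocessing.scale(counts.astype(float))

# The same z-score written out by hand
manual = (counts - counts.mean()) / counts.std(ddof=0)

print(scaled)                       # the smallest count maps to a negative value
print(np.allclose(scaled, manual))  # True
print(scaled.mean(), scaled.std())  # approximately 0 and 1
```

The result is centred on 0 with unit spread, so negative values are expected rather than a sign that something went wrong.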

Does it not work that way for pandas? Or should I use the normalize() or StandardScaler() function instead? I want the standardized column to be on a scale of 0 to 1.
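A short sketch contrasting the two functions mentioned, using the example table without the Date column (left out here only for brevity): StandardScaler produces the same per-column z-scores as preprocessing.scale, while normalize rescales each row to unit length, so neither maps a column onto the range 0 to 1.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({
    "Event_Counts": [982457, 982754, 875786],
    "Category_A": [0, 1, 0],
    "Category_B": [1, 0, 1],
}).astype(float)

# StandardScaler is the estimator form of preprocessing.scale: same per-column
# z-score, but it stores mean_ and scale_ so the transform can be reapplied later
std_scaled = preprocessing.StandardScaler().fit_transform(df)
print(np.allclose(std_scaled, preprocessing.scale(df)))  # True

# normalize works per row by default (axis=1): each row is divided by its L2 norm
row_normed = preprocessing.normalize(df)
print(np.linalg.norm(row_normed, axis=1))  # approximately 1.0 for every row
print(row_normed)
```

With the L2 norm, Event_Counts ends up close to 1 in every row and the 0/1 category values shrink to roughly 1e-6, which is why normalize is not a substitute for column-wise scaling here.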

Thank You

Solution

I think you are looking for the sklearn.preprocessing.MinMaxScaler. That will allow you to scale to a given range.
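The range mapping that MinMaxScaler performs can be written out by hand. This sketch, assuming feature_range=(0, 1) and the example Event_Counts values, applies the min-max formula (x - min) / (max - min), stretches the result to the target range, and checks it against the library.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Event_Counts from the example table, as a single-column 2-D array
x = np.array([[982457], [982754], [875786]], dtype=float)

lo, hi = 0.0, 1.0  # target range, equivalent to feature_range=(0, 1)

# Min-max scaling: the column minimum maps to lo, the maximum to hi
x_std = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
manual = x_std * (hi - lo) + lo

library = MinMaxScaler(feature_range=(lo, hi)).fit_transform(x)

print(manual.ravel())                # approximately [0.997, 1.0, 0.0]
print(np.allclose(manual, library))  # True
```

Because the column minimum maps to the lower bound, the result contains no negative values, unlike the z-scores produced by preprocessing.scale.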

So in your case it would be:

scaler = preprocessing.MinMaxScaler(feature_range=(0,1))

# double brackets pass a one-column (2-D) DataFrame, which the scaler requires
df['scaled_event_counts'] = scaler.fit_transform(df[['Event_Counts']]).ravel()
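Putting the single-column case together, here is a self-contained sketch built from the example table. It assumes a recent scikit-learn, where scalers reject 1-D input (hence the double brackets); minmax_scale is shown as a functional alternative that accepts a 1-D Series directly. The column name scaled_event_counts_fn is hypothetical, added only for the comparison.

```python
import pandas as pd
from sklearn import preprocessing

# The example frame from the question
df = pd.DataFrame({
    "Date": [20170401, 20170402, 20170402],
    "Event_Counts": [982457, 982754, 875786],
    "Category_A": [0, 1, 0],
    "Category_B": [1, 0, 1],
})

scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))

# df[["Event_Counts"]] is 2-D (n, 1); .ravel() flattens the result to 1-D
df["scaled_event_counts"] = scaler.fit_transform(df[["Event_Counts"]]).ravel()

# Functional alternative: minmax_scale also accepts a 1-D Series
df["scaled_event_counts_fn"] = preprocessing.minmax_scale(df["Event_Counts"])

print(df)
```

Keeping the fitted scaler object (rather than the one-off function) is useful when the same mapping must later be applied to new rows.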

To scale the entire df:

scaled_df = scaler.fit_transform(df)

print(scaled_df)

[[ 0.          0.99722347  0.          1.        ]
 [ 1.          1.          1.          0.        ]
 [ 1.          0.          0.          1.        ]]
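fit_transform returns a bare NumPy array, so the column labels are lost. A minimal sketch, treating Date as a plain integer as the answer does, that wraps the result back into a DataFrame and uses inverse_transform to recover the original values:

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({
    "Date": [20170401, 20170402, 20170402],
    "Event_Counts": [982457, 982754, 875786],
    "Category_A": [0, 1, 0],
    "Category_B": [1, 0, 1],
})

scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))

# Wrap the NumPy result so column names and index are kept
scaled_df = pd.DataFrame(scaler.fit_transform(df), columns=df.columns, index=df.index)
print(scaled_df)

# The fitted scaler can map scaled values back to the original units
restored = scaler.inverse_transform(scaled_df)
print(np.allclose(restored, df.to_numpy()))  # True
```

Each column is scaled independently, so the 0/1 category columns are unchanged, while Date and Event_Counts are mapped onto 0 to 1.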