Python scale standardization function: standardizing a pandas column for regression
I have the following df:
Date Event_Counts Category_A Category_B
20170401 982457 0 1
20170402 982754 1 0
20170402 875786 0 1
I am preparing the data for a regression analysis and want to standardize the column Event_Counts, so that it's on a scale similar to the category columns.
I use the following code:
from sklearn import preprocessing
df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'])
While I do get this warning:
DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function.
warnings.warn(msg, _DataConversionWarning)
it seems to have worked; there is a new column. However, it has negative numbers like -1.3
What I thought the scale function does is subtract the mean from the number and divide it by the standard deviation for every row; then add the min of the result to every row.
Does it not work for pandas that way? Or should I use the normalize() function or StandardScaler() function? I wanted to have the standardized column on a scale of 0 to 1.
Thank You
Solution
I think you are looking for the sklearn.preprocessing.MinMaxScaler. That will allow you to scale to a given range.
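As for why the negative numbers show up: preprocessing.scale standardizes to zero mean and unit variance and does not add the minimum back, so any value below the column mean comes out negative. Here is a minimal sketch of the two calculations done by hand with NumPy on the three Event_Counts values from the question. The variable names are mine and purely illustrative.

```python
import numpy as np

# Event_Counts values from the question's sample data
counts = np.array([982457, 982754, 875786], dtype=float)

# Standardization (z-score): subtract the mean, divide by the
# population standard deviation (ddof=0). Values below the mean
# become negative, so the result is not confined to [0, 1].
z_scores = (counts - counts.mean()) / counts.std()

# Min-max scaling: subtract the minimum, divide by the range.
# The smallest value maps to 0 and the largest to 1.
min_max = (counts - counts.min()) / (counts.max() - counts.min())

print(z_scores)
print(min_max)
```

The second result is what you asked for (everything between 0 and 1); the first is what scale gave you.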
So in your case it would be:
scaler = preprocessing.MinMaxScaler(feature_range=(0,1))
df['scaled_event_counts'] = scaler.fit_transform(df[['Event_Counts']])
To scale the entire df:
scaled_df = scaler.fit_transform(df)
print(scaled_df)
[[ 0. 0.99722347 0. 1. ]
[ 1. 1. 1. 0. ]
[ 1. 0. 0. 1. ]]
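If you want to try the single-column case end to end, here is a self-contained sketch. It rebuilds the sample df from the question and assumes a reasonably recent scikit-learn, where transformers expect 2-D input, which is why the column is selected with double brackets.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample data copied from the question
df = pd.DataFrame({
    "Date": [20170401, 20170402, 20170402],
    "Event_Counts": [982457, 982754, 875786],
    "Category_A": [0, 1, 0],
    "Category_B": [1, 0, 1],
})

scaler = MinMaxScaler(feature_range=(0, 1))

# Double brackets select a one-column DataFrame (2-D input)
scaled = scaler.fit_transform(df[["Event_Counts"]])

# fit_transform returns an (n_rows, 1) array; flatten it before assigning
df["scaled_event_counts"] = scaled.ravel()

print(df[["Event_Counts", "scaled_event_counts"]])
```

Scaling only the column you need also leaves Date and the 0/1 category columns untouched, which is probably what you want for a regression.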