E-commerce Platform User Refund Prediction Model (Python)

Updated: 2023-05-20 08:44:44

(… to be improved)
# Load the required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
plt.style.use('fivethirtyeight')
from warnings import filterwarnings
filterwarnings('ignore')
# Load the order data
orders = pd.read_excel('chargepre.xlsx')
orders
   goodsID  orderAmount  payment   chanelID  platfromType  discount  payyear  paymonth  chargeback  urcount_total_count
1  PR000940      1978.47  1770.81  渠道-0530      WechatMP  0.104960     2019        10           否                    1
2  PR000512       521.60   511.59  渠道-0765      WechatMP  0.019191     2019         5           否                    1
3  PR000398       466.89   443.55  渠道-0007      WechatMP  0.049990     2019        11           否                    2
4  PR000351      2337.01  2328.43  渠道-0985           APP  0.003671     2019        10           是                    2
5  PR000433      2178.20  2162.14  渠道-9527           APP  0.007373     2019         1           否                    1
6  PR000771      4949.65  4879.94  渠道-0007      WechatMP  0.014084     2019        11           否                    1
7  PR000828       565.26   556.96  渠道-0896      WechatMP  0.014684     2019         9           否                    1
8  PR000147       430.34   373.46  渠道-0465           APP  0.132***     ****         *           否                    1
9  PR000060       694.52   664.79  渠道-0530           APP  0.042807     2019         3           是                    2
10 PR000072       453.83   371.50  渠道-0283      WechatMP  0.181412     2019        12           否                    2
11 PR000898       529.57   488.08  渠道-0530           WEB  0.078347     2019        11           否                    2
104552 rows × 10 columns
orders.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 104552 entries, 1 to 104552
Data columns (total 10 columns):
goodsID                104552 non-null object
orderAmount            104552 non-null float64
payment                104552 non-null float64
chanelID               104552 non-null object
platfromType           104552 non-null object
discount               104552 non-null float64
payyear                104552 non-null int64
paymonth               104552 non-null int64
chargeback             104552 non-null object
urcount_total_count    104552 non-null int64
dtypes: float64(3), int64(3), object(4)
memory usage: 8.8+ MB
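Since the conclusion at the end hinges on class imbalance, it is worth checking the target distribution right after loading. A minimal sketch, assuming the target column is `chargeback` with the values 否/是 as in the table above; the one-column DataFrame here is hypothetical stand-in data, not the real `chargepre.xlsx`.

```python
import pandas as pd

# Hypothetical stand-in for `orders`; with the real data, use the DataFrame read from chargepre.xlsx
orders = pd.DataFrame({'chargeback': ['否'] * 87 + ['是'] * 13})

# Absolute counts and proportions of each class in the target column
counts = orders['chargeback'].value_counts()
shares = orders['chargeback'].value_counts(normalize=True)

# Majority-to-minority ratio (87 / 13 ≈ 6.7 for this toy data)
ratio = counts.max() / counts.min()
print(counts.to_dict())
print(shares.round(3).to_dict())
print(round(ratio, 2))
```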
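# Pairwise scatter plots of the numeric columns (pairplot creates its own figure)
sns.pairplot(orders)
# Correlation heatmap of the numeric columns
plt.figure(figsize=(20,10))
sns.heatmap(orders.select_dtypes('number').corr(), annot=True, cmap='viridis')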
target_array = orders['chargeback'].copy()
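train = orders.drop(columns=['chargeback'])
# NOTE: test is the same object as train (not a copy), so it receives the same encoding
# and X_to_be_predicted below ends up being the full training data
test = train
from sklearn import preprocessing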
le = preprocessing.LabelEncoder()
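# Integer-encode every feature column (LabelEncoder also turns the numeric columns into ranks of their distinct values)
for feature in ['goodsID', 'chanelID', 'platfromType', 'orderAmount', 'payment', 'discount', 'payyear', 'paymonth', 'urcount_total_count']:
    train[feature] = le.fit_transform(train[feature])
    # test[feature] = le.transform(test[feature])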
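Because the loop above also sends `orderAmount`, `payment` and `discount` through `LabelEncoder`, those amounts are replaced by the rank of each distinct value and the distances between them are lost. A sketch of the alternative of encoding only the text columns, using three rows copied from the table above; the helper names `ranks`, `text_cols` and `encoded` are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Three rows copied from the table above (subset of columns)
df = pd.DataFrame({
    'goodsID': ['PR000940', 'PR000512', 'PR000398'],
    'platfromType': ['WechatMP', 'WechatMP', 'APP'],
    'orderAmount': [1978.47, 521.60, 466.89],
})

# On a numeric column, LabelEncoder keeps only the order of the distinct values
ranks = LabelEncoder().fit_transform(df['orderAmount'])  # array([2, 1, 0])

# Encode only the non-numeric (text) columns; numeric columns keep their values
encoded = df.copy()
text_cols = [c for c in encoded.columns if not pd.api.types.is_numeric_dtype(encoded[c])]
for col in text_cols:
    encoded[col] = LabelEncoder().fit_transform(encoded[col])

print(ranks)
print(encoded)
```

Tree-based models such as LightGBM split on thresholds, so they can use the raw amounts directly; the encoding is only needed for the text columns.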
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
train = pd.DataFrame(scaler.fit_transform(train), columns=train.columns)
test = pd.DataFrame(scaler.transform(test), columns=test.columns)
X = train
y = target_array
X_to_be_predicted = test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
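With an imbalanced target, `train_test_split` also accepts `stratify=y`, which keeps the class proportions the same in both splits, and `random_state`, which makes the split reproducible. A sketch on hypothetical labels in an 87:13 ratio with a dummy feature column:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels in an 87:13 ratio, with a dummy feature column
y = np.array(['否'] * 87 + ['是'] * 13)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the class proportions equal in both splits;
# random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Share of the minority class in each split
print((y_train == '是').mean(), (y_test == '是').mean())
```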
from lightgbm import LGBMClassifier
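model = LGBMClassifier(learning_rate=0.1,
                       n_estimators=10000,
                       max_depth=5,
                       min_child_weight=1,
                       gamma=0,               # not a LightGBM parameter (see warning below); the LightGBM counterpart is min_split_gain
                       subsample=0.8,         # row subsampling only takes effect when subsample_freq > 0
                       colsample_bytree=0.8,
                       nthread=4,             # ignored because n_jobs is set (see warning below)
                       scale_pos_weight=3,    # weight of the positive class, here '是' (refund)
                       seed=10)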
model.fit(X_train, y_train)
[LightGBM] [Warning] Unknown parameter: gamma
[LightGBM] [Warning] num_threads is set with n_jobs=-1, nthread=4 will be ignored. Current value: num_threads=-1
[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).
LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=0.8,
gamma=0, importance_type='split', learning_rate=0.1, max_depth=5,
min_child_samples=20, min_child_weight=1, min_split_gain=0.0,
n_estimators=10000, n_jobs=-1, nthread=4, num_leaves=31,
objective=None, random_state=None, reg_alpha=0.0, reg_lambda=0.0,
scale_pos_weight=3, seed=10, silent=True, subsample=0.8,
subsample_for_bin=200000, subsample_freq=0)
from sklearn.metrics import classification_report
# Print the scores: training set
print(classification_report(y_train, model.predict(X_train)))
# Test set
print(classification_report(y_test, model.predict(X_test)))
y_predict = model.predict(X_to_be_predicted)
y_predict
              precision    recall  f1-score   support

           否       1.00      0.99      1.00     68085
           是       0.96      0.98      0.97     10329

   micro avg       0.99      0.99      0.99     78414
   macro avg       0.98      0.99      0.98     78414
weighted avg       0.99      0.99      0.99     78414

              precision    recall  f1-score   support

           否       0.87      0.93      0.90     22687
           是       0.12      0.06      0.08      3451

   micro avg       0.82      0.82      0.82     26138
   macro avg       0.50      0.50      0.49     26138
weighted avg       0.77      0.82      0.79     26138
array(['否', '否', '否', ..., '否', '否', '否'], dtype=object)
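Because about 87% of the orders are 否, a model that never predicts a refund would already reach about 87% accuracy; on the test set above that constant model would score 22687 / 26138 ≈ 0.87, above the trained model's 0.82. Balanced accuracy, the mean of the per-class recalls (the macro-averaged recall in the reports), is more informative: for the test set it is (0.93 + 0.06) / 2 ≈ 0.50, no better than the constant model. A minimal check with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical ground truth: 87 orders without refund, 13 with refund
y_true = np.array(['否'] * 87 + ['是'] * 13)
# A trivial "model" that never predicts a refund
y_pred = np.array(['否'] * 100)

acc = accuracy_score(y_true, y_pred)           # 0.87
bal = balanced_accuracy_score(y_true, y_pred)  # (recall 否 1.0 + recall 是 0.0) / 2 = 0.5
print(acc, bal)
```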
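The training results are clearly overfitted: on the training set the model scores almost perfectly (f1-score 0.97 for 是), but on the test set it finds only 6% of the actual refunds (recall 0.06, f1-score 0.08 for 是). The two classes (否 = no refund, 是 = refund) are highly imbalanced: the supports in the two reports add up to 90,772 否 versus 13,780 是, roughly 87:13 or about 6.6:1, and this imbalance leads to the abnormal training results. How should such class imbalance be handled?
And can the sample data be used to predict refunds with high accuracy at all?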
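Two common responses to class imbalance are to reweight the classes (`LGBMClassifier` accepts `class_weight='balanced'`, and `scale_pos_weight` could be tuned on validation data rather than fixed at 3) or to resample the training data. A minimal sketch of random oversampling with `sklearn.utils.resample`, on a hypothetical ten-row training split; with the real data it would be applied to the training part only.

```python
import pandas as pd
from sklearn.utils import resample

# Hypothetical training split: one feature plus the chargeback label
train_df = pd.DataFrame({
    'discount': [0.10, 0.02, 0.05, 0.00, 0.01, 0.01, 0.13, 0.18, 0.04, 0.08],
    'chargeback': ['否'] * 8 + ['是'] * 2,
})

majority = train_df[train_df['chargeback'] == '否']
minority = train_df[train_df['chargeback'] == '是']

# Draw minority rows with replacement until both classes have the same size
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=10)

# Combine and shuffle the rows
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=10)

print(balanced['chargeback'].value_counts())
```

Oversampling duplicates rows, so it belongs after `train_test_split`; resampling before the split would put copies of the same order into both sets and inflate the test scores.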

Published: 2023-05-20 08:44:44

Article link: https://www.wtabcd.cn/fanwen/fan/90/115589.html
