
E-commerce Platform User Refund Prediction Model (Python)

Updated: 2023-05-20 08:44:44


(… to be improved)

# Load the required packages

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

import seaborn as sns

plt.style.use('fivethirtyeight')

from warnings import filterwarnings

filterwarnings('ignore')

orders = pd.read_excel('chargepre.xlsx')

orders

    goodsID   orderAmount  payment  chanelID   platfromType  discount  payyear  paymonth  chargeback  urcount_total_count
1   PR000940  1978.47      1770.81  渠道-0530  WechatMP      0.104960  2019     10        否          1
2   PR000512  521.60       511.59   渠道-0765  WechatMP      0.019191  2019     5         否          1
3   PR000398  466.89       443.55   渠道-0007  WechatMP      0.049990  2019     11        否          2
4   PR000351  2337.01      2328.43  渠道-0985  APP           0.003671  2019     10        是          2
5   PR000433  2178.20      2162.14  渠道-9527  APP           0.007373  2019     1         否          1
6   PR000771  4949.65      4879.94  渠道-0007  WechatMP      0.014084  2019     11        否          1
7   PR000828  565.26       556.96   渠道-0896  WechatMP      0.014684  2019     9         否          1
8   PR000147  430.34       373.46   渠道-0465  APP           0.132********                否          1
9   PR000060  694.52       664.79   渠道-0530  APP           0.042807  2019     3         是          2
10  PR000072  453.83       371.50   渠道-0283  WechatMP      0.181412  2019     12        否          2
11  PR000898  529.57       488.08   渠道-0530  WEB           0.078347  2019     11        否          2

104552 rows × 10 columns

orders.info()

Int64Index: 104552 entries, 1 to 104552

Data columns (total 10 columns):

goodsID                104552 non-null object
orderAmount            104552 non-null float64
payment                104552 non-null float64
chanelID               104552 non-null object
platfromType           104552 non-null object
discount               104552 non-null float64
payyear                104552 non-null int64
paymonth               104552 non-null int64
chargeback             104552 non-null object
urcount_total_count    104552 non-null int64
dtypes: float64(3), int64(3), object(4)

memory usage: 8.8+ MB

plt.figure(figsize=(20,10))

sns.pairplot(orders)

sns.heatmap(orders.corr(), annot=True, cmap='viridis')  # with pandas >= 2.0, use orders.corr(numeric_only=True)

target_array = orders['chargeback'].copy()

train = orders.drop(columns=['chargeback'])

test = train  # same object as train, so test is encoded together with it below

from sklearn import preprocessing

le = preprocessing.LabelEncoder()

for feature in ['goodsID', 'chanelID', 'platfromType', 'orderAmount', 'payment',
                'discount', 'payyear', 'paymonth', 'urcount_total_count']:
    train[feature] = le.fit_transform(train[feature])
    # test[feature] = le.transform(test[feature])
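The loop above label-encodes every column, including the numeric ones. As a hedged alternative, the sketch below encodes only the three text columns and leaves numeric columns untouched. The three-row `sample` frame and the `categorical_cols` name are hypothetical and used only for illustration; they are not part of the original notebook.

```python
import pandas as pd

# Hypothetical three-row sample using column names from the table above.
sample = pd.DataFrame({
    'goodsID': ['PR000940', 'PR000512', 'PR000940'],
    'chanelID': ['渠道-0530', '渠道-0765', '渠道-0530'],
    'platfromType': ['WechatMP', 'APP', 'WechatMP'],
    'orderAmount': [1978.47, 521.60, 466.89],
})

# Assumption: only the object (text) columns need integer codes.
categorical_cols = ['goodsID', 'chanelID', 'platfromType']

encoded = sample.copy()
for col in categorical_cols:
    # Category codes are integers 0..k-1, assigned in sorted order of the labels.
    encoded[col] = encoded[col].astype('category').cat.codes

print(encoded)
```

Numeric columns such as `orderAmount` keep their original values this way, so their magnitudes remain available to the model.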

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

train = pd.DataFrame(scaler.fit_transform(train), columns=train.columns)

test = pd.DataFrame(scaler.transform(test), columns=test.columns)

X = train

y = target_array

X_to_be_predicted = test

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
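The split above is purely random and unseeded, so the share of refund orders can differ between the two parts and from run to run. A minimal sketch of a stratified, reproducible split follows. The toy arrays `X_demo` and `y_demo` are hypothetical stand-ins for `X` and `y`, and `random_state=10` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 40 samples, 10% positives ('是'), mimicking the imbalance.
X_demo = np.arange(80).reshape(40, 2)
y_demo = np.array(['否'] * 36 + ['是'] * 4)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.25,
    stratify=y_demo,   # keep the class ratio the same in both parts
    random_state=10,   # fixed seed so the split is reproducible
)

# Number of positive samples that ended up in each part.
print((y_tr == '是').sum(), (y_te == '是').sum())
```

Applied to the notebook's data this would be `train_test_split(X, y, test_size=0.25, stratify=y, random_state=10)`.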

from lightgbm import LGBMClassifier

model = LGBMClassifier(learning_rate=0.1,
                       n_estimators=10000,
                       max_depth=5,
                       min_child_weight=1,
                       gamma=0,
                       subsample=0.8,
                       colsample_bytree=0.8,
                       nthread=4,
                       scale_pos_weight=3,
                       seed=10)

model.fit(X_train, y_train)

[LightGBM] [Warning] Unknown parameter: gamma

[LightGBM] [Warning] num_threads is set with n_jobs=-1, nthread=4 will be ignored. Current value: num_threads=-1

[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).

LGBMClassifier(boosting_type='gbdt', class_weight=None, colsample_bytree=0.8,

gamma=0, importance_type='split', learning_rate=0.1, max_depth=5,

min_child_samples=20, min_child_weight=1, min_split_gain=0.0,

n_estimators=10000, n_jobs=-1, nthread=4, num_leaves=31,

objective=None, random_state=None, reg_alpha=0.0, reg_lambda=0.0,

scale_pos_weight=3, seed=10, silent=True, subsample=0.8,

subsample_for_bin=200000, subsample_freq=0)
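The model above uses a fixed `scale_pos_weight=3`. A common heuristic, offered here only as a sketch and not as the author's method, is to set this weight to the ratio of negative to positive training samples. The helper `neg_pos_ratio` and the toy `demo_labels` list are hypothetical.

```python
from collections import Counter

def neg_pos_ratio(labels, positive='是'):
    """Return (#negative / #positive) for a sequence of class labels."""
    counts = Counter(labels)
    n_pos = counts[positive]
    if n_pos == 0:
        raise ValueError('no positive samples found')
    n_neg = sum(counts.values()) - n_pos
    return n_neg / n_pos

# Toy labels (hypothetical): 9 negatives for every positive.
demo_labels = ['否'] * 9 + ['是']
ratio = neg_pos_ratio(demo_labels)
print(ratio)  # 9.0
```

In the notebook this could be passed as `scale_pos_weight=neg_pos_ratio(y_train)`. With the training-set supports reported further below (68085 否 and 10329 是) the ratio is about 6.6. Whether this helps on the test set would need to be checked; reweighting alone may not remove the overfitting.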

from sklearn.metrics import classification_report

# Print the scores on the training set

print(classification_report(y_train, model.predict(X_train)))

# Test set

print(classification_report(y_test, model.predict(X_test)))

y_predict = model.predict(X_to_be_predicted)

y_predict

              precision    recall  f1-score   support

          否       1.00      0.99      1.00     68085
          是       0.96      0.98      0.97     10329

   micro avg       0.99      0.99      0.99     78414
   macro avg       0.98      0.99      0.98     78414
weighted avg       0.99      0.99      0.99     78414

              precision    recall  f1-score   support

          否       0.87      0.93      0.90     22687
          是       0.12      0.06      0.08      3451

   micro avg       0.82      0.82      0.82     26138
   macro avg       0.50      0.50      0.49     26138
weighted avg       0.77      0.82      0.79     26138

array(['否', '否', '否', ..., '否', '否', '否'], dtype=object)
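To put the test-set report above in context, it can help to compare it with a trivial baseline that always predicts the majority class 否. The sketch below uses only the support counts and the recall printed in the report; the variable names are illustrative.

```python
# Test-set class supports, copied from the second classification report.
n_no, n_yes = 22687, 3451
total = n_no + n_yes

# Accuracy of a trivial model that always predicts the majority class '否'.
baseline_accuracy = n_no / total

# Approximate true positives implied by the reported recall of 0.06 for '是'.
reported_recall_yes = 0.06
approx_true_positives = round(reported_recall_yes * n_yes)

print(f'majority baseline accuracy: {baseline_accuracy:.3f}')
print(f'refunds caught by the model: about {approx_true_positives} of {n_yes}')
```

The baseline reaches roughly 0.868 accuracy, which is higher than the 0.82 reported for the model on the test set. Overall accuracy is therefore a poor guide here; the precision and recall of the 是 class say more about how well refunds are detected.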

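The model is clearly overfitting: for the refund class (是, "yes"), the F1 score is 0.97 on the training set but only 0.08 on the test set. The raw labels (0/1) are highly imbalanced, close to 9:1 (by the support counts above, roughly 87% 否, "no", to 13% 是), which is why the training results are abnormal. How should this class imbalance be handled?

And can refunds be predicted with high accuracy from this sample data at all?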
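One possible answer to the imbalance question, given only as a sketch and not as a recommendation from the original post, is to randomly undersample the majority class in the training split. The helper `undersample_majority` and the `train_demo` frame below are hypothetical.

```python
import pandas as pd

def undersample_majority(frame, label_col, random_state=0):
    """Randomly drop rows so that every class has as many rows as the rarest one."""
    smallest = frame[label_col].value_counts().min()
    parts = [
        group.sample(n=smallest, random_state=random_state)
        for _, group in frame.groupby(label_col)
    ]
    # Shuffle so the classes are not stored in contiguous blocks.
    return pd.concat(parts).sample(frac=1, random_state=random_state)

# Hypothetical training frame: 9 non-refund rows for every refund row.
train_demo = pd.DataFrame({
    'payment': range(20),
    'chargeback': ['否'] * 18 + ['是'] * 2,
})
balanced = undersample_majority(train_demo, 'chargeback')
print(balanced['chargeback'].value_counts())
```

Resampling of this kind should be applied to the training split only, so that the test set keeps the original class ratio and the reported scores stay representative. It discards data, so class weighting is an alternative worth comparing.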

Published: 2023-05-20 08:44:44

Article link: https://www.wtabcd.cn/fanwen/fan/90/115589.html
