
Ordinary Least Squares Regression - OLS (ordinary least squares)

Updated: 2023-06-01 10:23:50


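Preface

This post records the concepts I came across while first learning ordinary least squares regression, and how I worked through the problems I ran into.

Development environment: PyCharm 2018.1.2

Version: Python 2.7.14 :: Anaconda, Inc.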

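Regression - existing data

Dataset: California housing (cal_housing.csv)

Description: information from the 1990 U.S. Census for every block group in California, with 9 variables and 20,640 observations.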

OLS estimates (b_ols: coefficient, t_ols: t-statistic):

Variable                           b_ols       t_ols
INTERCEPT                        11.4939    275.7518
MEDIAN INCOME                     0.4790     45.7768
MEDIAN INCOME^2                  -0.0166     -9.4841
MEDIAN INCOME^3                  -0.0002     -1.9157
ln(MEDIAN AGE)                    0.1570     33.6123
ln(TOTAL ROOMS/POPULATION)       -0.8582    -56.1280
ln(BEDROOMS/POPULATION)           0.8043     38.0685
ln(POPULATION/HOUSEHOLDS)        -0.4077    -20.8762
ln(HOUSEHOLDS)                    0.0477     13.0792

Use the code below to read in the data and work out which columns are the independent variables and which one is the dependent variable:

import pandas as pd
import numpy as np

data = pd.read_csv("cal_housing.csv")
name = data.columns
X = data[name[:8]]  # columns 1-8: independent variables
y = data[name[8:9]]  # column 9: dependent variable
print("X name :", name[:8])
print("y name :", name[8:9])
print(data.shape, X.shape, y.shape)  # (rows, columns) of each

---------------------------------------------

('X name :', Index([u'longitude', u'latitude', u'housingMedianAge', u'totalRooms',
       u'totalBedrooms', u'population', u'households', u'medianIncome'],
      dtype='object'))


('y name :', Index([u'medianHouseValue'], dtype='object'))

((20640, 9), (20640, 8), (20640, 1))
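The coefficient table is stated in terms of transformed regressors, while the CSV holds the raw columns printed above. A minimal sketch of how those regressors could be built, using a hypothetical helper table_design_matrix (not part of any library). It assumes the INCOME2 and INCOME3 rows denote squared and cubed median income, and it only builds the right-hand side; it makes no claim about which dependent variable the table was estimated with.

```python
import numpy as np

def table_design_matrix(median_income, housing_median_age, total_rooms,
                        total_bedrooms, population, households):
    """Build the table's regressors from raw columns (hypothetical helper).

    Assumption: MEDIAN INCOME2 / MEDIAN INCOME3 mean income squared / cubed.
    Inputs are converted to float so ratios are never integer-divided.
    """
    inc, age, rooms, beds, pop, hh = (np.asarray(v, dtype=float) for v in (
        median_income, housing_median_age, total_rooms,
        total_bedrooms, population, households))
    return np.column_stack([
        np.ones_like(inc),        # INTERCEPT
        inc, inc ** 2, inc ** 3,  # MEDIAN INCOME and its assumed square / cube
        np.log(age),              # ln(MEDIAN AGE)
        np.log(rooms / pop),      # ln(TOTAL ROOMS / POPULATION)
        np.log(beds / pop),       # ln(BEDROOMS / POPULATION)
        np.log(pop / hh),         # ln(POPULATION / HOUSEHOLDS)
        np.log(hh),               # ln(HOUSEHOLDS)
    ])

# Two made-up rows, only to show the shape of the result.
Z = table_design_matrix([3.0, 5.0], [20, 40], [1000, 2000],
                        [200, 400], [500, 1000], [150, 300])
print(Z.shape)
```

With the DataFrame loaded above, the call would pass data["medianIncome"], data["housingMedianAge"], data["totalRooms"], data["totalBedrooms"], data["population"] and data["households"]; np.asarray accepts pandas Series directly.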

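Randomly split the data into a training set and a test set

You can choose the random seed (any number of digits) and the test-set proportion (below 0.5, i.e. less than 50%).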

from sklearn.model_selection import train_test_split

seed = 8888  # random seed
proportion = 0.1  # test-set proportion
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=proportion, random_state=seed)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

---------------------------------------------------

((18576, 8), (2064, 8), (18576, 1), (2064, 1))
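train_test_split shuffles the rows and holds out a fraction of them. The NumPy-only sketch below (random_split_indices is a hypothetical helper) shows the idea with a seeded permutation. It takes ceil(test_size * n) rows for the test set, which is how scikit-learn sizes the split for a float test_size, so the sizes agree with the output above; it will not, however, select the same rows as scikit-learn.

```python
import math
import numpy as np

def random_split_indices(n_samples, test_size, seed):
    """Shuffle row indices with a fixed seed and cut off a test portion."""
    rng = np.random.RandomState(seed)               # fixed seed -> reproducible shuffle
    perm = rng.permutation(n_samples)               # random order of all row indices
    n_test = int(math.ceil(test_size * n_samples))  # size of the held-out test set
    return perm[n_test:], perm[:n_test]             # (train indices, test indices)

train_idx, test_idx = random_split_indices(20640, 0.1, 8888)
# sizes: 18576 training rows, 2064 test rows
print(len(train_idx), len(test_idx))
```

The two index arrays would then select rows, e.g. X.iloc[train_idx] and X.iloc[test_idx] for a DataFrame.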

Fit the regression and compute R^2 and NMSE

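from sklearn.linear_model import LinearRegression

reg = LinearRegression()  # linear regression (fits an intercept by default)
res = reg.fit(X_train, y_train)  # train on X_train, y_train
y_hat = res.predict(X_test)  # predict on X_test with the fitted estimator
e = y_test - y_hat  # residuals
SSE_cv = np.mean(e**2)  # mean squared residual (MSE) on the test set
SSE_test = np.mean((y_test - np.mean(y_test))**2)  # baseline: MSE of always predicting the test-set mean
NMSE_cv = SSE_cv/SSE_test  # normalized mean squared error
R2_cv = 1 - NMSE_cv  # coefficient of determination R^2
print(R2_cv)
print(NMSE_cv)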

-----------------------------------------------------------------

medianHouseValue    0.657186
dtype: float64
medianHouseValue    0.342814
dtype: float64
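The R^2 and NMSE above follow NMSE = MSE(test) / variance(y_test) and R^2 = 1 - NMSE, which is the same quantity scikit-learn's r2_score computes. As a sketch, the NumPy code below performs the same kind of fit with np.linalg.lstsq on a column of ones plus the regressors (LinearRegression also fits an intercept by default) and evaluates it on hypothetical toy data; all helper names are illustrative.

```python
import numpy as np

def fit_ols_with_intercept(X, y):
    """Least-squares fit with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])          # column of ones = intercept
    coef, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
    return coef                                        # [intercept, b1, ..., bp]

def predict(coef, X):
    return coef[0] + X.dot(coef[1:])

def nmse_and_r2(y_true, y_pred):
    """NMSE = MSE / mean squared deviation of y_true; R^2 = 1 - NMSE."""
    mse = np.mean((y_true - y_pred) ** 2)                 # mean squared residual
    baseline = np.mean((y_true - np.mean(y_true)) ** 2)   # error of predicting the mean
    nmse = mse / baseline
    return nmse, 1.0 - nmse

# Hypothetical toy data: y = 1 + 2*x1 - 3*x2 + small noise; first 150 rows train, last 50 test.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
y = 1.0 + X.dot([2.0, -3.0]) + rng.normal(scale=0.1, size=200)
coef = fit_ols_with_intercept(X[:150], y[:150])
nmse, r2 = nmse_and_r2(y[150:], predict(coef, X[150:]))
print(coef, nmse, r2)
```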

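Regression - simulated data

Choose the sample size (n), the number of independent variables (p), and the coefficient values (B), plus the mean m and standard deviation s of the normally distributed error term.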

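seed = 8888  # random seed
n = 100  # sample size
p = 7  # number of independent variables
m = 0  # mean of the error term
s = 5  # standard deviation of the error term
B = [2, 5, 16, 9, -3, -5, -2]  # true beta values
C = [2, 2]  # not used below
np.random.seed(seed)
X = np.random.normal(0, 1, (n, p))  # n x p standard-normal regressors
y = X.dot(B) + np.random.normal(m, s, n)  # y = XB + e, e ~ N(m, s^2)
print(X.shape, y.shape)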

----------------------------------------------------------

((100L, 7L), (100L,))
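Because the simulated y has no intercept and the statsmodels call in the next step does not add one, the OLS estimate solves the normal equations (X^T X) b = X^T y, i.e. b = (X^T X)^{-1} X^T y. The sketch re-creates the simulated data with the same seed and solves those equations directly with NumPy. If the random stream matches the one the author used, the estimates should coincide with the coef column of the summary; treat that as an expectation, not something verified here.

```python
import numpy as np

# Re-create the simulated data exactly as in the code above.
seed, n, p, m, s = 8888, 100, 7, 0, 5
B = np.array([2, 5, 16, 9, -3, -5, -2], dtype=float)
np.random.seed(seed)
X = np.random.normal(0, 1, (n, p))
y = X.dot(B) + np.random.normal(m, s, n)

# OLS without an intercept: solve (X'X) b = X'y.
# np.linalg.solve avoids forming the inverse of X'X explicitly.
beta_hat = np.linalg.solve(X.T.dot(X), X.T.dot(y))
print(np.round(beta_hat, 4))
```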

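Run the regression

import statsmodels.api as sm

# No intercept is added (sm.add_constant is not called); the simulated model has none.
mod = sm.OLS(y, X)  # ordinary least squares model
res2 = mod.fit()  # named res2 so it does not overwrite the scikit-learn result res
# print R^2 and NMSE = 1 - R^2 (without a constant, statsmodels reports the uncentered R^2)
print("R^2:", res2.rsquared, "\nNMSE:", 1 - res2.rsquared)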

----------------------------------------------------------

R^2: 0.92564484308

NMSE: 0.0743551569196
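Since this model is fitted without a constant, statsmodels computes R^2 as 1 - SSR / sum(y_i^2) (uncentered) rather than 1 - SSR / sum((y_i - mean(y))^2), so the NMSE = 1 - R^2 printed above is relative to the uncentered total sum of squares. The toy comparison below uses hypothetical data whose true model does have an intercept, which the fit omits: the uncentered value is never smaller than the centered one, and when a real intercept is left out the gap can be large.

```python
import numpy as np

def r2_centered_and_uncentered(X, y):
    """Fit y on X WITHOUT an intercept; return (centered R^2, uncentered R^2)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X.dot(beta)) ** 2)                 # residual sum of squares
    r2_centered = 1 - ssr / np.sum((y - y.mean()) ** 2)  # total SS around the mean
    r2_uncentered = 1 - ssr / np.sum(y ** 2)             # total SS around zero
    return r2_centered, r2_uncentered

# Hypothetical data whose true model has an intercept of 4 that the fit omits.
rng = np.random.RandomState(1)
X = rng.normal(size=(100, 3))
y = 4.0 + X.dot([1.0, -2.0, 0.5]) + rng.normal(size=100)
r2_c, r2_u = r2_centered_and_uncentered(X, y)
print(r2_c, r2_u)
```

In the simulated example above the true model has no intercept, so omitting the constant is the right specification there; the distinction only changes how R^2 should be read.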

print (res2.summary())

---------------------------------

                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.926
Model:                            OLS   Adj. R-squared:                  0.920
Method:                 Least Squares   F-statistic:                     165.4
Date:                Mon, 07 May 2018   Prob (F-statistic):           1.32e-49
Time:                        09:54:25   Log-Likelihood:                -304.71
No. Observations:                 100   AIC:                             623.4
Df Residuals:                      93   BIC:                             641.7
Df Model:                           7
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             2.3477      0.532      4.415      0.000       1.292       3.404
x2             5.5829      0.511     10.929      0.000       4.568       6.597
x3            15.7619      0.596     26.444      0.000      14.578      16.946
x4             8.9595      0.521     17.181      0.000       7.924       9.995
x5            -3.3048      0.530     -6.233      0.000      -4.358      -2.252
x6            -4.9932      0.491    -10.175      0.000      -5.968      -4.019
x7            -2.0126      0.536     -3.754      0.000      -3.077      -0.948
==============================================================================
Omnibus:                        0.577   Durbin-Watson:                   1.970
Prob(Omnibus):                  0.749   Jarque-Bera (JB):                0.227
Skew:                          -0.078   Prob(JB):                        0.893
Kurtosis:                       3.174   Cond. No.                         1.51
==============================================================================

Warnings:

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
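Several columns of the summary can be recomputed by hand. In this sketch (helper names are illustrative), the residual variance uses df = n - k, the "Df Residuals" line (100 - 7 = 93); std err is the square root of the diagonal of sigma^2 (X^T X)^{-1}; and t = coef / std err. For a model without a constant, the adjusted R^2 is 1 - n/(n - k) * (1 - R^2) and the F-statistic is (R^2 / k) / ((1 - R^2) / (n - k)); plugging in R^2 = 0.92564484308 from the output gives values that round to the reported 0.920 and 165.4.

```python
import numpy as np

def ols_inference(X, y):
    """OLS without an intercept: coefficients, standard errors and t-values."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T.dot(X))
    beta = XtX_inv.dot(X.T.dot(y))
    resid = y - X.dot(beta)
    sigma2 = resid.dot(resid) / float(n - k)   # residual variance, df = n - k
    se = np.sqrt(np.diag(sigma2 * XtX_inv))    # "std err" column
    return beta, se, beta / se                 # last item is the "t" column

def adjusted_r2_no_constant(r2, n, k):
    """Adjusted R^2 for a model without a constant."""
    return 1 - n / float(n - k) * (1 - r2)

def f_statistic_no_constant(r2, n, k):
    """Overall F-statistic with k model and n - k residual degrees of freedom."""
    return (r2 / k) / ((1 - r2) / float(n - k))

# Values taken from the summary above: R^2 = 0.92564484308, n = 100, k = 7.
r2, n_obs, k = 0.92564484308, 100, 7
print(round(adjusted_r2_no_constant(r2, n_obs, k), 3))  # compare with Adj. R-squared
print(round(f_statistic_no_constant(r2, n_obs, k), 1))  # compare with F-statistic
```

Running ols_inference on the simulated X and y should reproduce the coef, std err and t columns up to rounding, provided the data match the author's.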


Published: 2023-06-01 10:23:50

Article link: https://www.wtabcd.cn/fanwen/fan/78/826035.html
