Resampling Methods (CV, Bootstrap)
Introduction
Resampling methods involve repeatedly drawing samples from a training set and refitting a model of interest on each sample, in order to obtain additional information about the fitted model (e.g. cross-validation, the bootstrap). Typical uses:
- Estimates of test-set prediction error (CV)
- S.E. and bias of estimated parameters (Bootstrap)
- C.I. of a target parameter (Bootstrap)
Cross-Validation
The training error rate is often quite different from the test error rate, and in particular the former can dramatically underestimate the latter.
- Model complexity low: high bias, low variance
- Model complexity high: low bias, high variance

Ways to estimate the prediction error:
- Use a large designated test set
- Apply a mathematical adjustment to the training error, e.g.

$$C_p = \frac{1}{n}\left(SSE + 2d\hat{\sigma}^2\right), \qquad AIC = \frac{1}{n\hat{\sigma}^2}\left(SSE + 2d\hat{\sigma}^2\right), \qquad BIC = \frac{1}{n\hat{\sigma}^2}\left(SSE + \log(n)\,d\hat{\sigma}^2\right)$$

where $d$ is the number of predictors and $\hat{\sigma}^2$ is an estimate of the error variance (see the sketch after this list)
- CV: a class of methods that estimate the test error rate by holding out a subset of the training observations from the fitting process, and then applying the statistical learning method to those held-out observations.
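A minimal sketch of the three adjustment formulas as code; the helper name and argument names are mine, not from the source:

```python
import numpy as np

def cp_aic_bic(sse, n, d, sigma2_hat):
    """Training-error adjustments for a model with d predictors fit on n
    observations; sigma2_hat is an estimate of the error variance Var(eps)."""
    cp = (sse + 2 * d * sigma2_hat) / n
    aic = (sse + 2 * d * sigma2_hat) / (n * sigma2_hat)
    bic = (sse + np.log(n) * d * sigma2_hat) / (n * sigma2_hat)
    return cp, aic, bic
```

Since $\log(n) > 2$ for $n > 7$, BIC penalizes model size more heavily than AIC or $C_p$.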
The Validation Set Approach
A random split of the observations into two halves: the left part is the training set, the right part is the validation set (see the sketch after the list below).
Drawbacks
- The validation estimate of the test error rate can be highly variable, depending on precisely which observations are included in the training set and which are included in the validation set.
- Only a subset of the observations is used to fit the model.
- The validation-set error rate may therefore tend to overestimate the test error rate for the model fit on the entire data set.
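A minimal sketch of the approach; `fit` and `predict` are hypothetical callables standing in for whatever learning method is being evaluated:

```python
import numpy as np

def validation_set_mse(X, y, fit, predict, seed=0):
    """Validation set approach: one random half/half split; the MSE on the
    held-out half estimates the test error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    train, val = idx[:half], idx[half:]
    model = fit(X[train], y[train])
    return np.mean((y[val] - predict(model, X[val])) ** 2)
```

Rerunning with a different `seed` illustrates the first drawback: the estimate can change substantially with the split.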
Leave-One-Out Cross-Validation
LOOCV involves splitting the set of observations into two parts. However, instead of creating two subsets of comparable size, a single observation is used for the validation set, and the remaining observations make up the training set. This is repeated for each of the $n$ observations, and the resulting $n$ test errors are averaged.
In Linear Regression
For least squares, the LOOCV estimate can be computed from a single fit:

$$CV_{(n)} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{y_i - \hat{y}_i}{1 - h_i}\right)^2$$

where $h_i$ is the leverage of observation $i$, so LOOCV becomes a weighted MSE (see the sketch after the list below).
Drawbacks
- Estimates from each fold are highly correlated and hence their average can have high variance.
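A sketch of the single-fit shortcut for linear regression (the function name is mine):

```python
import numpy as np

def loocv_linear(X, y):
    """LOOCV error for least squares via the leverage shortcut: one hat-matrix
    computation replaces n separate refits."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])      # design matrix with intercept
    H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)  # hat matrix
    h = np.diag(H)                             # leverages h_i
    resid = y - H @ y                          # residuals y_i - yhat_i
    return np.mean((resid / (1.0 - h)) ** 2)   # CV_(n)
```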
K-fold Cross-Validation
This approach involves randomly dividing the set of observations into $k$ groups, or folds, of approximately equal size. The first fold is treated as a validation set, and the method is fit on the remaining $k-1$ folds. This procedure is repeated $k$ times; each time, a different group of observations is treated as a validation set. This process results in $k$ estimates of the test error, and the k-fold CV estimate is computed by averaging these values:

$$CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k} MSE_i \qquad \text{or, for classification,} \qquad CV_{(k)} = \frac{1}{k}\sum_{i=1}^{k} Err_i$$

If $k = n$, this is LOOCV.
Typically, given these considerations, one performs k-fold cross-validation using $k = 5$ or $k = 10$, as these values have been shown empirically to yield test error rate estimates that suffer neither from excessively high bias nor from very high variance. A sketch follows.
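A compact sketch of k-fold CV for a regression method; as before, `fit` and `predict` are hypothetical callables:

```python
import numpy as np

def kfold_cv_mse(X, y, fit, predict, k=10, seed=0):
    """k-fold CV estimate of the test MSE: average the k per-fold MSEs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mses = []
    for j in range(k):
        val = folds[j]
        train = np.concatenate([folds[m] for m in range(k) if m != j])
        model = fit(X[train], y[train])
        mses.append(np.mean((y[val] - predict(model, X[val])) ** 2))
    return np.mean(mses)
```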
Bootstrap
A powerful statistical tool to quantify the uncertainty associated with a given estimator or statistical learning method. For example, it can provide an estimate of the standard error of a coefficient, or a confidence interval for that coefficient.
Steps
- Obtain $B$ data sets (each of $n$ observations) by repeatedly sampling from the original data set $Z$ with replacement, $B$ times.
- Each bootstrap data set, denoted $Z^{*1},\dots,Z^{*B}$, is the same size as the original data set $Z$, and the corresponding bootstrap estimates of $\alpha$ are denoted $\hat{\alpha}^{*1},\dots,\hat{\alpha}^{*B}$. Thus some observations may appear more than once and some not at all (each observation is left out of a given bootstrap sample with probability $(1-1/n)^n \approx e^{-1} \approx 0.368$).
Estimate of S.E.
For a generic estimator $\hat{\theta}$ with bootstrap replicates $\hat{\theta}^{*1},\dots,\hat{\theta}^{*B}$:

$$\widehat{SE}_B(\hat{\theta}) = \sqrt{\frac{1}{B-1}\sum_{r=1}^{B}\left(\hat{\theta}^{*r} - \bar{\theta}^{*}\right)^2}, \qquad \bar{\theta}^{*} = \frac{1}{B}\sum_{r=1}^{B}\hat{\theta}^{*r}$$
Estimate of C.I.
Bootstrap Percentile C.I.

$$[L, U] = \left[\hat{\theta}^{*}_{\alpha/2},\ \hat{\theta}^{*}_{1-\alpha/2}\right]$$

where $\hat{\theta}^{*}_{q}$ denotes the $q$ quantile of the bootstrap estimates.
Bootstrap S.E. based C.I.

$$[L, U] = \bar{\theta}^{*} \pm z_{1-\alpha/2} \times \widehat{SE}_B(\hat{\theta})$$

Better Option (Basic Bootstrap / Reverse Percentile Interval)

$$[L, U] = \left[2\hat{\theta} - \hat{\theta}^{*}_{1-\alpha/2},\ 2\hat{\theta} - \hat{\theta}^{*}_{\alpha/2}\right]$$
Key: the behavior of $\hat{\theta}^{*} - \hat{\theta}$ is approximately the same as the behavior of $\hat{\theta} - \theta$. Therefore:

$$\begin{aligned}
0.95 &= P\left(\hat{\theta}^{*}_{\alpha/2} \le \hat{\theta}^{*} \le \hat{\theta}^{*}_{1-\alpha/2}\right)\\
&= P\left(\hat{\theta}^{*}_{\alpha/2} - \hat{\theta} \le \hat{\theta}^{*} - \hat{\theta} \le \hat{\theta}^{*}_{1-\alpha/2} - \hat{\theta}\right)\\
&\approx P\left(\hat{\theta}^{*}_{\alpha/2} - \hat{\theta} \le \hat{\theta} - \theta \le \hat{\theta}^{*}_{1-\alpha/2} - \hat{\theta}\right)\\
&= P\left(2\hat{\theta} - \hat{\theta}^{*}_{1-\alpha/2} \le \theta \le 2\hat{\theta} - \hat{\theta}^{*}_{\alpha/2}\right)
\end{aligned}$$
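A sketch computing all three intervals from a set of bootstrap replicates (function name is mine; `reps` is a NumPy array such as the one returned by `bootstrap_se` above):

```python
import numpy as np
from statistics import NormalDist

def bootstrap_cis(theta_hat, reps, alpha=0.05):
    """Percentile, S.E.-based, and basic (reverse percentile) intervals
    from bootstrap replicates `reps` of the estimate `theta_hat`."""
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}
    se = reps.std(ddof=1)
    return {
        "percentile": (lo, hi),
        "se_based": (reps.mean() - z * se, reps.mean() + z * se),
        "basic": (2 * theta_hat - hi, 2 * theta_hat - lo),
    }
```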
In General
Each bootstrap sample has significant overlap with the original data. This will cause the bootstrap to seriously underestimate the true prediction error.
Can partly fix this problem by only using predictions for those observations that did not (by chance) occur in the current bootstrap sample (complicated).
If the data is a time series, we can't simply sample the observations with replacement. We can instead create blocks of consecutive observations and sample those with replacement; then we paste the sampled blocks together to obtain a bootstrap sample.
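A sketch of drawing one such block-bootstrap sample (names and the fixed block length are mine):

```python
import numpy as np

def block_bootstrap_sample(x, block_len=10, seed=0):
    """One bootstrap sample of a time series: draw whole blocks of
    consecutive observations with replacement and paste them together."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    sample = np.concatenate([x[s:s + block_len] for s in starts])
    return sample[:n]   # trim to the original length
```

Keeping blocks intact preserves the short-range dependence structure that plain resampling of individual observations would destroy.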
Bootstrap in Regression
Consider the simple linear model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad i = 1,\dots,n$$

Goal: find the S.E. and C.I. for $\hat{\beta}_0$ and $\hat{\beta}_1$.
Empirical Bootstrap
Resample the pairs $(X_1,Y_1),\dots,(X_n,Y_n)$ with replacement and obtain:
Bootstrap sample 1: $(X_1^{*1},Y_1^{*1}),\dots,(X_n^{*1},Y_n^{*1})$
Bootstrap sample 2: $(X_1^{*2},Y_1^{*2}),\dots,(X_n^{*2},Y_n^{*2})$
…
Bootstrap sample B: $(X_1^{*B},Y_1^{*B}),\dots,(X_n^{*B},Y_n^{*B})$
For each bootstrap sample, fit the regression and obtain $(\hat{\beta}_0^{*1},\hat{\beta}_1^{*1}),\dots,(\hat{\beta}_0^{*B},\hat{\beta}_1^{*B})$; then estimate the S.E. and C.I. (see the sketch below).
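A sketch for the simple linear model, with `X` and `Y` as 1-D NumPy arrays (function name is mine):

```python
import numpy as np

def pairs_bootstrap(X, Y, B=1000, seed=0):
    """Empirical bootstrap for simple linear regression: resample the
    (X_i, Y_i) pairs, refit OLS, collect (beta0*, beta1*) replicates."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    betas = np.empty((B, 2))
    for b in range(B):
        idx = rng.integers(0, n, size=n)         # sample pairs with replacement
        b1, b0 = np.polyfit(X[idx], Y[idx], 1)   # polyfit returns [slope, intercept]
        betas[b] = (b0, b1)
    return betas   # column SDs give SE(beta0_hat), SE(beta1_hat)
```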
Residual Bootstrap
Recall that the residuals $\hat{e}_i$ mimic the role of $\epsilon_i$. Bootstrap the residuals and obtain:
Bootstrap residuals 1: $\hat{e}_1^{*1},\dots,\hat{e}_n^{*1}$
Bootstrap residuals 2: $\hat{e}_1^{*2},\dots,\hat{e}_n^{*2}$
…
Bootstrap residuals B: $\hat{e}_1^{*B},\dots,\hat{e}_n^{*B}$
Generate the new bootstrap samples:

$$X_i^{*b} = X_i, \qquad Y_i^{*b} = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{e}_i^{*b}$$

For each bootstrap sample, fit the regression and estimate the S.E. and C.I. (a sketch follows).
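The same setup as above, but resampling residuals while keeping $X$ fixed:

```python
import numpy as np

def residual_bootstrap(X, Y, B=1000, seed=0):
    """Residual bootstrap: keep X fixed, resample the fitted residuals,
    rebuild Y* = b0 + b1*X + e*, and refit OLS on each sample."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    b1, b0 = np.polyfit(X, Y, 1)
    resid = Y - (b0 + b1 * X)
    betas = np.empty((B, 2))
    for b in range(B):
        e_star = resid[rng.integers(0, n, size=n)]   # resample residuals
        s1, s0 = np.polyfit(X, b0 + b1 * X + e_star, 1)
        betas[b] = (s0, s1)
    return betas
```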
Wild Bootstrap
When the variance of the error, $Var(\epsilon_i \mid X_i)$, depends on the value of $X_i$ (so-called heteroskedasticity), the residual bootstrap is unstable, because it swaps all the residuals regardless of the value of $X$; the wild bootstrap instead uses each observation's own residual only.
Generate IID random variables $V_1^b,\dots,V_n^b \sim N(0,1)$ and generate the new bootstrap samples:

$$X_i^{*b} = X_i, \qquad Y_i^{*b} = \hat{\beta}_0 + \hat{\beta}_1 X_i + V_i^b \hat{e}_i$$

For each bootstrap sample, fit the regression and estimate the S.E. and C.I.
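A final sketch, differing from the residual bootstrap only in how the residuals enter:

```python
import numpy as np

def wild_bootstrap(X, Y, B=1000, seed=0):
    """Wild bootstrap: perturb each observation's own residual with an
    independent N(0,1) multiplier, preserving heteroskedasticity in X."""
    rng = np.random.default_rng(seed)
    b1, b0 = np.polyfit(X, Y, 1)
    resid = Y - (b0 + b1 * X)
    betas = np.empty((B, 2))
    for b in range(B):
        V = rng.standard_normal(len(Y))   # V_i^b ~ N(0,1), IID
        s1, s0 = np.polyfit(X, b0 + b1 * X + V * resid, 1)
        betas[b] = (s0, s1)
    return betas
```

Because each $\hat{e}_i$ stays attached to its own $X_i$, the resampled errors inherit whatever variance pattern the data show across $X$.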