Notes on Treatment Effects: Advanced Econometrics Notes (Texas A&M University, Prof. Li Gan)


Average Treatment Effect

Li Gan

Nov, 2007

1. The Regression Method:

We are interested in the average change in an outcome $y$ caused by a treatment. Let $y_1$ denote the outcome with treatment and $y_0$ the outcome without treatment. The Average Treatment Effect is defined as:

$$ATE = E(y_1 - y_0) \tag{1}$$

The difficulty in estimating (1) is that we observe either $y_1$ or $y_0$, not both, for each person. More precisely, let $w = 1$ if the person is treated and $w = 0$ otherwise. The observed outcome $y$ can be written as:

$$y = (1 - w)\,y_0 + w\,y_1 \tag{2}$$

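If $w$ is independent of $(y_0, y_1)$, then:

$$E(y_1 - y_0) = E(y_1 - y_0 \mid w) = E(y_1 \mid w = 1) - E(y_0 \mid w = 0)$$

Both terms on the right are identified from observed data, since $E(y_1 \mid w = 1) = E(y \mid w = 1)$ and $E(y_0 \mid w = 0) = E(y \mid w = 0)$.

In fact, we only need the weaker assumption of mean independence (rather than full independence): $E(y_0 \mid w) = E(y_0)$ and $E(y_1 \mid w) = E(y_1)$.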
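The identification argument above can be checked on simulated data. The following is a minimal sketch, assuming a hypothetical data-generating process (normal potential outcomes, a true ATE of 2, and a coin-flip treatment); none of these numbers come from the notes. It builds the observed outcome from equation (2) and compares the difference in observed means with the infeasible average of $y_1 - y_0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical potential outcomes; the individual effect y1 - y0 has mean 2.
y0 = rng.normal(loc=1.0, scale=1.0, size=n)
y1 = y0 + 2.0 + rng.normal(scale=0.5, size=n)

# Treatment assigned independently of (y0, y1).
w = rng.integers(0, 2, size=n)

# Observed outcome, equation (2): y = (1 - w) y0 + w y1.
y = (1 - w) * y0 + w * y1

# Infeasible benchmark: uses both potential outcomes for every person.
true_ate = np.mean(y1 - y0)

# Feasible estimator: E(y | w = 1) - E(y | w = 0).
diff_in_means = y[w == 1].mean() - y[w == 0].mean()

print(true_ate, diff_in_means)
```

If the assignment `w` were made to depend on `y0` or `y1`, the two numbers would generally diverge, which is the motivation for conditioning on $x$ in what follows.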

Now let:

$$y_0 = \mu_0 + v_0,\quad E(v_0) = 0, \qquad y_1 = \mu_1 + v_1,\quad E(v_1) = 0$$

Therefore, (2) can be written as:

$$y = (1 - w)\,y_0 + w\,y_1 = \mu_0 + (\mu_1 - \mu_0)\,w + v_0 + w\,(v_1 - v_0) \tag{3}$$

First, assume conditional mean independence:

Assumption 1 (ATE 1): (a) $E(y_0 \mid w, x) = E(y_0 \mid x)$, and (b) $E(y_1 \mid w, x) = E(y_1 \mid x)$.

Intuition: even though $y_1$ and $y_0$ may be correlated with $w$, they are uncorrelated with $w$ once we partial out $x$.

Taking expectation of (3) (and with ATE 1):

$$E(y \mid w, x) = \mu_0 + \alpha w + g_0(x) + w\,(g_1(x) - g_0(x)) \tag{4}$$

where $\alpha = \mu_1 - \mu_0$ is the Average Treatment Effect (ATE), and $g_i(x) = E(v_i \mid x)$ for $i = 0, 1$.
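The step from (3) to (4) can be spelled out as follows. This is a sketch that uses only Assumption ATE 1 and the definitions $y_i = \mu_i + v_i$, so that $E(v_i \mid w, x) = E(y_i \mid w, x) - \mu_i = E(y_i \mid x) - \mu_i = E(v_i \mid x)$:

```latex
\begin{aligned}
E(y \mid w, x)
  &= \mu_0 + (\mu_1 - \mu_0)\,w + E(v_0 \mid w, x)
     + w\,\bigl[E(v_1 \mid w, x) - E(v_0 \mid w, x)\bigr] \\
  &= \mu_0 + \alpha w + E(v_0 \mid x)
     + w\,\bigl[E(v_1 \mid x) - E(v_0 \mid x)\bigr] \\
  &= \mu_0 + \alpha w + g_0(x) + w\,\bigl[g_1(x) - g_0(x)\bigr].
\end{aligned}
```

Because $E(v_0) = E(v_1) = 0$, both $g_0(x)$ and $g_1(x)$ have mean zero over the distribution of $x$.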

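Linearization of $g_i(x)$:

$$E(y \mid w, x) = \mu_0 + \alpha w + x\beta_0 + w\,(x - \psi)\,\delta$$

where $\psi = E(x)$. Demeaning $x$ in the interaction term ensures that the linearized difference $g_1(x) - g_0(x) = (x - \psi)\delta$ has mean zero, so that $\alpha$ remains the ATE. So the regression to estimate the ATE $\alpha$ is:

$$y_i \ \text{on}\ 1,\ w_i,\ x_i,\ w_i\,(x_i - \bar{x})$$

where the sample mean $\bar{x}$ replaces $\psi$.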

Here the control functions involve not just $x_i$, but also interactions of the covariates with the treatment variable.

We can also estimate the treatment effect conditional on $x$:

$$\widehat{ATE}(x) = \hat{\alpha} + (x - \bar{x})\,\hat{\delta}$$
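The regression with the demeaned interaction term can be sketched in a few lines of NumPy. The simulated design below is an assumption made for illustration (a scalar covariate, a logistic selection rule that depends on $x$ only, and a treatment effect of $2 + 0.5\,(x - E x)$, so that the ATE is 2); the function name `ate_given_x` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Scalar covariate with E(x) = 1.
x = rng.normal(loc=1.0, scale=1.0, size=n)

# Selection on observables: the treatment probability depends on x only.
p = 1.0 / (1.0 + np.exp(-(x - 1.0)))
w = (rng.uniform(size=n) < p).astype(float)

# Hypothetical outcomes: effect is 2 + 0.5 (x - E x), so the ATE is 2.
y0 = 1.0 + 1.0 * x + rng.normal(size=n)
y1 = y0 + 2.0 + 0.5 * (x - 1.0)
y = (1 - w) * y0 + w * y1

# Regress y on 1, w, x, w (x - x_bar).
x_bar = x.mean()
X = np.column_stack([np.ones(n), w, x, w * (x - x_bar)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, delta_hat = coef[1], coef[3]

def ate_given_x(x_value):
    """Estimated treatment effect conditional on x."""
    return alpha_hat + (x_value - x_bar) * delta_hat

print(alpha_hat, delta_hat, ate_given_x(2.0))
```

Dropping the demeaning (using `w * x` instead of `w * (x - x_bar)`) leaves the fit unchanged but turns the coefficient on `w` into the effect at $x = 0$ rather than the ATE.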

2. Propensity Score:

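Let $p(x) = \Pr(w = 1 \mid x)$ denote the propensity score. Using (2), together with $w^2 = w$ and $w(1 - w) = 0$:

$$\begin{aligned}
(w - p(x))\,y &= (w - p(x))\,(w y_1 + (1 - w)\,y_0) \\
&= w y_1 - p(x)(1 - w)\,y_0 - p(x)\,w y_1
\end{aligned}$$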

Take the conditional expectation with respect to $y$:

$$E_y[(w - p(x))\,y \mid w, x] = w\,m_1(x) - p(x)(1 - w)\,m_0(x) - p(x)\,w\,m_1(x)$$

where $E(y_j \mid w, x) = E(y_j \mid x) = m_j(x)$. Taking the expectation with respect to $w$:

$$\begin{aligned}
E_w\{E_y[(w - p(x))\,y \mid w, x] \mid x\}
&= E_w[\,w\,m_1(x) - p(x)(1 - w)\,m_0(x) - p(x)\,w\,m_1(x) \mid x\,] \\
&= p(x)\,m_1(x) - p(x)(1 - p(x))\,m_0(x) - p(x)\,p(x)\,m_1(x) \\
&= m_1(x)\,p(x)(1 - p(x)) - m_0(x)\,p(x)(1 - p(x)) \\
&= (m_1(x) - m_0(x))\,p(x)(1 - p(x))
\end{aligned}$$

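Therefore, provided $0 < p(x) < 1$,

$$m_1(x) - m_0(x) = \frac{E[(w - p(x))\,y \mid x]}{p(x)(1 - p(x))}$$

and, averaging over $x$,

$$ATE = E[m_1(x) - m_0(x)] = E\left[\frac{(w - p(x))\,y}{p(x)(1 - p(x))}\right]$$

A simple and popular estimator in program evaluation is obtained from the OLS regression: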

$$y_i \ \text{on}\ 1,\ w_i,\ \hat{p}(x_i)$$

where the coefficient on $w_i$ is the estimate of the treatment effect. In other words, the estimated propensity score plays the role of the control function.
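Both propensity-score estimators in this section can be illustrated on simulated data. The sketch below makes several assumptions that are not in the notes: a logistic propensity score, a constant treatment effect of 2, and, for brevity, the true $p(x)$ used in place of an estimated $\hat{p}(x_i)$. It computes the sample analogue of $E[(w - p(x))\,y / (p(x)(1 - p(x)))]$ and the coefficient on $w_i$ from the regression of $y_i$ on $1, w_i, p(x_i)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

x = rng.normal(size=n)

# Propensity score, treated as known here (an assumption; in practice it is estimated).
p = 1.0 / (1.0 + np.exp(-0.5 * x))
w = (rng.uniform(size=n) < p).astype(float)

# Hypothetical outcomes with a constant effect, so the ATE is 2.
y0 = 1.0 + x + rng.normal(size=n)
y1 = y0 + 2.0
y = (1 - w) * y0 + w * y1

# Weighting estimator: sample mean of (w - p) y / (p (1 - p)).
ate_weighting = np.mean((w - p) * y / (p * (1.0 - p)))

# Control-function regression: y on 1, w, p(x); the coefficient on w is the estimate.
X = np.column_stack([np.ones(n), w, p])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
ate_control_function = coef[1]

print(ate_weighting, ate_control_function)
```

The weighting estimator becomes noisy when $p(x)$ is close to 0 or 1, because the denominator $p(x)(1 - p(x))$ is then close to zero; the simulated design keeps the score away from those extremes.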

3. Dummy Endogenous Variables

Consider the model again:

$$y = \mu_0 + \alpha w + x\beta_0 + u_0 \tag{4}$$

Here $w$ is endogenous. Again, $w = 1$ if treated, and 0 otherwise.

Assume that $\Pr(w = 1 \mid x, z) = G(x, z; \gamma)$.

Procedure 1:

(1) Estimate the binary response model $\Pr(w_i = 1 \mid x_i, z_i) = G(x_i, z_i; \gamma)$, and obtain the fitted values $\hat{G}_i$.

(2) Estimate (4) by IV using instruments 1, $\hat{G}_i$, and $x_i$.
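The two steps of Procedure 1 can be sketched as follows. Everything specific here is an assumption for illustration: the simulated design, the helper `fit_logit` (a hand-written Newton-Raphson logit, swapped in for the generic $G$), and the true coefficient $\alpha = 2$. The logit is deliberately not the true model for $w$ in this simulation, so the sketch also shows the fitted probabilities being used as an instrument without relying on $G$ being correctly specified.

```python
import numpy as np

def fit_logit(features, w, n_iter=25):
    """Fit a logit by Newton-Raphson and return fitted probabilities (hypothetical helper)."""
    F = np.column_stack([np.ones(len(w)), features])
    gamma = np.zeros(F.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(F @ gamma)))
        grad = F.T @ (w - prob)
        hess = (F * (prob * (1.0 - prob))[:, None]).T @ F
        gamma = gamma + np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-(F @ gamma)))

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
z = rng.normal(size=n)      # instrument excluded from the outcome equation
u0 = rng.normal(size=n)     # outcome error

# Endogenous treatment: w depends on u0 as well as on x and z.
w = (0.5 * x + 1.0 * z + 0.8 * u0 + rng.logistic(size=n) > 0).astype(float)
y = 1.0 + 2.0 * w + 1.0 * x + u0

# Step (1): binary response model for w given (x, z); keep the fitted values.
G_hat = fit_logit(np.column_stack([x, z]), w)

# Step (2): IV estimation of y on (1, w, x) with instruments (1, G_hat, x).
X = np.column_stack([np.ones(n), w, x])
Z = np.column_stack([np.ones(n), G_hat, x])
iv_coef = np.linalg.solve(Z.T @ X, Z.T @ y)

# OLS for comparison; it is biased because w is correlated with u0.
ols_coef, *_ = np.linalg.lstsq(X, y, rcond=None)

alpha_iv, alpha_ols = iv_coef[1], ols_coef[1]
print(alpha_iv, alpha_ols)
```

The model is just identified (three instruments for three regressors), so the IV estimator reduces to solving $Z'X\,b = Z'y$.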

Procedure 1 has important robustness properties:

(a) Because we use $\hat{G}_i$ as an IV, the model $\Pr(w_i = 1 \mid x_i, z_i) = G(x_i, z_i; \gamma)$ does not have to be correctly specified.

(b) Technically, $\alpha$ and $\beta_0$ are identified even if we do not have extra variables excluded from $x$. But we can rarely justify the estimator in this case. Suppose that $w$ given $x$ follows a probit model (no $z$). Because $G(x, \gamma) = \Phi(\gamma_0 + x\gamma_1)$ is a nonlinear function of $x$, it is not perfectly correlated with $x$, so it can be used as an IV for $w$.

Procedure 1 is not the same as using $\hat{G}_i$ as a regressor in place of $w_i$, that is, running the OLS regression

$$y_i \ \text{on}\ 1,\ \hat{G}_i,\ x_i$$

Consistency of the OLS estimators from the regression

$$y_i = \delta_0 + \alpha\,\hat{G}_i + x_i\beta_0 + u_i \tag{5}$$

would rely on $G(\cdot)$ being correctly specified. Note that the standard errors from (5) also need to be corrected, because $\hat{G}_i$ is an estimated regressor.

Allow an interaction term:

$$y_i = \delta_0 + \alpha w_i + x_i\beta_0 + w_i\,(x_i - \bar{x})\,\delta + e_i \tag{6}$$

Procedure 2:

(a) Estimate $\Pr(w_i = 1 \mid x_i, z_i) = G(x_i, z_i; \gamma)$ and obtain the fitted values $\hat{G}_i$.

(b) Use 1, $x_i$, $\hat{G}_i$, and $\hat{G}_i\,(x_i - \bar{x})$ as IVs in (6). The discussion is the same as before.

4. Regression discontinuity

It is useful to distinguish between two general settings, the Sharp and the Fuzzy Regression Discontinuity designs. In the sharp design, the assignment $w_i$ is a deterministic function of one of the covariates, the forcing (or treatment-determining) variable $x$:

Sharp design:

$$w_i = 1(x_i > x_0)$$

All units with $x_i > x_0$ are assigned to the treatment group (and participation is mandatory for these individuals), and all units with $x_i \le x_0$ are assigned to the control group. In this sharp design, we look at the discontinuity in the conditional expectation of the outcome given the covariates to uncover the ATE at the cutoff:

$$ATE = \lim_{x \to x_0^+} E[y \mid x] - \lim_{x \to x_0^-} E[y \mid x] = E(y_1 - y_0 \mid x = x_0)$$
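As a rough illustration of the sharp design, the jump in $E[y \mid x]$ at $x_0$ can be approximated by comparing average outcomes in two narrow windows on either side of the cutoff. The sketch below is only an assumption-laden example: the window width `h`, the simulated linear outcome, and the true jump of 2 are all hypothetical choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000
x0 = 0.0    # known cutoff
h = 0.05    # hypothetical window width around the cutoff

x = rng.uniform(-1.0, 1.0, size=n)
w = (x > x0).astype(float)   # sharp design: w = 1(x > x0)

# Hypothetical outcome: smooth in x, with a jump of 2 at the cutoff.
y = 1.0 + 0.5 * x + 2.0 * w + rng.normal(scale=0.5, size=n)

# Average outcome just above and just below the cutoff.
right = (x > x0) & (x < x0 + h)
left = (x > x0 - h) & (x <= x0)
ate_hat = y[right].mean() - y[left].mean()

print(ate_hat)
```

Because $E[y \mid x]$ has a nonzero slope in this simulation, the comparison of window means carries a small bias of order `h`; shrinking the window reduces the bias at the cost of fewer observations.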

Fuzzy design:

$E(w_i \mid x_i = x) = \Pr(w_i = 1 \mid x)$ is discontinuous at a known value $x_0$.

The sharp and fuzzy designs differ in that in the sharp design the treatment assignment is deterministic given $x$, while in the fuzzy design the treatment assignment may depend on additional factors unobserved by the econometrician. In both designs, the discontinuity point $x_0$ is known.

Assumption (RD):

(i) $w^+ = \lim_{x \to x_0^+} E(w \mid x)$ and $w^- = \lim_{x \to x_0^-} E(w \mid x)$ exist.

(ii) $w^+ \neq w^-$.

In Angrist and Lavy (1999), an identifying assumption would be that the class size for a student in a school with a number of pupils approaching (for example) 800 from above differs from that of a student in a school with a number of pupils approaching 800 from below.

Assumption: $E(y_{1i} - y_{0i} \mid x_i = x)$ is continuous in $x$ at $x_0$.

This assumption is valid where we have reason to believe that persons close to the threshold $x_0$ are similar and thus would experience similar outcomes absent treatment.

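Theorem: The ATE at the cutoff, denoted $\alpha$, is

$$\alpha = \frac{y^+ - y^-}{w^+ - w^-}$$

where $y^+ = \lim_{x \to x_0^+} E(y \mid x)$ and $y^- = \lim_{x \to x_0^-} E(y \mid x)$.

Proof: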

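Let $\Delta$ be a small positive number. Using $y = y_0 + w\,(y_1 - y_0)$:

$$\begin{aligned}
E(y \mid x_0 + \Delta) - E(y \mid x_0 - \Delta)
&= E(y_0 + w\,(y_1 - y_0) \mid x_0 + \Delta) - E(y_0 + w\,(y_1 - y_0) \mid x_0 - \Delta) \\
&= E(w\,(y_1 - y_0) \mid x_0 + \Delta) - E(w\,(y_1 - y_0) \mid x_0 - \Delta) + E(y_0 \mid x_0 + \Delta) - E(y_0 \mid x_0 - \Delta) \\
&= \alpha\,[E(w \mid x_0 + \Delta) - E(w \mid x_0 - \Delta)] + E(y_0 \mid x_0 + \Delta) - E(y_0 \mid x_0 - \Delta)
\end{aligned}$$

The last equality replaces $E(w\,(y_1 - y_0) \mid x)$ by $\alpha\,E(w \mid x)$, which holds, for example, when the treatment effect $y_1 - y_0$ equals a constant $\alpha$ near $x_0$.

As $\Delta \to 0$, we have:

$$y^+ - y^- = \alpha\,(w^+ - w^-)$$

Here we use the assumption that $E(y_0 \mid x)$ is continuous at $x_0$, so that $E(y_0 \mid x_0 + \Delta) - E(y_0 \mid x_0 - \Delta) \to 0$.

The conclusion follows.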

Given this theorem, we can obtain an estimate of $\alpha$ by estimating $y^+$, $y^-$, $w^+$, and $w^-$. There are several ways to estimate these quantities. The most popular way is to do it non-parametrically.

In practice,

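$$\hat{y}^+ = \frac{\sum_i y_i\,1(x_0 < x_i < x_0 + h)}{\sum_i 1(x_0 < x_i < x_0 + h)}, \qquad \hat{y}^- = \frac{\sum_i y_i\,1(x_0 - h < x_i < x_0)}{\sum_i 1(x_0 - h < x_i < x_0)}$$

Note that for a sharp design RD, $w^+ - w^- = 1$. For a fuzzy design RD,

$$\hat{w}^+ = \frac{\sum_i w_i\,1(x_0 < x_i < x_0 + h)}{\sum_i 1(x_0 < x_i < x_0 + h)}, \qquad \hat{w}^- = \frac{\sum_i w_i\,1(x_0 - h < x_i < x_0)}{\sum_i 1(x_0 - h < x_i < x_0)}$$

where $h$ is the bandwidth. An interesting note is that the resulting estimator $\hat{\alpha} = (\hat{y}^+ - \hat{y}^-)/(\hat{w}^+ - \hat{w}^-)$ is numerically equivalent to an IV estimator for the regression of $y_i$ on $w_i$ for people in the subsample $x_0 - h < x_i < x_0 + h$, using $1(x_0 < x_i < x_0 + h)$ as the IV. The regression method can be useful because one can add control variables in the regression.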
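The numerical equivalence between the ratio of local mean differences and the IV estimator can be verified directly. The sketch below assumes a hypothetical fuzzy design in which $\Pr(w = 1 \mid x)$ jumps from 0.2 to 0.7 at $x_0$ and the treatment effect is a constant 2; the bandwidth `h` is likewise an arbitrary choice. The IV estimate for a single binary instrument is computed as a ratio of sample covariances.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400_000
x0, h = 0.0, 0.1   # known cutoff and a hypothetical bandwidth

x = rng.uniform(-1.0, 1.0, size=n)
above = (x > x0).astype(float)

# Fuzzy design: Pr(w = 1 | x) jumps from 0.2 to 0.7 at x0.
w = (rng.uniform(size=n) < 0.2 + 0.5 * above).astype(float)

# Hypothetical outcome with a constant treatment effect alpha = 2.
y = 1.0 + 2.0 * w + rng.normal(scale=0.5, size=n)

# Rectangular-kernel local means on each side of the cutoff.
right = (x > x0) & (x < x0 + h)
left = (x > x0 - h) & (x < x0)
ratio_estimate = (y[right].mean() - y[left].mean()) / (w[right].mean() - w[left].mean())

# IV on the subsample x0 - h < x < x0 + h, using 1(x0 < x < x0 + h) as the instrument.
sub = left | right
ys, ws, zs = y[sub], w[sub], above[sub]
iv_estimate = np.cov(zs, ys)[0, 1] / np.cov(zs, ws)[0, 1]

print(ratio_estimate, iv_estimate)
```

The two numbers agree up to floating-point error because, with a binary instrument, the ratio of covariances reduces algebraically to the ratio of differences in group means.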

Practically, for a sharp design,

1. Graph the data by computing the average value of the outcome variable over a set of bins. The bandwidth has to be large enough to have a sufficient amount of precision so that the plots look smooth on either side of the cutoff value, but at the same time small enough to make the jump around the cutoff value clear.

2. Estimate the treatment effect by running linear regressions on both sides of the cutoff point. Since we propose to use a rectangular kernel, these are just standard
