1 Review of Statistics
Three types of statistical methods are used throughout econometrics: estimation, hypothesis testing, and confidence intervals.
• Estimation: computing a "best guess" numerical value for an unknown characteristic of a population distribution, such as its mean, from a sample of data.
• Hypothesis testing: formulating a specific hypothesis about the population, then using sample evidence to decide whether it is true.
• Confidence intervals: using a set of data to estimate an interval for an unknown population characteristic.
• We focus on an unknown population mean and on comparing means in different populations.
1.1 Estimation of the Population Mean
Suppose you want to know the mean value of Y, denoted \( \mu_Y \), in a population, such as the mean earnings of recent college graduates. A natural estimator is \( \bar{Y} \), the sample average of the observations in a random sample \( Y_1, Y_2, \ldots, Y_n \) (independent and identically distributed with mean \( \mu_Y \) and variance \( \sigma_Y^2 \)).
• An estimator is a function of a sample of data to be drawn randomly from a population.
• An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample.
• The estimator
\[ \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \]
• Unbiasedness
\[ E(\bar{Y}) = \mu_Y \]
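• Efficiency: Another unbiased estimator is \( \hat{\mu}_1 = (Y_1 + Y_2 + \cdots + Y_{n-1})/(n-1) \), which discards the last observation. We have
\[ \mathrm{Var}(\bar{Y}) < \mathrm{Var}(\hat{\mu}_1) \]
since
\[ \mathrm{Var}(\bar{Y}) = \frac{\sigma_Y^2}{n} \quad \text{and} \quad \mathrm{Var}(\hat{\mu}_1) = \frac{\sigma_Y^2}{n-1}. \]
In fact, \( \bar{Y} \) has the smallest variance among all unbiased estimators that are linear functions (weighted averages) of \( Y_1, \ldots, Y_n \).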
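The unbiasedness and efficiency claims can be checked by simulation. The following is a minimal sketch and not part of the original notes: it assumes normally distributed data with illustrative values \( \mu_Y = 5 \), \( \sigma_Y = 2 \), and \( n = 20 \), and the function name `simulate_estimators` is hypothetical. Both estimators should average close to \( \mu_Y \), while the simulated variances should be near \( \sigma_Y^2/n \) and \( \sigma_Y^2/(n-1) \) respectively.

```python
import random
import statistics


def simulate_estimators(n=20, reps=20000, mu=5.0, sigma=2.0, seed=0):
    """Monte Carlo comparison of Y-bar with mu-hat-1, which drops the last observation.

    All parameter values are illustrative assumptions, not taken from the notes.
    """
    rng = random.Random(seed)
    ybar_draws, mu1_draws = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar_draws.append(sum(sample) / n)              # uses all n observations
        mu1_draws.append(sum(sample[:-1]) / (n - 1))    # uses the first n - 1 only
    return {
        "mean_ybar": statistics.fmean(ybar_draws),
        "mean_mu1": statistics.fmean(mu1_draws),
        "var_ybar": statistics.pvariance(ybar_draws),
        "var_mu1": statistics.pvariance(mu1_draws),
    }


result = simulate_estimators()
print(result)
print("theoretical variances:", 2.0 ** 2 / 20, 2.0 ** 2 / 19)
```

The simulated variance of the full-sample average should come out below that of the estimator that throws one observation away, in line with the inequality above.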
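• \( \bar{Y} \) is the least squares estimator of \( \mu_Y \): consider the problem
\[ \min_{m} \sum_{i=1}^{n} (Y_i - m)^2 \]
and note that
\[ \sum_{i=1}^{n} (Y_i - m)^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 + n(\bar{Y} - m)^2. \]
The first term does not depend on m, and the second is minimized at \( m = \bar{Y} \). Thus \( \bar{Y} \) solves \( \min_m \sum_{i=1}^{n} (Y_i - m)^2 \).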
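The sum-of-squares decomposition can be verified numerically. This is a small sketch under stated assumptions: the data values are hypothetical and the helper name `sum_sq` is introduced only for illustration.

```python
def sum_sq(ys, m):
    """Sum of squared deviations of the observations from a candidate value m."""
    return sum((y - m) ** 2 for y in ys)


ys = [18.0, 22.5, 25.0, 19.5, 30.0]  # hypothetical observations
n = len(ys)
ybar = sum(ys) / n

checks = []
for m in (15.0, 20.0, ybar, 27.0):
    lhs = sum_sq(ys, m)
    rhs = sum_sq(ys, ybar) + n * (ybar - m) ** 2  # decomposition from the text
    checks.append((m, lhs, rhs))
    print(f"m = {m:5.2f}  sum of squares = {lhs:8.3f}  decomposition = {rhs:8.3f}")
```

For every candidate m the two columns agree, and the sum of squares is smallest at m equal to the sample average.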
1.2 Hypothesis Tests Concerning the Population Mean
Are mean earnings the same for male and female college graduates? The statistical challenge is to answer such questions based on a sample of evidence.
• Null hypothesis: the starting point of statistical hypothesis testing is specifying the hypothesis to be tested, usually denoted \( H_0 \).
• Alternative hypothesis: a second hypothesis against which the null hypothesis is compared, based on evidence in the data, usually denoted \( H_1 \).
• An example: the null hypothesis that, on average in the population, college graduates earn $20/hour,
\[ H_0: E(Y) = \mu_{Y,0} = 20 \]
against a two-sided alternative hypothesis
\[ H_1: E(Y) \neq 20 \]
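• The sample variance and standard deviation
\[ s_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \]
and \( s_Y = \sqrt{s_Y^2} \) is the sample standard deviation.
— \( s_Y^2 \) is an unbiased estimator of \( \sigma_Y^2 \), that is, \( E(s_Y^2) = \sigma_Y^2 \), and \( n-1 \) is called the degrees of freedom.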
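As a quick illustration of the \( n-1 \) divisor, the sketch below computes the sample variance by hand and compares it with Python's `statistics.variance`, which uses the same divisor. The data are hypothetical and the function name `sample_variance` is an assumption made for this example.

```python
import math
import statistics


def sample_variance(ys):
    """Sample variance with the n - 1 (degrees of freedom) divisor."""
    n = len(ys)
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)


earnings = [18.0, 22.5, 25.0, 19.5, 30.0]  # hypothetical hourly earnings
s2 = sample_variance(earnings)
s = math.sqrt(s2)  # sample standard deviation

print("sample variance:", s2)
print("sample standard deviation:", s)
print("statistics.variance:", statistics.variance(earnings))
```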
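• The t statistic
\[ t = \frac{\bar{Y} - \mu_{Y,0}}{SE(\bar{Y})} \]
with \( SE(\bar{Y}) = \hat{\sigma}_{\bar{Y}} = s_Y/\sqrt{n} \).
— When Y is normally distributed,
\[ Z = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \sim N(0,1) \]
whereas
\[ t = \frac{\bar{Y} - \mu_Y}{\hat{\sigma}_Y/\sqrt{n}} \sim t(n-1), \]
the Student t distribution with \( n-1 \) degrees of freedom, where \( \hat{\sigma}_Y = s_Y \).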
— The t statistic has a t distribution only if the population distribution is normal, and normality is very often a poor approximation to the actual distribution of economic data.
— The differences between the t distribution and the standard normal distribution are very small if the sample size is moderate and negligible if it is large.
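• Hypothesis Testing with a prespecified significance level:
— A simple rule:
Reject \( H_0 \) if \( |t| > 1.96 \)
and note that
\[ \Pr(|N(0,1)| > 1.96) = 5\% \]
Therefore, when \( H_0 \) is true (\( \mu_Y = \mu_{Y,0} \)),
\[ Z = \frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}} = \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} + \frac{\mu_Y - \mu_{Y,0}}{\sigma_Y/\sqrt{n}} \sim N(0,1), \]
because the second term is zero under \( H_0 \). Thus the probability of erroneously rejecting the null hypothesis is 5%.
— The significance level of the test is 5%: this is the probability of a Type I error, that is, of rejecting the null hypothesis when it is in fact true.
— The critical value of this two-sided test is 1.96, and the critical (rejection) region is \( |t| > 1.96 \).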
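The decision rule can be sketched in a few lines. This is an illustration only: it combines the null value of 20 from the example above with the sample mean 22.64 and standard error 1.28 from the confidence-interval example in these notes, a pairing made here purely for demonstration. The function names are hypothetical, and the large-sample normal approximation is assumed.

```python
from statistics import NormalDist


def t_statistic(ybar, se, mu0):
    """t statistic for H0: E(Y) = mu0, given the sample mean and its standard error."""
    return (ybar - mu0) / se


def reject_two_sided(t, critical_value=1.96):
    """Two-sided rule from the text: reject H0 when |t| exceeds the critical value."""
    return abs(t) > critical_value


# Probability that a standard normal variable exceeds 1.96 in absolute value.
type_one_error = 2 * (1 - NormalDist().cdf(1.96))

t = t_statistic(ybar=22.64, se=1.28, mu0=20.0)
decision = reject_two_sided(t)

print(f"Pr(|N(0,1)| > 1.96) = {type_one_error:.4f}")
print(f"t = {t:.4f}, reject H0 at the 5% level: {decision}")
```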
• One-Sided Alternatives:
\[ H_1: E(Y) > \mu_{Y,0} \]
For example, the relevant alternative to the null hypothesis that earnings are the same for college graduates and nongraduates is not just that their earnings differ, but rather that graduates earn more than nongraduates.
— The N(0,1) critical value for a one-sided test with a 5% significance level is 1.645, namely \( \Pr(N(0,1) > 1.645) = 5\% \).
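The one-sided and two-sided critical values quoted in these notes can be recovered from the standard normal quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper `reject_one_sided` is a hypothetical name introduced for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# One-sided 5% test: all 5% of the rejection probability sits in the upper tail.
one_sided_cv = std_normal.inv_cdf(0.95)
# Two-sided 5% test: 2.5% in each tail.
two_sided_cv = std_normal.inv_cdf(0.975)


def reject_one_sided(t, critical_value=one_sided_cv):
    """Reject H0 in favor of H1: E(Y) > mu_{Y,0} when t exceeds the critical value."""
    return t > critical_value


print(f"one-sided critical value: {one_sided_cv:.3f}")
print(f"two-sided critical value: {two_sided_cv:.3f}")
```

A t statistic between the two critical values, such as 1.8, leads to rejection against the one-sided alternative but not against the two-sided one.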
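• Confidence interval: Note that
\[ \Pr\left( \left| \frac{\bar{Y} - \mu_Y}{\sigma_Y/\sqrt{n}} \right| < 1.96 \right) = 95\% \quad \text{or} \quad \Pr\left( |\bar{Y} - \mu_Y| < 1.96\,\sigma_Y/\sqrt{n} \right) = 95\% \]
or
\[ \Pr\left( \bar{Y} - 1.96\,\sigma_Y/\sqrt{n} < \mu_Y < \bar{Y} + 1.96\,\sigma_Y/\sqrt{n} \right) = 95\%. \]
Thus a 95% confidence interval for \( \mu_Y \) is
\[ \left( \bar{Y} - 1.96\,\sigma_Y/\sqrt{n},\ \bar{Y} + 1.96\,\sigma_Y/\sqrt{n} \right), \]
and, since \( \sigma_Y \) is unknown in practice, an approximation is
\[ \left( \bar{Y} - 1.96\,\hat{\sigma}_Y/\sqrt{n},\ \bar{Y} + 1.96\,\hat{\sigma}_Y/\sqrt{n} \right). \]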
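A confidence interval of this form is straightforward to compute once the sample mean and its standard error are known. The sketch below is illustrative: the function name `confidence_interval` is hypothetical, the large-sample normal approximation is assumed, and the inputs 22.64 and 1.28 are the values used in the hourly-earnings example in these notes.

```python
from statistics import NormalDist


def confidence_interval(ybar, se, level=0.95):
    """Large-sample confidence interval: ybar plus or minus z times the standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return ybar - z * se, ybar + z * se


lower, upper = confidence_interval(22.64, 1.28)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```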
• Consider the problem of constructing a 95% confidence interval for the mean hourly earnings of recent college graduates using a hypothetical random sample (n = 200) where \( \bar{Y} = 22.64 \) dollars and \( SE(\bar{Y}) = 1.28 \).
— The 95% confidence interval for mean hourly earnings is
\[ 22.64 \pm 1.96 \times 1.28 = (20.13,\ 25.15), \]
that is, from $20.13 to $25.15.
1.3 Comparing Means for Different Populations
Question: Do recent male and female college graduates earn the same amount on average?
We will provide a statistical answer.
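• Hypothesis Tests for the Difference Between Two Means:
— Null hypothesis
\[ H_0: \mu_m - \mu_w = d_0 \quad \text{vs.} \quad H_1: \mu_m - \mu_w \neq d_0; \]
the original question corresponds to \( d_0 = 0 \).
— The estimator of \( \mu_m - \mu_w \) is \( \bar{Y}_m - \bar{Y}_w \), and under normality
\[ \bar{Y}_m - \bar{Y}_w \sim N\left( \mu_m - \mu_w,\ \frac{\sigma_m^2}{n_m} + \frac{\sigma_w^2}{n_w} \right). \]
We also need to estimate \( \sigma_m^2 \) and \( \sigma_w^2 \), by \( s_m^2 \) and \( s_w^2 \), and thus the standard error of \( \bar{Y}_m - \bar{Y}_w \) is
\[ SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{ \frac{s_m^2}{n_m} + \frac{s_w^2}{n_w} }. \]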
— Construct the t statistic
\[ t = \frac{ (\bar{Y}_m - \bar{Y}_w) - d_0 }{ SE(\bar{Y}_m - \bar{Y}_w) } \]
— The null hypothesis is rejected at the 5% significance level if \( |t| > 1.96 \), since \( \Pr(|N(0,1)| > 1.96) = 5\% \).
— For the one-sided alternative hypothesis \( H_1: \mu_m - \mu_w > d_0 \), the null hypothesis is rejected at the 5% significance level if \( t > 1.645 \), since \( \Pr(N(0,1) > 1.645) = 5\% \).
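The standard error and t statistic for a difference in means can be computed from group summary statistics. The following sketch uses entirely hypothetical inputs (means 25 and 22, standard deviations 10 and 8, 100 observations per group), and the function names are introduced only for illustration.

```python
import math


def se_difference(s_m, n_m, s_w, n_w):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(s_m ** 2 / n_m + s_w ** 2 / n_w)


def t_difference(ybar_m, ybar_w, se, d0=0.0):
    """t statistic for H0: mu_m - mu_w = d0."""
    return (ybar_m - ybar_w - d0) / se


# Hypothetical summary statistics, not taken from any table in the notes.
se = se_difference(s_m=10.0, n_m=100, s_w=8.0, n_w=100)
t = t_difference(ybar_m=25.0, ybar_w=22.0, se=se)

print(f"SE = {se:.4f}, t = {t:.4f}")
print("reject H0 (two-sided, 5%):", abs(t) > 1.96)
print("reject H0 (one-sided, 5%):", t > 1.645)
```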
• Confidence Intervals for the Difference Between Two Population Means
A 95% confidence interval for \( d = \mu_m - \mu_w \) is \( (\bar{Y}_m - \bar{Y}_w) \pm 1.96\, SE(\bar{Y}_m - \bar{Y}_w) \).
• Earnings of Male and Female College Graduates in the United States.
— Table 3.1 gives estimates of hourly earnings for college-educated full-time workers aged 25-34 in the U.S., based on data collected as part of the Current Population Survey (CPS), all adjusted for inflation by putting them in 1998 dollars using the Consumer Price Index.
— For example, the CPS administered in March 1999 surveyed 64,000 households, which included 1393 men and 1210 women employed full-time with a college degree.
— The t statistic for testing that the wage gap is zero is t = (2.45 − 0)/0.29 = 8.45, which is significant at the 1% level.
— The 95% confidence interval is $(2.45 ± 1.96 × 0.29) = ($1.89, $3.02): with a 95% confidence level, we estimate that the wage gap between the two populations is between $1.89 and $3.02.
• The male-female wage gap is quite large, and it is quite unlikely that this estimated gap is simply an artifact of sampling error.
• Then, what is the main cause of this gap? (Is it due to discrimination, or to differences in the skills and experience of men and women?) We need the tools of multiple regression analysis to address such questions.
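The wage-gap calculations above can be reproduced from the two reported figures, the estimated gap of 2.45 dollars and its standard error of 0.29. This is a sketch with a hypothetical helper name, `gap_summary`. With these rounded inputs the lower bound comes out near 1.88 rather than the reported 1.89, presumably because the reported figures were computed from unrounded values.

```python
def gap_summary(gap, se, d0=0.0, critical_value=1.96):
    """t statistic and 95% confidence interval for a difference in means."""
    t = (gap - d0) / se
    lower = gap - critical_value * se
    upper = gap + critical_value * se
    return t, lower, upper


# Gap and standard error as reported in the notes (both rounded to two decimals).
t_gap, lower, upper = gap_summary(gap=2.45, se=0.29)
print(f"t = {t_gap:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```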
1.4 Scatterplots, the Sample Covariance, and the Sample Correlation
Question: What is the relationship between age and earnings?
Three ways to summarize the relationship between variables: the scatterplot, the sample covariance, and the sample correlation coefficient.
• A scatterplot is a plot of n observations on \( X_i \) and \( Y_i \), in which each observation is represented by the point \( (X_i, Y_i) \).
— This scatterplot shows a positive relationship between age and earnings in this sample.