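Logit regression model assumptions: Logistic regression models and applied modeling (Part 2)
II. Multinomial logistic regression (dependent variable with multiple categories)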
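1. Overview: Multinomial logistic regression is used to model a categorical outcome variable that can take more than two values. The log odds of each outcome category, relative to a baseline category, are modeled as a linear combination of the predictor variables.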
Required packages:
library(foreign)
library(nnet)
library(ggplot2)
library(reshape2)
2. Modeling example:
Example. Entering high school students make program choices among a general program, a vocational program and an academic program. Their choice might be modeled using their writing score and their social economic status.
1) Read the data:
ml <- read.dta("http://www.ats.ucla.edu/stat/data/hsbdemo.dta")
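The data set contains records for 200 students. The outcome variable is prog (program type), which has three categories. The predictors used here are ses (social economic status), a three-level categorical variable, and write (writing score), a continuous variable. The data set also contains other categorical variables (id, female, schtyp) and further continuous variables.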
2) Initial data exploration:
with(ml, table(ses, prog))
        prog
ses      general academic vocation
  low         16       19       12
  middle      20       44       31
  high         9       42        7
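The cross-tabulation is easier to compare across rows once it is converted to proportions. A minimal sketch, assuming only the counts printed above: the matrix `counts` is typed in by hand rather than read from the data set, and all names are illustrative.

```r
# Counts copied from the cross-tabulation above (rows: socioeconomic status).
counts <- matrix(
  c(16, 19, 12,
    20, 44, 31,
     9, 42,  7),
  nrow = 3, byrow = TRUE,
  dimnames = list(status = c("low", "middle", "high"),
                  prog = c("general", "academic", "vocation"))
)

# Share of each program within each socioeconomic group (each row sums to 1).
row_share <- prop.table(counts, margin = 1)
print(round(row_share, 3))

# Total number of students covered by the table.
total <- sum(counts)
print(total)
```

Applied to the actual table object, the same `prop.table(..., margin = 1)` call gives these row proportions directly.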
Mean and standard deviation of the writing score, grouped by prog:
with(ml, do.call(rbind, tapply(write, prog, function(x) c(M = mean(x), SD = sd(x)))))
                M       SD
general  51.33333 9.397775
academic 56.25714 7.943343
vocation 46.76000 9.318754
3) Build the model:
We use the multinom function from the nnet package to fit the multinomial logistic regression model. Before running the model, we choose the level of the outcome that we wish to use as our baseline and specify it with the relevel function; here academic is the reference level. Then we run the model using multinom. The multinom function does not include p-value calculation for the regression coefficients, so we calculate p-values using Wald tests (here z-tests).
ml$prog2 <- relevel(ml$prog, ref = "academic")
test <- multinom(prog2 ~ ses + write, data = ml)
summary(test)
Output:
Call:
multinom(formula = prog2 ~ ses + write, data = ml)

Coefficients:
         (Intercept)  sesmiddle    seshigh      write
general     2.852198 -0.5332810 -1.1628226 -0.0579287
vocation    5.218260  0.2913859 -0.9826649 -0.1136037

Std. Errors:
         (Intercept) sesmiddle   seshigh      write
general     1.166441 0.4437323 0.5142196 0.02141097
vocation    1.163552 0.4763739 0.5955665 0.02221996

Residual Deviance: 359.9635
AIC: 375.9635
z <- summary(test)$coefficients/summary(test)$standard.errors
z
         (Intercept)  sesmiddle   seshigh     write
general     2.445214 -1.2018081 -2.261334 -2.705562
vocation    4.484769  0.6116747 -1.649967 -5.112689

# 2-tailed z test
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
          (Intercept) sesmiddle    seshigh        write
general  0.0144766100 0.2294379 0.02373856 6.818902e-03
vocation 0.0000072993 0.5407530 0.09894976 3.176045e-07
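The Wald test above can be reproduced without the fitted model object. A minimal sketch, assuming the coefficients and standard errors are typed in from the printed summary: the names `coefs`, `std_err`, `z_hand` and `p_hand` are illustrative, and the column labels "middle" and "high" stand for the two socioeconomic-status dummies.

```r
# Coefficients and standard errors copied from the summary output above.
cols <- c("(Intercept)", "middle", "high", "write")
rows <- c("general", "vocation")
coefs <- matrix(c(2.852198, -0.5332810, -1.1628226, -0.0579287,
                  5.218260,  0.2913859, -0.9826649, -0.1136037),
                nrow = 2, byrow = TRUE, dimnames = list(rows, cols))
std_err <- matrix(c(1.166441, 0.4437323, 0.5142196, 0.02141097,
                    1.163552, 0.4763739, 0.5955665, 0.02221996),
                  nrow = 2, byrow = TRUE, dimnames = list(rows, cols))

# Wald z statistic: each estimate divided by its standard error.
z_hand <- coefs / std_err

# Two-tailed p-value from the standard normal distribution.
# lower.tail = FALSE computes the upper tail directly instead of 1 - pnorm().
p_hand <- 2 * pnorm(abs(z_hand), lower.tail = FALSE)

print(round(z_hand, 4))
print(signif(p_hand, 4))
```

Using `lower.tail = FALSE` is a design choice rather than a requirement: it gives the same result as `(1 - pnorm(abs(z))) * 2` here, but keeps more precision when |z| is large.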
The model summary output has a block of coefficients and a block of standard errors. Each of these blocks has one row of values corresponding to a model equation. Focusing on the block of coefficients, we can look at the first row comparing prog = "general" to our baseline prog = "academic" and the second row comparing prog = "vocation" to our baseline prog = "academic". If we consider our coefficients from the first row to be b_1 and our coefficients from the second row to be b_2, we can write our model equations:
\[ln\left(\frac{P(prog=general)}{P(prog=academic)}\right) = b_{10} + b_{11}(ses=2) + b_{12}(ses=3) + b_{13}write\]
Model equation 1
\[ln\left(\frac{P(prog=vocation)}{P(prog=academic)}\right) = b_{20} + b_{21}(ses=2) + b_{22}(ses=3) + b_{23}write\]
Model equation 2
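Because the three probabilities must sum to one, the two log-odds equations determine all three class probabilities. As a worked step, write \(\eta_1\) and \(\eta_2\) for the right-hand sides of the first and second equations (this shorthand is introduced here for illustration and is not part of the original notation). Solving the two equations together with the sum-to-one constraint gives:

```latex
\[
P(prog=academic) = \frac{1}{1 + e^{\eta_1} + e^{\eta_2}}, \qquad
P(prog=general) = \frac{e^{\eta_1}}{1 + e^{\eta_1} + e^{\eta_2}}, \qquad
P(prog=vocation) = \frac{e^{\eta_2}}{1 + e^{\eta_1} + e^{\eta_2}}
\]
```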
## extract the coefficients from the model and exponentiate
exp(coef(test))
         (Intercept) sesmiddle   seshigh     write
general     17.32582 0.5866769 0.3126026 0.9437172
vocation   184.61262 1.3382809 0.3743123 0.8926116
Interpretation:
The relative risk ratio for a one-unit increase in the variable write is .9437 for being in the general program vs. the academic program.
The relative risk ratio for switching from ses = 1 to 3 is .3126 for being in the general program vs. the academic program.
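The two ratios quoted above can be checked by exponentiating the corresponding coefficients by hand. A small sketch, assuming the coefficient values printed in the model summary; the variable names are illustrative.

```r
# Coefficients for general vs. academic, copied from the model summary.
b_write <- -0.0579287   # one-unit increase in write
b_high  <- -1.1628226   # highest status level vs. the lowest (reference) level

# Relative risk ratios are the exponentiated coefficients.
rrr_write <- exp(b_write)
rrr_high  <- exp(b_high)
print(round(c(write = rrr_write, high = rrr_high), 4))

# A 10-point increase in write multiplies the ratio by exp(10 * b_write),
# which equals rrr_write raised to the 10th power.
rrr_write_10 <- exp(10 * b_write)
print(round(rrr_write_10, 4))
```

Both ratios are below 1, so under this model a higher writing score or a higher status level lowers the relative risk of being in the general program compared with the academic program.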
You can also use predicted probabilities to help you understand the model. You can calculate predicted probabilities for each of our outcome levels using the fitted function. We can start by generating the predicted probabilities for the observations in our dataset and viewing the first few rows:
head(pp <- fitted(test))
   academic   general  vocation
1 0.1482764 0.3382454 0.5134781
2 0.1202017 0.1806283 0.6991700
3 0.4186747 0.2368082 0.3445171
4 0.1726885 0.3508384 0.4764731
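To see where such fitted probabilities come from, they can be computed by hand from the two sets of coefficients. A minimal sketch, assuming the coefficients printed in the model summary: `predict_probs` is a hypothetical helper written for this illustration (not part of nnet), and the input values (lowest status level, write = 35) are assumed rather than read from the data set.

```r
# Coefficients copied from the model summary (baseline category: academic).
# Order: intercept, middle-status dummy, high-status dummy, write.
b_general  <- c(2.852198, -0.5332810, -1.1628226, -0.0579287)
b_vocation <- c(5.218260,  0.2913859, -0.9826649, -0.1136037)

# Hypothetical helper: predicted probabilities for a single student.
predict_probs <- function(status = c("low", "middle", "high"), write) {
  status <- match.arg(status)
  # Design vector: 1 for the intercept, 0/1 dummies, then the writing score.
  x <- c(1, status == "middle", status == "high", write)
  eta_general  <- sum(b_general * x)   # log odds of general vs. academic
  eta_vocation <- sum(b_vocation * x)  # log odds of vocation vs. academic
  denom <- 1 + exp(eta_general) + exp(eta_vocation)
  c(academic = 1 / denom,
    general  = exp(eta_general) / denom,
    vocation = exp(eta_vocation) / denom)
}

# Illustrative input (assumed, not taken from the data set).
probs <- predict_probs("low", write = 35)
print(round(probs, 4))
```

With these assumed inputs the three probabilities sum to one and come out close to the first row of the fitted output above.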