Common Stata Commands for Data Management

Updated: 2023-06-25 00:42:35

Explanations of Common Stata Commands

set more off   (do not pause after each screen of output)

set virtual on   Turn on virtual memory (ignored in Stata 12 and later, which manage memory automatically)

di exp(3.567)   di = display; evaluates the expression and shows the result

Browse the data (anytime)

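tabmiss x1 x2   (findit tabmiss) Shows the frequency and proportion of missing values (MV)

brow var1 var2 (if ….)   Looks like the Data Editor window, but you cannot edit

listblck in 1/10, repeat(1)   (findit listblck) Like list, but with a more compact layout

repeat(1) / repeat(n) => the first 1 (or n) variables are repeated after row 2

univar   (findit univar) Like summarize (=sum), but also shows the 25th percentile, median, and 75th percentile

univar chinese math science, boxplot

univar math, by(gender) onehdr   onehdr = get a table with one header

univar math, by(gender) onehdr boxplot onescale   onescale puts the boxplots on a common scale so they can be compared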
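The following is a minimal sketch of looking at the data and at missing values. It assumes Stata's bundled auto dataset (loaded with sysuse) in place of the notes' hypothetical variables. It also uses the official misstable summarize (Stata 11+) as a built-in alternative to the user-written tabmiss.

```stata
* Example dataset that ships with Stata (an assumption; the notes use x1, x2, ...)
sysuse auto, clear

* Missing-value counts for every variable that has any (built-in alternative to tabmiss)
misstable summarize

* Look at the data without any risk of editing it
list make price mpg rep78 in 1/5

* Restrict the listing with if: the cars whose rep78 is missing
list make price if missing(rep78)
```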

Summary Statistics & Tables

sum   Summarize all variables (number of observations, mean, SD, min, max)

We can use if, e.g. sum if crime==1

tab x1, sort miss   (sort = order the rows by frequency; miss = include MV in the distribution as well)

tab = tabulate

ta x1 x2, chi2 miss

, nof column (no frequency / column percentage)

, row (row percentage)

, all (all available statistics)

, exact (Fisher’s exact test)

chi2 = Pearson chi-square test of independence
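The tabulate options above can be combined in one call. This is a short sketch on the bundled auto dataset, which is an assumption. rep78 (repair record) and foreign (origin) stand in for x1 and x2.

```stata
sysuse auto, clear

* One-way table: rows ordered by frequency, missing values shown as their own row
tabulate rep78, sort missing

* Two-way table with row percentages, Pearson chi-square and Fisher's exact test
tabulate rep78 foreign, row chi2 exact

* Column percentages only (nofreq suppresses the cell counts)
tabulate rep78 foreign, nofreq column
```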

ta maage_group, plot   Adds a bar chart of the frequencies

tab1 x1 x2 x3 x4   = tab x1, tab x2, … (a one-way table for each variable)

tab2 x1 x2 x3 x4   All possible two-way tables

ta paedu, sum(crime)   Summarize crime by levels of paedu

tabstat score, stats(mean sd n max min…) by(subject)   Other statistics include median, p10, p25, iqr, q…

iqr = interquartile range = p75 - p25

q = quartiles (same as specifying p25 p50 p75)

table x1 x2, contents(mean y1 median y2)   Also min, max, etc.
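A small check of the tabstat statistics and of iqr = p75 - p25. This sketch assumes the bundled auto dataset, with price in place of score and foreign in place of subject. The save option puts the overall statistics in the matrix r(StatTotal), with one row per statistic.

```stata
sysuse auto, clear

* Mean, SD, N, quartiles and IQR of price, by origin
tabstat price, stats(mean sd n p25 p50 p75 iqr) by(foreign)

* Overall statistics, saved in r(StatTotal) (rows: p25, p75, iqr)
tabstat price, stats(p25 p75 iqr) save
matrix S = r(StatTotal)

* iqr is simply p75 - p25
display "IQR = " S[3,1] "   p75 - p25 = " (S[2,1] - S[1,1])
```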

Data Management

gen id=_n   (then do something else)

sort id   Use this if you want to come back to the earlier order

edit var1 var2 var3 (if …)   Unlike browse, opens the Data Editor so you can edit the data

label variable bw "birth weight"

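drop if id==id[_n-1] & birthday==birthday[_n-1]   Drop duplicate records (sort id birthday first, so duplicates are adjacent). Or just flag them (gen delete=0, then replace delete=1 if …), so nothing is actually deleted

format id %9.0f   Use when a number has too many digits to display properly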
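Here is a sketch of the id/_n and duplicate-flag idea on a tiny hypothetical dataset. The variable birthday is coded as a plain number only for illustration. The key step is sorting so that duplicates sit next to each other before comparing with [_n-1].

```stata
clear
* Hypothetical data: person id and birthday (a plain number, for illustration)
input id birthday
3 300
1 100
2 200
1 100
3 310
end

* Remember the original order so it can be restored later
generate seq = _n

* Duplicates are only adjacent after sorting
sort id birthday seq
generate byte delete = id == id[_n-1] & birthday == birthday[_n-1]
list, noobs

* Either keep the flag, or actually drop the repeats
drop if delete

* Back to the original order
sort seq
```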

encode region, gen(region2)

tab region2   (looks the same, but…)

tab region2, nolabel   (now we see the numeric values)

encode generates a labeled numeric variable from a string variable.
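The same encode workflow works on a real string variable. This sketch assumes the bundled auto dataset, where make (the car model) is a string. It also shows decode, which goes the other way.

```stata
sysuse auto, clear

* make is a string variable; encode creates a labeled numeric copy
encode make, generate(make2)

* Looks the same as the string ...
list make make2 in 1/3

* ... but the stored values are integers 1, 2, 3, ...
list make make2 in 1/3, nolabel

* decode goes the other way (labeled numeric -> string)
decode make2, generate(make3)
```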

mvdecode   numeric value => MV

mvencode   MV => numeric value
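This is a round trip through both commands. It assumes the bundled auto dataset, whose rep78 has five missing values. The code -9 is an arbitrary choice for illustration.

```stata
sysuse auto, clear

* Turn the missing values (.) of rep78 into the code -9 ...
mvencode rep78, mv(-9)
tabulate rep78

* ... and turn -9 back into system missing
mvdecode rep78, mv(-9)
tabulate rep78, missing
```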

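egen zscore=std(x)   Standardized (z) score (mean = 0, variance = 1)

egen avg=rowmean(chinese english math)   Row mean, ignoring MV (called rmean() in older versions)

egen sum=rowtotal(x y z)   Row sum, MV counted as 0 (called rsum() in older versions)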
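The rows below show how the egen row functions treat missing values. The scores are hypothetical. Note the all-missing third row: rowmean() gives missing there, while rowtotal() gives 0.

```stata
clear
* Hypothetical scores with some missing values
input chinese english math
80 90 .
70 . .
. . .
end

* Row mean over the non-missing values (missing if all are missing)
egen avg = rowmean(chinese english math)

* Row sum; missing values count as 0, so an all-missing row gives 0
egen rowsum = rowtotal(chinese english math)

* z-score of one column: (x - mean) / sd, over its non-missing rows
egen zchinese = std(chinese)
list
```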

list population region, nolabel   Display the numeric values instead of the labels

(only for labeled numeric variables, not string variables)

Grouping

sort var

gen varnew=group(5)   Split into five groups with equal numbers of cases (in the current sort order)

egen iicat=cut(ii), at(10, 40, 70, 90)

table iicat, contents(min ii max ii)   => Check: three groups starting at 10, 40, 70; the upper limit (e.g. 90) is not included, and values outside the cut points become MV

egen iicat=cut(ii), at(10, 40, 70, 90) icodes   => groups coded 0, 1, 2

egen iicat=cut(ii), at(10, 40, 70, 90) label   => same as icodes, but with value labels (10- 40- 70-)
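Here is the same grouping on real data, assuming the bundled auto dataset with mpg in place of ii. The cut points 10, 20, 30, 40 are an arbitrary choice. For five equal-sized groups the sketch uses xtile, which gives a similar result to group(5). Ties are kept together, so group sizes can differ slightly.

```stata
sysuse auto, clear

* Cut points 10, 20, 30, 40 give the groups [10,20), [20,30), [30,40)
egen mpgcat = cut(mpg), at(10, 20, 30, 40) icodes
egen mpglab = cut(mpg), at(10, 20, 30, 40) label

* Check each group's range; mpg values of 40 or more fall outside and become missing
tabstat mpg, stats(min max n) by(mpgcat)
tabulate mpglab, missing

* Five groups of (roughly) equal size based on price
xtile price5 = price, nquantiles(5)
tabulate price5
```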

local x "st2 st3"   Define a long string [for later use: type `x']
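This is a short sketch of reusing a variable list through a local macro. It assumes the bundled auto dataset. A local only lives while the do-file (or interactive session) that defined it is running.

```stata
sysuse auto, clear

* Store a (possibly long) variable list once ...
local x "price mpg weight"

* ... and reuse it by typing `x' (left single quote, name, right single quote)
summarize `x'
regress `x'
```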

Importing data from other programs

infile str30 place population x score using test.raw   Put str# before a string variable, where # is (at least) the number of characters

(clean the Excel data following the Stata data format) Excel => Stata data

(save Excel as a .csv file)

insheet using "c:/data/test.csv"   (in Stata 13 and later, import delimited replaces insheet)
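A self-contained sketch of the CSV route follows. It first writes a tiny hypothetical file, test.csv, in the working directory with Stata's file commands, then reads it back. The place names and numbers are made up for illustration.

```stata
* Write a tiny CSV file (hypothetical data) so the example is self-contained
file open fh using "test.csv", write text replace
file write fh "place,population,score" _n
file write fh "A,1200,81.5" _n
file write fh "B,800,79.2" _n
file close fh

* Stata 13 and later; insheet using "test.csv", clear also still works
import delimited using "test.csv", clear
describe
list

erase "test.csv"
```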

infix

reshape?

collapse?
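As a starting point for the two open questions, here is a minimal sketch of reshape and collapse. It uses hypothetical wide data with one score column per year. reshape long makes one row per student-year, and collapse then reduces the data to one row per year.

```stata
clear
* Hypothetical wide data: one row per student, one score column per year
input id score2021 score2022
1 70 75
2 80 85
end

* Wide -> long: one row per student-year; the year is taken from the column suffix
reshape long score, i(id) j(year)
list

* Collapse to one row per year holding the mean score
collapse (mean) score, by(year)
list
```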

Compare groups

ttest college, by(male)
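Here is the same two-group comparison on the bundled auto dataset (an assumption). Mean price is compared between domestic (foreign==0) and foreign (foreign==1) cars. The unequal option drops the equal-variance assumption.

```stata
sysuse auto, clear

* Two-sample t test of mean price by origin
ttest price, by(foreign)

* Same test without assuming equal variances
ttest price, by(foreign) unequal
```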

Regression

by region3, sort: reg score paedu   Same as:

sort region3

by region3: reg score paedu

reg y x1 x2 x3, beta   Standardized regression (beta) coefficients

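Stepwise regression (drops nonsignificant Xs by itself):

sw reg Y x1 x2 x3 x4 x5….., pr(.05)   pr = p-value for removal (backward elimination)

sw reg Y x1 x2 x3 x4 x5….., pe(.05)   pe = p-value to enter (forward selection)

(Newer versions use the prefix form: stepwise, pr(.05): regress Y x1 x2 …)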
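The regression commands above can be sketched on the bundled auto dataset, which is an assumption. price plays Y, mpg/weight/length play the Xs, and foreign plays region3. The stepwise lines use the prefix syntax of newer Stata versions.

```stata
sysuse auto, clear

* Separate regression for each group (bysort sorts for you)
bysort foreign: regress price mpg

* Standardized (beta) coefficients
regress price mpg weight length, beta

* Stepwise, backward elimination: remove terms with p >= .05
stepwise, pr(.05): regress price mpg weight length

* Forward selection: add terms while p < .05
stepwise, pe(.05): regress price mpg weight length
```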

After regression…

predict yhat

predict e, resid

sort e

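list v1 v2 v3… in 1/10 (or in -10/l)   (l = the letter l for "last", not the number one) After sort e, these are the cases with the most negative (or most positive) residuals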

We can examine where the model fits poorly…
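The residual workflow on the bundled auto dataset (an assumption) looks like this. Each observed value equals its fitted value plus its residual, and listing both ends of the sorted residuals shows where the model fits worst.

```stata
sysuse auto, clear
regress price mpg weight

* Fitted values and residuals
predict yhat
predict e, residuals

* Sorted by residual: the first rows are the largest negative residuals,
* the last rows the largest positive ones
sort e
list make price yhat e in 1/5
list make price yhat e in -5/l
```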

lstat   Correct classification rate (estat classification in newer versions)

listcoef, help   (search and install first: Long's SPost ado files) Lists the standardized coefficients of X (& Y)

After logistic regression

Likelihood-ratio test:

est store full   (store the full model)

quietly logistic y x   (nested model)

lrtest full
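The sketch below orders these steps end to end, assuming the bundled auto dataset with foreign as the binary y. The full model must be fitted and stored before the nested model is run. Both models here use the same 74 observations, which lrtest requires.

```stata
sysuse auto, clear

* Full model; check the classification rate, then store its results
logistic foreign mpg weight
estat classification
estimates store full

* Nested model (drops weight), compared with the stored full model
quietly logistic foreign mpg
lrtest full
```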

logit y x

predict phat   phat = predicted p

graph twoway connected phat x, sort

predict q, xb   xb = log odds = ln(p/(1-p)), so p = exp(a+bx)/[1+exp(a+bx)]

predict phat2   (use a new name, or drop phat first)

graph twoway mspline phat2 x2
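This is a check of the relationship between predict and predict, xb. It assumes the bundled auto dataset with foreign as y and weight as x. Converting the linear predictor back with exp(xb)/(1+exp(xb)), or with the built-in invlogit(), reproduces the predicted probabilities.

```stata
sysuse auto, clear
logit foreign weight

* Predicted probability (the default) and the linear predictor xb = log odds
predict phat
predict q, xb

* p = exp(xb) / (1 + exp(xb)); invlogit(q) computes the same thing
generate phat_check = exp(q) / (1 + exp(q))

* Predicted probability against x
graph twoway connected phat weight, sort
```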


adjust, by(var1) exp   => odds, p/(1-p), when var1 = 1, 2, 3, …; each level's odds = the previous level's odds × exp(b)

adjust, by(var1) pr   => p(y) when var1 = 1, 2, 3, …

(adjust was superseded by margins in Stata 11)
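Here is a sketch of the "odds × exp(b)" rule using margins in place of adjust. It assumes the bundled auto dataset and a hypothetical dummy heavy (weight over 3000 lbs) in place of var1. margins gives the predicted probabilities at each level. Turning them into odds shows that their ratio equals exp(b).

```stata
sysuse auto, clear
* Hypothetical dummy used only for illustration
generate byte heavy = weight > 3000

logit foreign i.heavy

* Predicted probabilities for heavy = 0 and heavy = 1 (columns of r(b))
margins heavy
matrix P = r(b)

* Convert to odds: odds = p / (1 - p)
scalar odds0 = P[1,1] / (1 - P[1,1])
scalar odds1 = P[1,2] / (1 - P[1,2])

* Odds for heavy = 1 equal the odds for heavy = 0 times exp(b)
display scalar(odds1) / scalar(odds0) "  vs  exp(b) = " exp(_b[1.heavy])
```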

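Interpreting an interaction term: B1 (main effect of var1) + B2 (interaction term, var1 × dummy)

For the group with dummy=1, the odds ratio of var1 is exp(B1) * exp(B2) = exp(B1 + B2)

logistic y var1 var2 inter   (var2 = the dummy, inter = var1 × var2)

lincom var1+inter   Gives the point estimate & CI of a combination of coefficients

lincom [2]lbw+[2]inter10, or   (for mlogit; [2] = the equation for outcome 2)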
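The interaction recipe can be sketched on the bundled auto dataset. mpg plays var1, and a hypothetical dummy good (rep78 of 4 or 5) plays the dummy. lincom then combines B1 and B2; after logistic it reports the result as an odds ratio.

```stata
sysuse auto, clear

* Hypothetical dummy: good repair record (rep78 of 4 or 5)
generate byte good = rep78 >= 4 if !missing(rep78)

* Interaction term built by hand, as in the notes
generate mpg_good = mpg * good

logistic foreign mpg good mpg_good

* Odds ratio of mpg in the good == 1 group: exp(B1 + B2)
lincom mpg + mpg_good

* Equivalent with factor variables:
*   logistic foreign c.mpg##i.good
*   lincom mpg + 1.good#c.mpg
```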

A convenient way to get predicted probabilities

prchange   (findit spost)

prchange, fromto help   (help: adds explanations)

Changes in predicted probability

prtab

prtab, x(paedu=1 maedu=1) rest(min)

Predicted probability in n*n table

prgen ii, f(30) t(60) gen(ff) x(male=0)

prgen ii, f(30) t(60) gen(mm) x(male=1)

twoway (connected ffp1 ffx) (connected mmp1 mmx)   Effect of a continuous variable on Pr(y=1) (prgen automatically picks n points [default = 11] within the range to compute p)

xi3: logit y i.x1*male   With an interaction term……

postgr3 male, by(x1) table   (very useful for obtaining p) => the male effect differs across categories of x1

postgr3 ii, by(area)   (continuous variables work too)

mlogit

mlogit y x1 x2, rrr nolog ba(2)

(reference group => y=2)
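Here is a sketch of the mlogit line on the bundled auto dataset. The 3-category outcome rep3 (1 = poor, 2 = average, 3 = good repair record) is hypothetical and derived from rep78. It uses the full option name baseoutcome(). lincom with [#] selects the equation for that outcome, here comparing outcome 3 with outcome 1.

```stata
sysuse auto, clear

* Hypothetical 3-category outcome: 1 = poor (rep78 1-2), 2 = average (3), 3 = good (4-5)
generate byte rep3 = cond(rep78 <= 2, 1, cond(rep78 == 3, 2, 3)) if !missing(rep78)

* Relative risk ratios, with y = 2 as the reference (base) outcome
mlogit rep3 mpg weight, rrr nolog baseoutcome(2)

* Effect of mpg for outcome 3 versus outcome 1, as an RRR
lincom [3]mpg - [1]mpg, rrr
```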

rrr = relative risk ratio (=OR)

Output

outreg using test.doc, nolabel replace   (findit outreg & install) Then convert the text into a table; when saving in Word, click No and use Save As

outreg using test.doc, nolabel append   append = add model 2 next to model 1

outreg var1 var2 using test.xls, replace 10pct coefastr se   (se = standard errors instead of t statistics) You can choose which coefficients to list

(10pct: + marks p<.1; coefastr: the * go on the coefficients)

log using myfile.smcl, replace   (don’t u t)

log close

At the end: log2html myfile.smcl, replace   (findit log2html first) => saves the results as HTML
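Here is a minimal log session. The file name myfile is taken from the notes, and the commands in between are illustrative. It uses the built-in translate command to get a plain-text copy, as an alternative to the user-written log2html. The first line closes any log that is already open.

```stata
* Close any log that is already open, then start a new SMCL log
capture log close
log using myfile.smcl, replace

sysuse auto, clear
summarize price mpg

log close

* Built-in conversion of the SMCL log to plain text
translate myfile.smcl myfile.log, replace
```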

Graph


graph dir   List all the graph files

graph use gender_gap

graph save filename   i.e., filename.gph is saved

erase filename.gph
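The graph file commands can be run as one sequence. This sketch assumes the bundled auto dataset and uses the hypothetical names g1 (graph in memory) and mygraph.gph (file on disk).

```stata
sysuse auto, clear

* Draw a graph and give it a name in memory
scatter price mpg, name(g1, replace)

* Save it to disk as a .gph file
graph save g1 mygraph.gph, replace

* List graphs in memory and .gph files in the working directory
graph dir

* Reload the saved graph, then delete the file
graph use mygraph.gph
erase mygraph.gph
```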

Other

sgmediation var_y, mv(varx1) iv(varx2)

[Sobel-Goodman tests: use findit first] Tests whether a mediator carries the influence of an IV to a DV.
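To show what the Sobel statistic is made of, here is a hand computation of z = ab / sqrt(b²·sa² + a²·sb²). Path a runs from the IV to the mediator. Path b runs from the mediator to the DV, controlling for the IV. This is a sketch, not sgmediation itself, which also reports Goodman-type variants. The roles assigned to auto's variables (weight = IV, displacement = mediator, mpg = DV) are purely illustrative.

```stata
sysuse auto, clear
* Illustrative roles only: IV = weight, mediator = displacement, DV = mpg

* Path a: IV -> mediator
quietly regress displacement weight
scalar sob_a  = _b[weight]
scalar sob_sa = _se[weight]

* Path b: mediator -> DV, controlling for the IV
quietly regress mpg displacement weight
scalar sob_b  = _b[displacement]
scalar sob_sb = _se[displacement]

* Sobel z and its two-sided normal p-value
scalar sob_z = (sob_a*sob_b) / sqrt(sob_b^2*sob_sa^2 + sob_a^2*sob_sb^2)
scalar sob_p = 2*normal(-abs(sob_z))
display "Sobel z = " sob_z "   p = " sob_p
```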

Time savers

program define shortcut

command 1 … command 2

end

shortcut   (runs command 1, 2, … by itself) shortcut = the program name we set => shortcut itself becomes a command
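Below is a runnable version of the program define pattern. The two commands inside are illustrative stand-ins for "command 1 … command 2". The capture program drop line lets the do-file be rerun without a "program already defined" error.

```stata
* Drop any earlier version so the program can be redefined
capture program drop shortcut

program define shortcut
    * command 1, command 2, ... (illustrative commands)
    sysuse auto, clear
    summarize price mpg
end

* Now shortcut works like a built-in command
shortcut
```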

Used all the time

list, gen, recode, replace, rename, sort, drop, keep,

order……

merge, append   _merge=1 (from master data only), 2 (from using data only), 3 (in both)
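The sketch below contrasts merge and append on two tiny hypothetical datasets saved as temporary files. merge matches on id and records the source of each row in _merge. append simply stacks the rows.

```stata
* Hypothetical master data
clear
input id x
1 10
2 20
end
tempfile master
save `master'

* Hypothetical using data
clear
input id y
2 200
3 300
end
tempfile usingdata
save `usingdata'

* 1:1 merge on id; _merge = 1 master only, 2 using only, 3 matched
use `master', clear
merge 1:1 id using `usingdata'
tabulate _merge

* append stacks observations instead of matching them
use `master', clear
append using `usingdata'
```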

Published: 2023-06-25 00:42:35

Link: https://www.wtabcd.cn/fanwen/fan/78/1032530.html
