Stata绘图:随机推断中的系数可视化

发布时间:2020-12-08 阅读 4209

Stata连享会   主页 || 视频 || 推文 || 知乎

温馨提示: 定期 清理浏览器缓存,可以获得最佳浏览体验。

New! lianxh 命令发布了:
随时搜索连享会推文、Stata 资源,安装命令如下:
. ssc install lianxh
使用详情参见帮助文件 (有惊喜):
. help lianxh

课程详情 https://gitee.com/arlionn/Course   |   lianxh.cn

课程主页 https://gitee.com/arlionn/Course

作者:孙碧洋 (香港中文大学)
邮箱sociology.tutor@gmail.com


目录


编者按: 本文主要摘自以下文章,特此感谢!

Source: Marshall A. Taylor, 2020, Visualization strategies for regression estimates with randomization inference, Stata Joural, 20(2): 309–335. -Link-

1. 准备工作

本文介绍了 Marshall A. Tylor 对于随机推断模型 (randomization models of inference) 系数可视化的方法。

原文数据和 Do 文档获取方法:

原文数据初步整理的 Do 文档和数据获取方法:

Note: 关于随机推断方法的介绍,请参考 「Stata:ritest-随机推断(Randomization Inference)」

2. 文章重现

2.1 coefplot 命令回顾

以美国国家健康与营养调查 (the National Health and Nutrition Examination Survey data) 数据集「nhanes2f.dta」为例,我们用 coefplot 命令生成 OLS 系数可视化图。

*数据下载地址:https://gitee.com/arlionn/data/blob/master/data01/nhanes2f.dta
use nhanes2f.dta, clear 
reg bpsystol weight height i.sex i.race age health   //bysystol 指血压收缩压
est store m1 

coefplot m1, mlabcolor(black) ///
    xline(0, lcolor(black)) mcol(black) msize(small) mlabsize(2) ///
    graphregion(fcolor(white) lcolor(white) lwidth(thick)) drop(_cons) ///
    coeflabels(weight="Weight (kg.)" height="Height (cm.)" 2.sex="Female" ///
    2.race="Black" 3.race="Other" age="Age" health="Health") ///
    groups(?.race ?.race = "{bf:Race}") mlabels(weight=12 "0.463" height=12 "-0.315" ///
    2.sex=12 "-2.945" 2.race=12 "2.333" 3.race=12 "2.294" age=12 "0.580" ///
    health=12 "-0.865") levels(99.9 99 95) legend(order(1 "99.9% CI" 2 ///
    "99% CI" 3 "95% CI") rows(1)) ciopts(lcolor(black black black)) ///
    saving(ex1plot.gph, replace)
		
graph export ex1plot.png, replace

      Source |       SS           df       MS      Number of obs   =    10,335
-------------+----------------------------------   F(7, 10327)     =    677.27
       Model |  1771381.43         7  253054.491   Prob > F        =    0.0000
    Residual |  3858591.92    10,327  373.641127   R-squared       =    0.3146
-------------+----------------------------------   Adj R-squared   =    0.3142
       Total |  5629973.35    10,334  544.800982   Root MSE        =     19.33

------------------------------------------------------------------------------
    bpsystol |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   .4630547   .0143833    32.19   0.000     .4348607    .4912487
      height |  -.3148502   .0309828   -10.16   0.000    -.3755825   -.2541179
             |
         sex |
     Female  |  -2.945127   .5466163    -5.39   0.000      -4.0166   -1.873653
             |
        race |
      Black  |   2.333404   .6300923     3.70   0.000     1.098301    3.568507
      Other  |   2.293741   1.397902     1.64   0.101    -.4464172    5.033899
             |
         age |   .5799269   .0123753    46.86   0.000      .555669    .6041849
      health |  -.8752806    .172614    -5.07   0.000    -1.213637   -.5369237
       _cons |   127.0381   5.328135    23.84   0.000     116.5939    137.4822
------------------------------------------------------------------------------

若代表置信区间的横线穿过了零线,那所对应的系数就不显著异于零。

上图与回归结果相对应,因变量为血压收缩压,自变量为种族、年龄、体重、健康等。可以看出,在控制其他因素不变的情况下,黑人血压平均比白人血压高出 2.333mm/Hg。由于代表 99.9% 置信区间的最细的横线也未横跨过零线,所以该估计结果在 α=0.001 水平上显著。

至此,我们知道 coefplot 可以完美地展现基于随机抽样数据得出的回归分析结果。不过,这次我们是要拓展 coefplot 的应用范围,来展现使用蒙特卡罗置换方法 (Monte Carlo permutation tests) 得出的 p-value 的取值。

2.2 随机推断下的统计估计

随机推断的样本并不是通过简单的随机抽样得到的,而是一些方便取样的结果 (convenience sample)。与常见的估计方法不同,通过随机推断方法得到回归估计结果中没有标准误 (standard errors) 和置信区间。因此,想通过上述做法进行系数可视化并不实际。不过,由于可以得到随机抽样中 p-value 本身的标准误及置信区间,我们可以通过 permute 命令对随机推断下得出的 p-value 可信度有一定掌握。因此,本文中的 PPV 图,就是借助 coefplot 对 p-value 的可信度进行可视化。

2.3 原文案例

文章关注的是哪些社会人口因素可以提升一个人对于 communist 的恐惧感。

数据的样本组成是部分反 communism 培训营 ("anti-Communism schools") 的成员。1953年,来自澳大利亚的 Fred Schwarz 在美国成立了 The Christian Anti-Communism Crusade (CACC) 组织。Fred Schwarz 常常在组种种讲授各种课程,宣扬自己的政治理念。这个组织并没有非常固定的学员,而且这些课程也是时有时无。

尽管后来不少学者对这个组织感兴趣,但组织本身架构松散且聚会不固定,所以学者无法找到一个合适的抽样框对课程学员进行抽样。诸多限制导致学者最后只得到了一个通过立意抽样 (purposive sampling) 法得到的非随机样本。对于这个明显是非随机抽样而得到的样本而言,样本不再可以代表全美 "anticommunists" 这个 population ,因此传统的回归分析不再适用。

我们试图通过这个非随机样本的信息来回答如下问题:(已知前提,学员自我选择进入了 anti-Communism schools,因此他们对于 communism 的基线恐惧值已高于一般人群)在这个学校中,哪些社会人口因素可以导致一个人提升对 communism 的恐惧感?这个恐惧感是因变量,数值越高带说明恐惧感越强。自变量包括政治参与度、政治面貌、教育程度、是否为工会成员、是否为自雇者。

作者先对两个主要变量,即 threat (communist threat) 和 engage 进行主成分分析 (principal component analysis)。这里需要安装 univarpolychoric 两个命令,建议输入 search sg67.1 安装。

*数据下载地址:https://gitee.com/arlionn/data/blob/master/data01/rright.dta
use rright.dta, clear

qui reg cdanger profess cdems cspeech cneighbor clocal
gen flag = e(sample) //通过 e(sample)命令提取参与到上一行回归估计中的样本,标为 flag == 1 

univar cdanger profess cdems cspeech cneighbor clocal if flag==1   

mat define mean = (0.92, 0.88, 0.67, 0.18, 0.51, 0.71) //分别将 cdanger profess ... clocal 的 mean 存入构建的矩阵中
mat define sd = (0.27, 0.33, 0.47, 0.39, 0.50, 0.46)  //存入变量 sd 信息于矩阵中
mat colnames mean = cdanger profess cdems cspeech cneighbor clocal  //对矩阵列进行命名
mat colnames sd = cdanger profess cdems cspeech cneighbor clocal

polychoric cdanger profess cdems cspeech cneighbor clocal if flag==1
global N = r(sum_w)
matrix r = r(R)
factormat r, n(${N}) factors(2) means(mean) sds(sd)  //对变量进行因子分析

rotate
 
predict threat  //得到 threat 变量
. univar cdanger profess cdems cspeech cneighbor clocal if flag==1   
                                        -------------- Quantiles --------------
Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
-------------------------------------------------------------------------------
 cdanger     116     0.92     0.27     0.00     1.00     1.00     1.00     1.00
 profess     116     0.88     0.33     0.00     1.00     1.00     1.00     1.00
   cdems     116     0.67     0.47     0.00     0.00     1.00     1.00     1.00
 cspeech     116     0.18     0.39     0.00     0.00     0.00     0.00     1.00
cneighbor     116     0.51     0.50     0.00     0.00     1.00     1.00     1.00
  clocal     116     0.71     0.46     0.00     0.00     1.00     1.00     1.00
-------------------------------------------------------------------------------


. polychoric cdanger profess cdems cspeech cneighbor clocal if flag==1

Polychoric correlation matrix

              cdanger     profess       cdems     cspeech   cneighbor      clocal
  cdanger           1
  profess   .55953723           1
    cdems   .58017075   .73879895           1
  cspeech  -.41531223  -.21315883   -.0107424           1
cneighbor   .23457457   .44627225   .25621176   -.0558755           1
   clocal   .75193374   .42567967   .49687761  -.07875559   .55864975           1


. factormat r, n(${N}) factors(2) means(mean) sds(sd)  //对变量进行因子分析
(obs=116)

Factor analysis/correlation                      Number of obs    =        116
    Method: principal factors                    Retained factors =          2
    Rotation: (unrotated)                        Number of params =         11

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      2.83659      2.21276            0.7113       0.7113
        Factor2  |      0.62384      0.11376            0.1564       0.8678
        Factor3  |      0.51008      0.17286            0.1279       0.9957
        Factor4  |      0.33722      0.48803            0.0846       1.0803
        Factor5  |     -0.15081      0.01843           -0.0378       1.0424
        Factor6  |     -0.16923            .           -0.0424       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  394.52 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness 
    -------------+--------------------+--------------
         cdanger |   0.8413   -0.3723 |      0.1535  
         profess |   0.7693    0.0398 |      0.4066  
           cdems |   0.7327    0.0811 |      0.4565  
         cspeech |  -0.2539    0.5240 |      0.6610  
       cneighbor |   0.5313    0.4184 |      0.5426  
          clocal |   0.8083    0.1657 |      0.3193  
    -------------------------------------------------


. rotate

Factor analysis/correlation                      Number of obs    =        116
    Method: principal factors                    Retained factors =          2
    Rotation: orthogonal varimax (Kaiser off)    Number of params =         11

    --------------------------------------------------------------------------
         Factor  |     Variance   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.82125      0.18207            0.4567       0.4567
        Factor2  |      1.63918            .            0.4111       0.8678
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(15) =  394.52 Prob>chi2 = 0.0000

Rotated factor loadings (pattern matrix) and unique variances

    -------------------------------------------------
        Variable |  Factor1   Factor2 |   Uniqueness 
    -------------+--------------------+--------------
         cdanger |   0.3667    0.8438 |      0.1535  
         profess |   0.5929    0.4918 |      0.4066  
           cdems |   0.5940    0.4367 |      0.4565  
         cspeech |   0.1681   -0.5574 |      0.6610  
       cneighbor |   0.6743    0.0521 |      0.5426  
          clocal |   0.7068    0.4256 |      0.3193  
    -------------------------------------------------

Factor rotation matrix

    --------------------------------
                 | Factor1  Factor2 
    -------------+------------------
         Factor1 |  0.7356          
         Factor2 |  0.6774  -0.7356 
    --------------------------------


. predict threat  //得到 threat 变量
(option regression assumed; regression scoring)

Scoring coefficients (method = regression; based on varimax rotated factors)

    ----------------------------------
        Variable |  Factor1   Factor2 
    -------------+--------------------
         cdanger | -0.52665   1.09069 
         profess |  0.33937   0.05147 
           cdems |  0.23050  -0.03128 
         cspeech |  0.09328  -0.12611 
       cneighbor |  0.17011   0.00549 
          clocal |  0.75613  -0.41386 
    ----------------------------------

重复上述步骤得到政治参与度变量,结果不再重述。

*重复上述步骤得到政治参与度变量
qui reg meeting letter org influence money work
gen flag2 = e(sample)

univar meeting letter org influence money work if flag2==1

mat define mean2 = (0.61, 0.61, 0.28, 0.92, 0.63, 0.39)
mat define sd2 = (0.49, 0.49, 0.45, 0.28, 0.48, 0.49)
mat colnames mean2 = meeting letter org influence money work
mat colnames sd2 = meeting letter org influence money work

polychoric meeting letter org influence money work if flag2==1
global N = r(sum_w)
matrix r = r(R)
factormat r, n(${N}) factors(2) means(mean2) sds(sd2)

rotate
predict engage  //得到 engage 变量

作者先用 OLS 对 threat 进行回归,之后根据结果画出了下面简单的 PPV 可视图。

*Plot 2
qui reg threat engage i.party i.sex age ib2.school i.union i.relig income i.industry i.selfemp
gen flag3 = e(sample)
	
egen threat2 = std(threat) if flag3==1
egen engage2 = std(engage) if flag3==1

mat p_model = J(12, 3, .)   //建立 12*3 矩阵,命名为 p_model

qui reg threat2 engage2 i.party i.union ib2.school i.selfemp if flag3==1
mat def beta = e(b)

forvalues k = 1/12 {
	global k`k' = round(beta[1,`k'], .001)
	}
permute threat2 _b, reps(1000) seed(5): qui reg threat2 engage2 i.party i.union ib2.school i.selfemp if flag3==1 //1000次随机置换

mat def m1 = r(p) //将各个系数的 p-value 存入 m1 矩阵
mat def m12 = r(ci)  //讲各个 p-value 的置信区间存入 m12 矩阵
forvalues k = 1/12 {    //将 m1 m12 中的值逐一放入之前构建的 12*3 的 p_model 矩阵中
	mat p_model[`k',1] = m1[1,`k']
	mat p_model[`k',2] = m12[1,`k']
	mat p_model[`k',3] = m12[2,`k']
	}

mat rownames p_model = engage2 0.party 1.party 2.party 0.union 1.union 1.school 2.school 3.school 0.selfemp 1.selfemp constant
coefplot (matrix(p_model[,1]), ci((p_model[,2] p_model[,3]))), ///
    mlabels(engage2=12 ${k1} 1.party=12 ${k3} 2.party=12 ${k4} 1.union=12 ${k6} ///
    1.school=12 ${k7} 3.school=12 ${k9} 1.selfemp=12 ${k11} constant=12 ${k12}) mlabcolor(black) ///
    xline(0.05, lcolor(black)) mcol(black) msize(small) mlabsize(2) ///
    graphregion(fcolor(white) lcolor(white) lwidth(thick)) drop(0.party 0.union 2.school 0.selfemp) ///
    coeflabels(engage2="Pol Engage (std)" 1.party="Republican" ///
    2.party="Other" 1.union="Union Member" 1.school="Grammar School" ///
    3.school="College" 1.selfemp="Self-Employed" constant="Constant") legend(order(1 "95% Confidence Interval")) ///
    ciopts(lcolor(black)) groups(1.party 2.party = "{bf:Party}" ///
    1.school 3.school="{bf:Schooling}") xtitle("{it:P}-Values") ///
    order(engage2 ?.selfemp ?.party ?.union ?.school constant) ///
    saving(ex2plot.gph, replace)

与一般系数图不同的地方有两个:

  • 横轴 x 只可在[0, 1]区间内取值,且对应的不再是系数本身的数值,而是系数做对应的 p-value 的取值;
  • 竖线不再是  x=0,而是一条 α=0.05 的标记线。若任何图标完全坐落于这条线的左边,则代表 p-value 小于0.05,即所对应的系数取值至少在 α=0.05 水平上显著。

与之前 coefplot 相同的地方是,图标上面的数字代表的依旧是系数值本身。

以 self-employment 的系数 (βSE=0.175) 来举例。我们对比的是自雇人士以及其他人在对 communist threat 上感知程度的截距位置。在这里 p-value 为 0.547 ,说明在进行 1000 次随机置换中,有 54.7% 的随机置换都得到这个系数值的绝对值是大于观测到的 -0.175 的绝对值。因此,我们可以说,至少在 α=0.05 的水平上,-0.175 很可能只是个随机出现的结果而已。再者,我们可以从图上看到,代表这个系数的图标在 α=0.05 这条线非常靠右的位置,说明这里的 p-value 本身也很不显著 (也可以说明 p-value 大概率是大于 0.05 的)。

mat p_model2 = J(22, 3, .)

qui reg threat2 engage2 i.party i.union ib2.school i.selfemp if flag3==1
mat def beta2 = e(b)

forvalues i = 1/22 {
    global i`i' = round(beta2[1,`i'], .001)
}

permute threat2 _b, reps(1000) seed(5):  ///
qui reg threat2 engage2 i.party i.union ib2.school i.selfemp if flag3==1

mat rownames p_model2 = engage2 0.party 1.party 2.party 0.union 1.union ///
    1.school 2.school 3.school 0.sex 1.sex age 0.relig2 1.relig2 income 1.industry ///
    2.industry 3.industry 4.industry 0.selfemp 1.selfemp constant
Monte Carlo permutation results                 Number of obs     =         56

      command:  regress threat2 engage2 i.party i.union ib2.school i.selfemp
  permute var:  threat2

------------------------------------------------------------------------------
T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf. Interval]
-------------+----------------------------------------------------------------
     engage2 |  -.2359895      91    1000  0.0910  0.0091  .0738991   .1105515
             |
       party |
          0  |  (empty)                                             
          1  |  -.5252173     377    1000  0.3770  0.0153  .3468624   .4078642
          2  |  -.6696606     328    1000  0.3280  0.0148   .298944   .3580727
             |
       union |
          0  |  (empty)                                             
          1  |   .2684445     469    1000  0.4690  0.0158  .4377001   .5004829
             |
      school |
          1  |   .1553839     839    1000  0.8390  0.0116  .8147291    .861255
          2  |  (empty)                                             
          3  |   -.579948     133    1000  0.1330  0.0107   .112558   .1556293
             |
     selfemp |
          0  |  (empty)                                             
          1  |  -.1750875     547    1000  0.5470  0.0157  .5155473   .5781752
             |
       _cons |   1.020812     114    1000  0.1140  0.0101  .0949613   .1353439
------------------------------------------------------------------------------
Note: Confidence intervals are with respect to p=c/n.
Note: c = #{|T| >= |T(obs)|}

上述结果继续用同样的方法画出来。

coefplot (matrix(p_model[,1]), mlabels(engage2=11 ${k1} 1.party=11 ${k3} ///
	2.party=11 ${k4} 1.union=11 ${k6} 1.school=11 ${k7} 3.school=11 ${k9} ///
	constant=11 ${k12} 1.selfemp=11 ${k11}) mlabcolor(black) xline(0.05, lcolor(black)) ///
	xline(0.1, lcolor(black) lpattern(dash)) mcol(black) ///
	msize(small) mlabsize(2) ci((p_model[,2] p_model[,3])) ciopts(lcolor(black))) ///
	(matrix(p_model2[,1]), mlabels(engage2=1 ${i1} 1.party=1 ${i3} ///
	2.party=1 ${i4} 1.union=1 ${i6} 1.school=1 ${i7} 3.school=1 ${i9} ///
	1.sex=1 ${i11} age=1 ${i12} 1.relig2=1 ${i14} income=1 ${i15} ///
	1.industry=1 ${i16} 2.industry=1 ${i17} 4.industry=1 ${i19} constant=1 ${i22} ///
	1.selfemp=1 ${i21}) mlabcolor(gray) mcol(gray) msize(small) mlabsize(2) ///
	ci((p_model2[,2] p_model2[,3])) ciopts(lcolor(gray))), graphregion(fcolor(white) lcolor(white) ///
	lwidth(thick)) drop(0.party 0.union 0.sex 0.relig2 2.school 3.industry 0.selfemp) ///
	coeflabels(engage2="Pol Engage (std)" 1.party="Republican" 2.party="Other" ///
	1.union="Union Member" 1.school="Grammar School" 3.school="College" ///
	1.sex="Female" age="Age" 1.relig2="Protestant" income="Family Income" ///
	1.industry="Agriculture" 2.industry="Business" 4.industry="Government" constant="Constant" ///
	1.selfemp="Self-Employed") groups(1.party 2.party = "{bf:Party}" 1.school 3.school = "{bf:Schooling}" ///
	1.industry 2.industry 4.industry = "{bf:Industry}") ///
	legend(order(2 "Model 1" 3 "Model 2")) xtitle("{it:P}-Values") ///
	order(engage2 ?.selfemp ?.party ?.union ?.school ?.sex age ?.relig2 income ?.industry ///
	constant) saving(ex3plot.gph, replace)

在之前模型的基础上,我们又加入了几个新的变量,包括性别,年龄,收入等级,是否为新教徒等等,使得两个模型呈嵌套关系。尽管加入了更多变量,依旧没有任何变量出现在 α=0.05 这条线的左边,说明还是没有变量达到 95% 显著。不过这次我们加入了 α=0.1 的这条线,而且发现有一个变量 engage2 出现在了 α=0.1 的左边,说明估计出的 βPE2=0.281 大概率不是一个随机出现的数值。

我们也可以把上图通过直方图形式表现出来。

#delimit ;

coefplot (matrix(p_model[,1]), ci((p_model[,2] p_model[,3])) bcolor(black))  ///
    (matrix(p_model2[,1]), ci((p_model2[,2] p_model2[,3])) bcolor(gray)),  ///
    graphregion(fcolor(white) lcolor(white) lwidth(thick)) drop(0.party 0.union  ///
    0.sex 0.relig2 2.school 3.industry 0.selfemp) coeflabels(   ///
    engage2=`""{bf:Pol Engage (std)}" "{&beta}1 = ${k1}" "{&beta}2 = ${i1}""'  ///
    1.party=`""{bf:Republican}" "{&beta}1 = ${k3}" "{&beta}2 = ${i3}""'  ///
    2.party=`""{bf:Other}" "{&beta}1 = ${k4}" "{&beta}2 = ${i4}""'  ///
    1.union=`""{bf:Union Member}" "{&beta}1 = ${k6}" "{&beta}2 = ${i6}""'  ///
    1.school=`""{bf:Grammar School}" "{&beta}1 = ${k7}" "{&beta}2 = ${i7}""'  ///
    3.school=`""{bf:College}" "{&beta}1 = ${k9}" "{&beta}2 = ${i9}""'  ///
    1.sex=`""{bf:Female}" "{&beta}2 = ${i11}"'  ///
    age=`""{bf:Age}" "{&beta}2 = ${i12}"'  ///
    1.relig2=`""{bf:Protestant}" "{&beta}2 = ${i14}"'  ///
    income=`""{bf:Family Income}" "{&beta}2 = ${i15}"'  ///
    1.industry=`""{bf:Agriculture}" "{&beta}2 = ${i16}"'  ///
    2.industry=`""{bf:Business}" "{&beta}2 = ${i17}"'  ///
    4.industry=`""{bf:Government}" "{&beta}2 = ${i19}"'  ///
    1.selfemp=`""{bf:Self-Employed}" "{&beta}1 = ${k11}" "{&beta}2 = ${i21}"'  ///
    constant=`""{bf:Constant}" "{&beta}1 = ${k12}" "{&beta}2 = ${i22}"', labsize(vsmall) angle(90))  ///
    groups(1.party 2.party = "{bf:Party}" 1.school 3.school = "{bf:Schooling}"  ///
    1.industry 2.industry 4.industry = "{bf:Industry}") order(engage2 ?.selfemp ?.party  ///
    ?.union ?.school ?.sex age ?.relig2 income ?.industry constant)  ///
    ciopts(lcolor(black) recast(rcap rcap))  ///
    recast(bar) vertical yline(0.05, lcolor(black)) yline(0.1, lcolor(black) lpattern(dash))   ///
    ytitle("{it:P}-Value") barwidth(.3) fcolor(*.8) citop xlabel(, labsize(vsmall)  ///
    angle(90)) legend(order(1 "Model 1" 3 "Model 2")) saving(ex4plot.gph, replace) ;  ///
	
#delimit cr

PPV 也可以通过图标的形状来代表估计系数的符号。下图中,方形代表  ,圆形代表 + 。同时,也可用图标颜色深浅来达标效应值 (effect size),颜色越深对应的效应值越大。下图展示的是标准化后的效应值。

mat p_model3 = J(24, 3, .)

permute threat2 _b, reps(1000) seed(5): ///
qui reg threat2 c.engage2##i.selfemp i.party i.union ib2.school i.sex age i.relig2 c.income ///
    ib3.industry if flag3==1
mat def mm1 = r(p)
mat def mm12 = r(ci)
forvalues i = 1/24 {
    mat p_model3[`i',1] = mm1[1,`i']
    mat p_model3[`i',2] = mm12[1,`i']
    mat p_model3[`i',3] = mm12[2,`i']
}

mat rownames p_model3 = engage2 0.selfemp 1.selfemp 0.selfemp#c.engage2 ///
    1.selfemp#c.engage2 0.party 1.party 2.party 0.union 1.union ///
    1.school 2.school 3.school 0.sex 1.sex age 0.relig2 1.relig2 income ///
    1.industry 2.industry 3.industry 4.industry constant
	
qui reg threat2 c.engage2##i.selfemp i.party i.union ib2.school i.sex age i.relig2 c.income ///
    ib3.industry if flag3==1, b
qui listcoef
mat list r(table)

#delimit ;	
	
coefplot 

    (matrix(p_model3[,1]), keep(engage2)
    mcol(black*.72) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.72))) 
    (matrix(p_model3[,1]), keep(1.selfemp)
    mcol(black*.16) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.16)))
    (matrix(p_model3[,1]), keep(1.selfemp#c.engage2)
    mcol(black*.61) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.61)))
    (matrix(p_model3[,1]), keep(1.party)
    mcol(black*.60) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.60)))
    (matrix(p_model3[,1]), keep(2.party) 
    mcol(black*.54) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.54)))
    (matrix(p_model3[,1]), keep(1.union)
    mcol(black*.003) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.003)))
    (matrix(p_model3[,1]), keep(1.school)
    mcol(black*.08) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.08)))
    (matrix(p_model3[,1]), keep(3.school)
    mcol(black*.23) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.23)))
    (matrix(p_model3[,1]), keep(1.sex) 
    mcol(black*.12) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.12)))
    (matrix(p_model3[,1]), keep(age) 
    mcol(black*.01) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.01)))
    (matrix(p_model3[,1]), keep(1.relig2) 
    mcol(black*.10) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.10)))
    (matrix(p_model3[,1]), keep(income)
    mcol(black*.08) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.08)))
    (matrix(p_model3[,1]), keep(1.industry) 
    mcol(black*.06) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.06)))
    (matrix(p_model3[,1]), keep(2.industry)
    mcol(black*.06) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.06)))
    (matrix(p_model3[,1]), keep(4.industry)
    mcol(black*.10) msymbol(square) xline(0.05, lcolor(black))
    ci((p_model3[,2] p_model3[,3])) ciopts(lcolor(black*.10))),

    graphregion(fcolor(white) lcolor(white) lwidth(thick))
    coeflabels(engage2="Pol Engage (std)" 1.selfemp="Self-Employed"
    1.selfemp#c.engage2="Self-Emp*Pol-Eng" 1.party="Republican" 2.party="Other"
    1.union="Union Member" 1.school="Grammar School" 3.school="College"
    1.sex="Female" age="Age" 1.relig2="Protestant" income="Family Income"
    1.industry="Agriculture" 2.industry="Business" 4.industry="Government") 
    groups(1.party 2.party = "{bf:Party}" 1.school
    3.school = "{bf:Schooling}" 1.industry 2.industry 4.industry = "{bf:Industry}")
    legend(order(2 "Negative Standardized Estimate" 6 "Positive Standardized Estimate")
    cols(1)) 
    xtitle("{it:P}-Values")
    order(engage2 ?.selfemp 1.selfemp#c.engage2 ?.party ?.union ?.school ?.sex 
    age ?.relig2 income ?.industry) saving(ex52plot.gph, replace) ;
	
#delimit cr	

PPV 图像也可以用于展示 p-value 的预测值,有兴趣的同学可以自行阅读。

3. 总结

随机推断可以对不是随机抽样的样本进行分析,然后通过 coefplot 实现的 PPV 作图技术可以很直观的把随机推断对于 p 值的估计结果表达出来。

在 PPV 中,我们不再汇报系数的标准误及置信区间,转而通过展示 p 值的标准误及置信区间对结果有更好的掌握。同时,需要对比的也不再是零线 x=0,而是一条我们自己定义的 α=0.05 或者 α=0.1 的标准线。

4. 参考文献和相关推文

相关课程

连享会-直播课 上线了!
http://lianxh.duanshu.com

免费公开课:


课程一览

支持回看

专题 嘉宾 直播/回看视频
最新专题 因果推断, 空间计量,寒暑假班等
数据清洗系列 游万海 直播, 88 元,已上线
研究设计 连玉君 我的特斯拉-实证研究设计-幻灯片-
面板模型 连玉君 动态面板模型-幻灯片-
面板模型 连玉君 直击面板数据模型 [免费公开课,2小时]

Note: 部分课程的资料,PPT 等可以前往 连享会-直播课 主页查看,下载。


关于我们

  • Stata连享会 由中山大学连玉君老师团队创办,定期分享实证分析经验。直播间 有很多视频课程,可以随时观看。
  • 连享会-主页知乎专栏,300+ 推文,实证分析不再抓狂。
  • 公众号推文分类: 计量专题 | 分类推文 | 资源工具。推文分成 内生性 | 空间计量 | 时序面板 | 结果输出 | 交乘调节 五类,主流方法介绍一目了然:DID, RDD, IV, GMM, FE, Probit 等。
  • 公众号关键词搜索/回复 功能已经上线。大家可以在公众号左下角点击键盘图标,输入简要关键词,以便快速呈现历史推文,获取工具软件和数据下载。常见关键词:课程, 直播, 视频, 客服, 模型设定, 研究设计, stata, plus, 绘图, 编程, 面板, 论文重现, 可视化, RDD, DID, PSM, 合成控制法

连享会主页  lianxh.cn
连享会主页 lianxh.cn

连享会小程序:扫一扫,看推文,看视频……

扫码加入连享会微信群,提问交流更方便

✏ 连享会学习群-常见问题解答汇总:
https://gitee.com/arlionn/WD

New! lianxh 命令发布了:
随时搜索连享会推文、Stata 资源,安装命令如下:
. ssc install lianxh
使用详情参见帮助文件 (有惊喜):
. help lianxh