ebalance: Checking Covariate Balance with Entropy Balancing

Stata连享会

New! The `lianxh` command has been released:

`. ssc install lianxh`

`. help lianxh`

3. Stata Implementation

3.1 Installing the command and describing the data

``````
. net describe ebalance, from(http://fmwww.bc.edu/RePEc/bocode/e)

. ssc install ebalance, all replace

. net get ebalance.pkg, replace // download the example data to the current working directory
``````
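Before loading the data, it can help to confirm that both the command and the example dataset are in place. The following is a minimal sketch, assuming the `net get` step above put `cps1re74.dta` in the current working directory:

```stata
* Confirm that ebalance is installed and show where its ado-file lives
which ebalance

* Check that the example data file is in the current working directory
capture confirm file "cps1re74.dta"
if _rc {
    display as error "cps1re74.dta not found in `c(pwd)'; rerun the net get step"
}
else {
    display as text "cps1re74.dta found in `c(pwd)'"
}
```

If `which ebalance` reports that the command is not found, rerun `ssc install ebalance, all replace`.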

``````
. use "cps1re74.dta", clear     // load the data
. des

Contains data from cps1re74.dta
obs:        16,177             DW Subset of LaLonde Data
vars:            12             6 Oct 2011 14:57
-------------------------------------------------------------
              storage   display
variable name   type    format   variable label
-------------------------------------------------------------
re78            double  %9.0g    real earnings 78
treat           long    %9.0g    1 if treated, 0 control
age             long    %9.0g    age in years
educ            long    %9.0g    years of schooling
black           long    %9.0g    indicator for black
hispan          long    %9.0g    indicator for hispanic
married         long    %9.0g    indicator for married
nodegree        long    %9.0g    indicator for no HS degree
re74            double  %9.0g    real earnings 74
re75            double  %9.0g    real earnings 75
u74             double  %9.0g    indicator for unemployed 74
u75             double  %9.0g    indicator for unemployed 75
-------------------------------------------------------------
``````

``````
. use cps1re74.dta, clear
(DW Subset of LaLonde Data)

. reg re78 treat age-u75

   Source |       SS           df       MS      Number of obs   =    16,177
----------+----------------------------------   F(11, 16165)    =   1343.88
    Model |  7.2418e+11        11  6.5835e+10   Prob > F        =    0.0000
 Residual |  7.9190e+11    16,165  48988567.3   R-squared       =    0.4777
    Total |  1.5161e+12    16,176  93724175.2   Root MSE        =    6999.2
---------------------------------------------------------------------------
     re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------+----------------------------------------------------------------
    treat |   1067.546    554.060     1.93   0.054      -18.472    2153.564
      age |    -94.541      6.000   -15.76   0.000     -106.302     -82.780
     educ |    175.225     28.697     6.11   0.000      118.977     231.474
    black |   -811.089    212.849    -3.81   0.000    -1228.296    -393.881
   hispan |   -230.535    218.610    -1.05   0.292     -659.034     197.965
  married |    153.228    142.775     1.07   0.283     -126.626     433.083
 nodegree |    342.927    177.878     1.93   0.054       -5.734     691.587
     re74 |      0.291      0.013    22.89   0.000        0.266       0.316
     re75 |      0.443      0.013    34.35   0.000        0.417       0.468
      u74 |    355.556    231.600     1.54   0.125      -98.406     809.519
      u75 |  -1612.758    239.803    -6.73   0.000    -2082.798   -1142.717
    _cons |   5762.180    445.614    12.93   0.000     4888.726    6635.634
---------------------------------------------------------------------------
``````

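3.2 Basic syntax of ebalance

The basic syntax of `ebalance` is:

``````
ebalance [treat] covar [, targets(numlist)]
``````

Here `treat` is the binary treatment indicator and `covar` is the list of covariates to balance. `targets(numlist)` sets the highest moment to balance for each covariate: 1 for the mean, 2 for the mean and variance, and 3 for the mean, variance, and skewness. A single number applies to every covariate; a list gives one target per covariate, in the order the covariates are listed.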
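To make the forms of `targets()` concrete, here is a minimal sketch on the same data. It assumes `cps1re74.dta` is in the working directory, and it assumes that `ebalance` balances first moments only when `targets()` is omitted; check `help ebalance` to confirm the default in your installed version.

```stata
use "cps1re74.dta", clear

* No targets(): assumed to balance first moments (means) only
ebalance treat age educ black

* A single target applies to every covariate: means and variances here
ebalance treat age educ black, targets(2)

* One target per covariate, in listing order:
* age up to the 3rd moment, educ up to the 2nd, black the mean only
ebalance treat age educ black, targets(3 2 1)
```

Each call writes its weights to `_webal`, so only the weights from the last call remain in memory.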

``````
. ebalance treat age black educ, targets(3) // adjust the first, second, and third moments of age, black, and educ

Data Setup
Treatment variable:   treat
Covariate adjustment: age black educ (1st order). age black educ (2nd order). age black educ (3rd order).

Optimizing...
Iteration 1: Max Difference = 580799.347
Iteration 2: Max Difference = 213665.688
Iteration 3: Max Difference = 78604.7628
...
Iteration 15: Max Difference = .420310244
Iteration 16: Max Difference = .037151116
Iteration 17: Max Difference = .008791339
maximum difference smaller than the tolerance level; convergence achieved

Treated units: 185     total of weights: 185
Control units: 15992   total of weights: 185

Before: without weighting

      |         Treat             |         Control
      |  mean  variance  skewness |   mean  variance  skewness
------+---------------------------+----------------------------
  age | 25.82     51.19     1.115 |  33.23       122     .3478
black | .8432     .1329    -1.888 | .07354    .06813     3.268
 educ | 10.35     4.043    -.7212 |  12.03     8.242    -.4233

After:  _webal as the weighting variable

      |         Treat             |        Control
      |  mean  variance  skewness |  mean  variance  skewness
------+---------------------------+---------------------------
  age | 25.82     51.19     1.115 | 25.75     51.22      1.14
black | .8432     .1329    -1.888 | .8423     .1328    -1.879
 educ | 10.35     4.043    -.7212 | 10.35     4.039    -.7224
``````

3.3 Checking balance

``````
. tabstat age [aweight=_webal], by(treat) s(N me v) nototal // check whether age is balanced

Summary for variables: age
by categories of: treat (1 if treated, 0 control)

   treat |         N      mean  variance
---------+------------------------------
       0 |     15992  25.75072  51.22003
       1 |       185  25.81622   51.1943
----------------------------------------
``````
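The `tabstat` check above looks at one covariate at a time. A common complement is the standardized difference in means, computed for every balanced covariate in one loop. The sketch below is an illustration rather than part of the `ebalance` package; it assumes the three-covariate specification from Section 3.2 and scales by the treated-group standard deviation, which is one convention among several.

```stata
use "cps1re74.dta", clear
quietly ebalance treat age black educ, targets(3)

* Standardized difference in means:
* (treated mean - weighted control mean) / treated-group standard deviation
foreach v of varlist age black educ {
    quietly summarize `v' if treat == 1
    local m1  = r(mean)
    local sd1 = r(sd)
    quietly summarize `v' [aweight = _webal] if treat == 0
    display as text "`v'" _col(12) "std. diff = " as result %8.4f (`m1' - r(mean)) / `sd1'
}
```

Values close to zero indicate that the weighted control group matches the treated group on that covariate's mean.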

3.4 Balancing interaction terms

``````
. gen ageXblack = age*black  // create ageXblack, the interaction of age and black

. ebalance treat age educ black ageXblack, targets(3 2 1 1)  // run entropy balancing

Data Setup
Treatment variable:   treat
Covariate adjustment: age educ black ageXblack (1st order). age educ (2nd order). age (3rd order).

Optimizing...
Iteration 1: Max Difference = 573647.601
Iteration 2: Max Difference = 211032.079
Iteration 3: Max Difference = 77633.2843
...
Iteration 15: Max Difference = .277671087
Iteration 16: Max Difference = .015938697
Iteration 17: Max Difference = .000055717
maximum difference smaller than the tolerance level; convergence achieved

Treated units: 185     total of weights: 185
Control units: 15992   total of weights: 185

Before: without weighting

          |         Treat             |         Control
          |  mean  variance  skewness |   mean  variance  skewness
----------+---------------------------+----------------------------
      age | 25.82     51.19     1.115 |  33.23       122     .3478
     educ | 10.35     4.043    -.7212 |  12.03     8.242    -.4233
    black | .8432     .1329    -1.888 | .07354    .06813     3.268
ageXblack | 21.91     134.6    -.4435 |  2.402     81.55     3.893

After:  _webal as the weighting variable

          |         Treat             |        Control
          |  mean  variance  skewness |  mean  variance  skewness
----------+---------------------------+---------------------------
      age | 25.82     51.19     1.115 | 25.82     51.19     1.115
     educ | 10.35     4.043    -.7212 | 10.35     4.043    -.7231
    black | .8432     .1329    -1.888 | .8432     .1322    -1.888
ageXblack | 21.91     134.6    -.4435 | 21.91     133.2    -.4572
``````

``````
. bysort black: tabstat age [aweight=_webal], by(treat) s(N me v) nototal // use _webal to verify that mean age is balanced within both the black and non-black subsamples

--------------------------------------------------------
-> black = 0

Summary for variables: age
by categories of: treat (1 if treated, 0 control)

   treat |         N      mean  variance
---------+------------------------------
       0 |     14816  24.93109  45.28087
       1 |        29  24.93103  40.49507
----------------------------------------

--------------------------------------------------------
-> black = 1

Summary for variables: age
by categories of: treat (1 if treated, 0 control)

   treat |         N      mean  variance
---------+------------------------------
       0 |      1176  25.98077  52.16224
       1 |       156  25.98077   53.2835
----------------------------------------
``````
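The within-group balance in the output above follows directly from balancing the first moments of `age`, `black`, and `ageXblack`. A short derivation, using notation introduced here only for illustration: $w_i$ are the control weights normalized to sum to one, $A_i$ is age, $B_i$ is the black indicator, and $\bar{A}_T$, $\bar{B}_T$, $\overline{AB}_T$ are the treated-group means of `age`, `black`, and `ageXblack`.

```latex
\begin{aligned}
&\text{Balance constraints:}\quad
\sum_i w_i A_i = \bar{A}_T, \qquad
\sum_i w_i B_i = \bar{B}_T, \qquad
\sum_i w_i A_i B_i = \overline{AB}_T. \\[4pt]
&\text{Weighted control mean of age, } B = 1:\quad
\frac{\sum_i w_i A_i B_i}{\sum_i w_i B_i}
  = \frac{\overline{AB}_T}{\bar{B}_T}. \\[4pt]
&\text{Weighted control mean of age, } B = 0:\quad
\frac{\sum_i w_i A_i - \sum_i w_i A_i B_i}{1 - \sum_i w_i B_i}
  = \frac{\bar{A}_T - \overline{AB}_T}{1 - \bar{B}_T}.
\end{aligned}
```

The right-hand sides are exactly the treated-group mean ages within each subgroup, so the subgroup means match. With the treated means reported above (age 25.82, black .8432, ageXblack 21.91), the first ratio is 21.91 / .8432 ≈ 25.98, in line with the `black = 1` panel. The within-group variances are not constrained by these targets, which is why they still differ between treated and control units.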

3.5 Unbiased estimation of the NSW program effect

``````
. use cps1re74.dta, clear
(DW Subset of LaLonde Data)

. * generate all pairwise products of the covariates (squares and first-order interactions)
. foreach v in age educ black hispan married nodegree re74 re75 u74 u75 {
      foreach m in age educ black hispan married nodegree re74 re75 u74 u75 {
          gen `v'X`m' = `v'*`m'
      }
  }

. * age, educ, re74, and re75 are continuous, so also generate their cubes
. foreach v in age educ re74 re75 {
      gen `v'X`v'X`v' = `v'^3
  }
``````

``````
. * run entropy balancing and save the balance table to baltable.dta
. ebalance treat-re75Xre75Xre75, keep(baltable) replace

. reg re78 treat [pweight=_webal]

Linear regression               Number of obs     =     16,177
                                F(1, 16175)       =       5.58
                                Prob > F          =     0.0182
                                R-squared         =     0.0161
                                Root MSE          =     6889.6

---------------------------------------------------------------
       |            Robust
  re78 |    Coef.  Std. Err.   t   P>|t|   [95% Conf. Interval]
-------+-------------------------------------------------------
 treat | 1761.344   745.533  2.36  0.018    300.017    3222.671
 _cons | 4587.800   472.243  9.71  0.000   3662.151    5513.449
---------------------------------------------------------------
``````
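A weighted estimate is only as stable as the weights behind it, so it is worth inspecting `_webal` before relying on the coefficient. The sketch below is a set of generic diagnostics, not something `ebalance` reports itself: the spread of the control weights, the Kish effective sample size, and the contents of the saved balance table. It assumes it is run right after the `ebalance` call above; `w2_tmp` is a scratch variable name chosen for illustration.

```stata
* Run after the ebalance call above, so that _webal is in memory
confirm variable _webal

* Distribution of the control-group weights; a few very large weights
* mean the estimate leans on a handful of control units
summarize _webal if treat == 0, detail

* Kish effective sample size: (sum of weights)^2 / (sum of squared weights)
generate double w2_tmp = _webal^2 if treat == 0   // w2_tmp: scratch variable
quietly summarize _webal if treat == 0
local sum_w = r(sum)
quietly summarize w2_tmp
display as text "Effective control sample size: " as result %9.1f (`sum_w')^2 / r(sum)
drop w2_tmp

* List the variables stored in the balance table written by keep(baltable)
describe using baltable
```

An effective sample size far below the 15,992 control units signals that the reweighting concentrates on a small part of the control group.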

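4. Conclusion

• For dummy variables, and for interaction terms that include a dummy variable, adjusting the first moment is enough to achieve balance.
• For continuous variables such as age, one usually adjusts the first, second, and third moments, together with the first-order interactions with the other covariates.
• After entropy balancing, the weights are stored automatically in the variable `_webal`.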
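
The points above can be sketched in one short run: continuous covariates balanced up to the third moment, the dummy on its mean only, and the weights kept under explicit names so that two specifications can be compared. This assumes the `generate()` option of `ebalance` names the weight variable in place of the default `_webal`, as described in the package help; `w_mean` and `w_skew` are illustrative names.

```stata
use "cps1re74.dta", clear

* Means only for all three covariates; weights stored in w_mean
ebalance treat age educ black, targets(1) generate(w_mean)

* age and educ up to the 3rd moment, black (a dummy) on its mean only
ebalance treat age educ black, targets(3 3 1) generate(w_skew)

* Compare the weighted estimates under the two specifications
regress re78 treat [pweight = w_mean]
regress re78 treat [pweight = w_skew]
```

Keeping both weight variables in the dataset makes it easy to see how sensitive the estimate is to the choice of moment targets.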

5. References

• Hainmueller J. Entropy balancing for causal effects: A multivariate reweighting method to produce balanced samples in observational studies[J]. Political Analysis, 2012, 20(1): 25-46.
• Hainmueller J, Xu Y. Ebalance: A Stata package for entropy balancing[J]. Journal of Statistical Software, 2013, 54(7).
• Dehejia R H, Wahba S. Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs[J]. Journal of the American Statistical Association, 1999, 94(448): 1053-1062.
• Zhang Haifeng, Liang Ruobing, Lin Xixi. The effect of the number of children on the economic decisions of rural households: implications for the "two-child policy"[J]. 中国经济问题, 2019(3): 68-80. (In Chinese.)

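About us

• Stata连享会 was founded by the team of Professor Lian Yujun (连玉君) at Sun Yat-sen University and regularly shares experience in empirical analysis.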