Stata: A Convenient Way to Generate and Test Dummy-Variable Interaction Terms


New! The `lianxh` command has been released:

`. ssc install lianxh`

`. help lianxh`


1. Introduction: Dummy Variables and Interaction Terms

Example: examining whether marriage is associated with a structural difference in women's wages.

``````
* Load the example data set shipped with Stata
sysuse nlsw88.dta, clear
* Summary statistics for all variables
sum
``````
``````
    Variable |   Obs    Mean     Std. Dev.      Min        Max
-------------+------------------------------------------------
      idcode | 2,246  2612.654   1480.864         1       5159
         age | 2,246  39.15316   3.060002        34         46
        race | 2,246  1.282725   .4754413         1          3
     married | 2,246  .6420303   .4795099         0          1
never_marr~d | 2,246  .1041852   .3055687         0          1
-------------+------------------------------------------------
       grade | 2,244  13.09893   2.521246         0         18
    collgrad | 2,246  .2368655   .4252538         0          1
       south | 2,246  .4194123   .4935728         0          1
        smsa | 2,246  .7039181   .4566292         0          1
      c_city | 2,246  .2916296   .4546139         0          1
-------------+------------------------------------------------
    industry | 2,232  8.189516   3.010875         1         12
  occupation | 2,237  4.642825   3.408897         1         13
       union | 1,878  .2454739   .4304825         0          1
        wage | 2,246  7.766949   5.755523  1.004952   40.74659
       hours | 2,242  37.21811   10.50914         1         80
-------------+------------------------------------------------
     ttl_exp | 2,246  12.53498   4.610208  .1153846   28.88461
      tenure | 2,231   5.97785   5.510331         0   25.91667
``````
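Before interacting `married` with other regressors, it can help to confirm that it really is a 0/1 dummy. The following is a minimal sketch using only standard Stata commands on the same data set; it is an illustration, not part of the original walkthrough.

```stata
sysuse nlsw88.dta, clear
* Frequency table; shows the value labels attached to married
tabulate married
* Same table with the underlying numeric codes
tabulate married, nolabel
* Stop with an error if married takes any value other than 0 or 1
assert inlist(married, 0, 1)
```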

2. Basic Model

2.1 The traditional way to add a dummy variable and its interaction terms

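``````
* Build the interaction terms by hand
gen marriedtenure = married*tenure
gen marriedhours = married*hours
gen marriedttl = married*ttl_exp
* Regress wage on the covariates, the dummy, and the three interactions
reg wage tenure hours ttl_exp married marriedtenure marriedhours marriedttl
* Joint F test: do the three slopes differ by marital status?
test marriedtenure marriedhours marriedttl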
``````

``````
      Source |       SS           df       MS
-------------+----------------------------------
       Model |  6140.31754         7  877.188219
    Residual |  67880.4931     2,219  30.5905782
-------------+----------------------------------
       Total |  74020.8106     2,226  33.2528349

Number of obs   =     2,227
F(7, 2219)      =     28.68
Prob > F        =    0.0000
R-squared       =    0.0830
Root MSE        =    5.5309

-------------------------------------------------------------------------------
         wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       tenure |   .1048823   .0412746     2.54   0.011     .0239415    .1858232
        hours |   .0874067   .0222925     3.92   0.000     .0436904    .1311231
      ttl_exp |   .2183548   .0515089     4.24   0.000     .1173441    .3193655
      married |   1.029717    1.12407     0.92   0.360    -1.174622    3.234056
marriedtenure |   -.110726   .0532406    -2.08   0.038    -.2151326   -.0063194
 marriedhours |  -.0418236   .0261311    -1.60   0.110    -.0930675    .0094204
   marriedttl |   .0869538   .0652744     1.33   0.183    -.0410515     .214959
        _cons |   1.208404   .9551692     1.27   0.206    -.6647154    3.081522
-------------------------------------------------------------------------------

 ( 1)  marriedtenure = 0
 ( 2)  marriedhours = 0
 ( 3)  marriedttl = 0

       F(  3,  2219) =    2.31
            Prob > F =    0.0748

``````
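Written out, the regression above corresponds to the following model. The symbols $\beta_k$, $\delta_k$ and $\varepsilon_i$ are notation introduced here for illustration; they do not appear in the original post.

```latex
\begin{aligned}
wage_i ={}& \beta_0 + \beta_1\, tenure_i + \beta_2\, hours_i + \beta_3\, ttl\_exp_i
            + \delta_0\, married_i \\
          & + \delta_1\,(married_i \times tenure_i)
            + \delta_2\,(married_i \times hours_i)
            + \delta_3\,(married_i \times ttl\_exp_i) + \varepsilon_i \\[4pt]
H_0:{}\;& \delta_1 = \delta_2 = \delta_3 = 0
\end{aligned}
```

Under this notation, the joint test reported above is a test of $H_0$: with $F(3, 2219) = 2.31$ and $p = 0.0748$, equal slopes for married and single women are rejected at the 10% level but not at the 5% level.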

2.2 The convenient way: factor-variable notation

For more on factor-variable notation and its applications, see `help fvvarlist`.

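``````
* Collect the continuous regressors in a global macro
global cx "tenure hours ttl_exp"
* ## adds the main effects plus every married-by-covariate interaction
reg wage i.married##c.($cx)
* Jointly test the three interaction terms
testparm i.married#c.($cx)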
``````

``````
      Source |       SS           df       MS
-------------+----------------------------------
       Model |  6140.31754         7  877.188219
    Residual |  67880.4931     2,219  30.5905782
-------------+----------------------------------
       Total |  74020.8106     2,226  33.2528349

Number of obs   =     2,227
F(7, 2219)      =     28.68
Prob > F        =    0.0000
R-squared       =    0.0830
Root MSE        =    5.5309

-----------------------------------------------------------------------------------
             wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
          married |
         married  |   1.029717    1.12407     0.92   0.360    -1.174622    3.234056
           tenure |   .1048823   .0412746     2.54   0.011     .0239415    .1858232
            hours |   .0874067   .0222925     3.92   0.000     .0436904    .1311231
          ttl_exp |   .2183548   .0515089     4.24   0.000     .1173441    .3193655
                  |
 married#c.tenure |
         married  |   -.110726   .0532406    -2.08   0.038    -.2151326   -.0063194
                  |
  married#c.hours |
         married  |  -.0418236   .0261311    -1.60   0.110    -.0930675    .0094204
                  |
married#c.ttl_exp |
         married  |   .0869538   .0652744     1.33   0.183    -.0410515     .214959
                  |
            _cons |   1.208404   .9551692     1.27   0.206    -.6647154    3.081522
-----------------------------------------------------------------------------------

 ( 1)  1.married#c.tenure = 0
 ( 2)  1.married#c.hours = 0
 ( 3)  1.married#c.ttl_exp = 0

       F(  3,  2219) =    2.31
            Prob > F =    0.0748

``````
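One practical benefit of the factor-variable form is that post-estimation commands understand the interactions. As a hedged sketch (this step is not in the original post), `lincom` recovers the tenure slope for married women as the base slope plus the interaction coefficient, and `margins` reports the slopes for both groups in one call.

```stata
sysuse nlsw88.dta, clear
global cx "tenure hours ttl_exp"
quietly reg wage i.married##c.($cx)

* Tenure slope for married women = base slope + interaction coefficient
lincom tenure + 1.married#c.tenure

* Slopes of every covariate in the macro cx, separately for single and married
margins married, dydx($cx)
```

From the coefficients reported above, the `lincom` estimate should be about .1048823 - .110726 = -.0058437.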

``````
* With test, each coefficient must be named explicitly, including its level
test 1.married#c.tenure 1.married#c.hours 1.married#c.ttl_exp
``````
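To confirm that the two approaches are the same test, the F statistics can be stored and compared. This is a minimal sketch; the scalar names `F_manual` and `F_fv` are made up for illustration.

```stata
sysuse nlsw88.dta, clear

* Traditional approach: hand-made interaction terms
gen marriedtenure = married*tenure
gen marriedhours  = married*hours
gen marriedttl    = married*ttl_exp
quietly reg wage tenure hours ttl_exp married marriedtenure marriedhours marriedttl
quietly test marriedtenure marriedhours marriedttl
scalar F_manual = r(F)

* Factor-variable approach
global cx "tenure hours ttl_exp"
quietly reg wage i.married##c.($cx)
quietly testparm i.married#c.($cx)
scalar F_fv = r(F)

* The two statistics should agree up to numerical precision
display "F (manual) = " F_manual "   F (factor variables) = " F_fv
```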

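3. Summary

1. Factor-variable notation makes it much easier to generate dummy-variable interaction terms.
2. It works in both estimation and testing. For testing, use `testparm` in place of `test`, because `testparm` accepts a factor-variable varlist, whereas `test` needs each coefficient named explicitly (for example `1.married#c.tenure`).
3. The more regressors there are, the more typing this approach saves.
4. Use `global` to collect the variables to be interacted in a single global macro, then reference it with `$`. This greatly reduces the amount of code you have to write.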
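As an illustration of points 3 and 4, the same macro can be reused to run the test for several dummies in a loop. This is a hypothetical extension, not part of the original example; `union` and `south` are other 0/1 variables in `nlsw88`.

```stata
sysuse nlsw88.dta, clear
global cx "tenure hours ttl_exp"

* Repeat the regression and the joint interaction test for each dummy
foreach d of varlist married union south {
    quietly reg wage i.`d'##c.($cx)
    quietly testparm i.`d'#c.($cx)
    display "`d': F(" r(df) ", " r(df_r) ") = " %6.2f r(F) "   p = " %6.4f r(p)
}
```

With hand-made interaction terms, each extra dummy would need three more `gen` statements and a longer regressor list; here only the loop list changes.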


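About Us

• Stata连享会 was founded by the team of Prof. 连玉君 (Lian Yujun) at Sun Yat-sen University and regularly shares experience in empirical analysis.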