# RDD: Nonparametric Estimation of Regression Discontinuity Designs and Its Stata Implementation

Stata 连享会 (Lianxh)

## 1. Installing the Commands and Introducing the Method

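``````
* rdrobust: local polynomial RD estimation and inference
* (it is also available through -ssc install rdrobust-)
net install rdrobust, from(http://www-personal.umich.edu/~cattaneo/rdrobust)
* rd: Austin Nichols' local Wald RD estimator, used in Section 3
ssc install rd
``````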

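RD identifies a local treatment effect when treatment depends on whether a continuous running variable crosses a known cutoff. Natural experiments and policy rules with eligibility thresholds are typical examples. The key assumption is that everything else changes smoothly at the cutoff. Under that assumption, a jump in the outcome at the cutoff measures the treatment effect for units at the cutoff, not for the whole population. `rdrobust` (Calonico et al., 2017) estimates this jump nonparametrically by local polynomial regression. It selects the bandwidth from the data and reports robust bias-corrected confidence intervals.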
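For reference, the sharp-RD estimand and the local linear estimator behind `rdrobust`'s default settings are written out below. The defaults are order $p = 1$ and a triangular kernel, as the `rdrobust` output in Section 3 reports. The notation is ours, not the original post's: $Y$ is the outcome, $X$ the running variable, $c$ the cutoff and $h$ the bandwidth. $\hat\alpha_{-}$ comes from the same weighted fit on the observations with $X_i < c$.

$$
\begin{aligned}
\tau_{\text{SRD}} &= \lim_{x \downarrow c} \mathbb{E}[Y \mid X = x] - \lim_{x \uparrow c} \mathbb{E}[Y \mid X = x] \\
(\hat\alpha_{+}, \hat\beta_{+}) &= \arg\min_{\alpha,\,\beta} \sum_{i:\,X_i \ge c} K\!\left(\frac{X_i - c}{h}\right)\bigl(Y_i - \alpha - \beta\,(X_i - c)\bigr)^2, \qquad K(u) = (1 - |u|)\,\mathbf{1}\{|u| \le 1\} \\
\hat\tau &= \hat\alpha_{+} - \hat\alpha_{-}
\end{aligned}
$$

The treated side is $X \ge c$, so $\hat\tau$ is positive only if the outcome jumps up to the right of the cutoff. In the simulation below, grant recipients lie to the left of the cutoff. That is why the first `rdrobust` estimate comes out negative.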

## 2. Simulating Nonlinearly Related Data

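``````
clear
* Added for reproducibility (not in the original run), so your numbers
* will differ slightly from the output shown below
set seed 12345
set obs 10000

* Income = 3^(4*(U - 0.75)) with U ~ Uniform[0,1), so income lies in [1/27, 3)
gen income=3^((runiform()-0.75)*4)
label var income "Reported Income"

sum income
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      income |     10,000    .6789349    .7606786   .0370671   2.999232

* Baseline performance: log income plus a sine wave, a nonlinear function of income
* (r(min) and r(max) are left behind by -sum income- above)
gen perfo=ln(income)+sin((income-r(min))/r(max)*4*_pi)/3+3
label var perfo "Performance Index - Base"

scatter perfo income
``````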

``````
* Add normal noise (SD 0.5) to the baseline index
gen perf1=perfo+rnormal()*0.5
label var perf1 "Performance Index - with noise"

scatter perf1 income
``````

``````
ssc install rcspline

* Restricted cubic spline fit with 7 knots, marking the knot locations
rcspline perf1 income, nknots(7) showknots title(Cubic Spline)
``````

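``````
* Eligibility rule: students with income below 0.5 qualify for (and, in this
* sharp design, receive) the grant
gen grant=income<0.5
sum grant

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       grant |     10,000       .5921     .491469          0          1

* About 59% of the students in the sample have incomes low enough to qualify for the government grant
* Now add a positive effect of the grant on student performance
* First, center income at the eligibility threshold
gen income_center=income-0.5
gen perf2=perf1+0.5*grant-0.1*income_center*grant
* Since income_center < 0 for recipients, the effect is 0.5 at the threshold
* and larger for lower-income students
label var perf2 "Observed Performance"
``````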
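The effect was built in as 0.5 - 0.1 × `income_center` for recipients, so it varies across students. RD targets its value at the cutoff, 0.5, not its average over all recipients. Below is a minimal sketch that makes this visible. It reuses the data-generating code above; the seed `12345` and the helper variable `true_te` are our own additions.

``````stata
* Rebuild the simulated data (the seed is an arbitrary addition, not from the post)
clear
set seed 12345
set obs 10000
gen income = 3^((runiform()-0.75)*4)
quietly sum income
gen perfo = ln(income) + sin((income-r(min))/r(max)*4*_pi)/3 + 3
gen perf1 = perfo + rnormal()*0.5
gen grant = income < 0.5
gen income_center = income - 0.5
gen perf2 = perf1 + 0.5*grant - 0.1*income_center*grant

* Hypothetical helper: the true effect each student receives (0 for non-recipients)
gen double true_te = grant*(0.5 - 0.1*income_center)

* Among recipients the effect runs from just above 0.5 up to at most about 0.546
* (reached at the lowest possible income, 1/27)
summarize true_te if grant == 1

* Just below the cutoff the effect is essentially 0.5: this limit is what RD estimates
summarize true_te if grant == 1 & income_center > -0.01
``````

The gap between the two summaries is small here because the slope of the effect (0.1) is small. With stronger heterogeneity the RD estimate and the average effect among recipients could differ much more.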

## 3. Nonparametric Estimation

``````
*-For comparison: a global linear regression on the full sample.
*-It misspecifies the nonlinear income-performance relation, and the grant
*-coefficient comes out negative although the true effect at the cutoff is +0.5
reg perf2 income grant

      Source |       SS           df       MS      Number of obs   =    10,000
-------------+----------------------------------   F(2, 9997)      =   5734.77
       Model |   7041.8845         2  3520.94225   Prob > F        =    0.0000
    Residual |  6137.80314     9,997  .613964504   R-squared       =    0.5343
       Total |  13179.6876     9,999  1.31810057   Root MSE        =    .78356

------------------------------------------------------------------------------
       perf2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .9003367   .0168801    53.34   0.000     .8672482    .9334251
       grant |  -.3767682   .0261265   -14.42   0.000    -.4279814    -.325555
       _cons |   1.924569   .0267009    72.08   0.000      1.87223    1.976909
------------------------------------------------------------------------------
``````
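As far as we understand its default settings, `rdrobust`'s conventional point estimate is a weighted least-squares fit. It fits a separate line on each side of the cutoff, using triangular kernel weights inside the chosen bandwidth. The sketch below checks this by hand. It assumes `rdrobust` is installed as in Section 1, and the seed and the helper variables `right` and `w` are our own additions. It uses `nincome_center`, on which grant recipients lie to the right of the cutoff, so the estimate should be positive.

``````stata
* Rebuild the simulated data (the seed is an arbitrary addition, not from the post)
clear
set seed 12345
set obs 10000
gen income = 3^((runiform()-0.75)*4)
quietly sum income
gen perfo = ln(income) + sin((income-r(min))/r(max)*4*_pi)/3 + 3
gen perf1 = perfo + rnormal()*0.5
gen grant = income < 0.5
gen income_center = income - 0.5
gen perf2 = perf1 + 0.5*grant - 0.1*income_center*grant
gen nincome_center = -income_center

* Let rdrobust choose its MSE-optimal bandwidths and keep the results
quietly rdrobust perf2 nincome_center
scalar tau_cl = e(tau_cl)
local h_l = e(h_l)
local h_r = e(h_r)

* Hypothetical helpers: side indicator (rdrobust treats x >= 0 as the right side;
* this equals grant except at an exact tie) and triangular kernel weights
gen byte right = nincome_center >= 0
gen double w = 0
replace w = 1 - abs(nincome_center)/`h_l' if !right & abs(nincome_center) <= `h_l'
replace w = 1 - nincome_center/`h_r' if right & nincome_center <= `h_r'

* Fully interacted WLS = separate weighted linear fits on each side;
* the coefficient on 1.right is the jump at the cutoff
regress perf2 c.nincome_center##i.right [aw = w] if w > 0
display "by hand: " _b[1.right] "    rdrobust conventional: " scalar(tau_cl)
``````

Only the point estimate should match. The standard errors from `regress` ignore the nearest-neighbor variance estimator and the bias correction that `rdrobust` applies.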

``````
*-rdrobust's default cutoff is 0, so we use the centered variable income_center
*-Note that grant = 1 to the LEFT of the cutoff, while rdrobust reports the
*-right-side limit minus the left-side limit, so the estimate below is negative
rdrobust perf2 income_center

Sharp RD estimates using local polynomial regression.

      Cutoff c = 0 | Left of c  Right of c            Number of obs =      10000
-------------------+----------------------            BW type       =      mserd
     Number of obs |      5921        4079            Kernel        = Triangular
Eff. Number of obs |       683         530            VCE method    =         NN
    Order est. (p) |         1           1
    Order bias (q) |         2           2
       BW est. (h) |     0.129       0.129
       BW bias (b) |     0.197       0.197
         rho (h/b) |     0.652       0.652

Outcome: perf2. Running variable: income_center.
--------------------------------------------------------------------------------
            Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
-------------------+------------------------------------------------------------
      Conventional | -.48486     .06467   -7.4971  0.000   -.611614     -.358102
            Robust |     -          -     -6.1641  0.000   -.633493     -.327828
--------------------------------------------------------------------------------
``````
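Centering is not strictly necessary, because `rdrobust` also accepts the cutoff directly through its `c()` option. The sketch below compares the two routes on the same simulated data; the seed is our addition. The conventional estimates should agree up to floating-point noise, which comes from storing `income_center` as a float.

``````stata
* Rebuild the simulated data (the seed is an arbitrary addition, not from the post)
clear
set seed 12345
set obs 10000
gen income = 3^((runiform()-0.75)*4)
quietly sum income
gen perfo = ln(income) + sin((income-r(min))/r(max)*4*_pi)/3 + 3
gen perf1 = perfo + rnormal()*0.5
gen grant = income < 0.5
gen income_center = income - 0.5
gen perf2 = perf1 + 0.5*grant - 0.1*income_center*grant

* Route 1: raw running variable, cutoff passed through c()
quietly rdrobust perf2 income, c(0.5)
scalar tau_raw = e(tau_cl)

* Route 2: centered running variable with the default cutoff c(0)
quietly rdrobust perf2 income_center
scalar tau_ctr = e(tau_cl)

display "c(0.5) on income: " scalar(tau_raw) "    c(0) on income_center: " scalar(tau_ctr)
``````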

``````
*-Flip the running variable so that grant recipients lie to the right of the
*-cutoff; the estimate then carries the sign of the grant effect
gen nincome_center=income_center*(-1)
rdrobust perf2 nincome_center

Sharp RD estimates using local polynomial regression.

      Cutoff c = 0 | Left of c  Right of c            Number of obs =      10000
-------------------+----------------------            BW type       =      mserd
     Number of obs |      4079        5921            Kernel        = Triangular
Eff. Number of obs |       530         683            VCE method    =         NN
    Order est. (p) |         1           1
    Order bias (q) |         2           2
       BW est. (h) |     0.129       0.129
       BW bias (b) |     0.197       0.197
         rho (h/b) |     0.652       0.652

Outcome: perf2. Running variable: nincome_center.
--------------------------------------------------------------------------------
            Method |   Coef.    Std. Err.    z     P>|z|    [95% Conf. Interval]
-------------------+------------------------------------------------------------
      Conventional |  .48486     .06467   7.4971   0.000    .358102      .611614
            Robust |     -          -     6.1641   0.000    .327828      .633493
--------------------------------------------------------------------------------
``````
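The estimates are easier to read next to a picture of the discontinuity. `rdplot` is installed together with `rdrobust` and plots binned means of the outcome on each side of the cutoff. As we understand its defaults, it overlays a global polynomial fit of order 4 on each side. Below is a minimal sketch on the same simulated data; the seed is again our addition.

``````stata
* Rebuild the simulated data (the seed is an arbitrary addition, not from the post)
clear
set seed 12345
set obs 10000
gen income = 3^((runiform()-0.75)*4)
quietly sum income
gen perfo = ln(income) + sin((income-r(min))/r(max)*4*_pi)/3 + 3
gen perf1 = perfo + rnormal()*0.5
gen grant = income < 0.5
gen income_center = income - 0.5
gen perf2 = perf1 + 0.5*grant - 0.1*income_center*grant
gen nincome_center = -income_center

* Binned means plus a separate polynomial fit on each side of the cutoff at 0;
* grant recipients are on the right
rdplot perf2 nincome_center
``````

The global polynomial in the plot is only a visual aid. The estimates above come from the local linear fits, not from this curve.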

``````
*-Compare with Austin Nichols' rd command (ssc install rd)
rd perf2 nincome_center

Two variables specified; treatment is
assumed to jump from zero to one at Z=0.

Assignment variable Z is nincome_center
Treatment variable X_T unspecified
Outcome variable y is perf2

Estimating for bandwidth .1832282339354582
Estimating for bandwidth .0916141169677291
Estimating for bandwidth .3664564678709164
------------------------------------------------------------------------------
       perf2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwald |   .4860547   .0539727     9.01   0.000     .3802703    .5918392
     lwald50 |   .4929885   .0744274     6.62   0.000     .3471136    .6388635
    lwald200 |   .5737225   .0377565    15.20   0.000     .4997212    .6477238
------------------------------------------------------------------------------
``````

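The advantage of `rd` is that it reports estimates for several bandwidths at once. With the default setting (100 50 200), these are the MSE-minimizing optimal bandwidth (100%) and half and twice that bandwidth (50% and 200%). Above, they are 0.183, 0.092 and 0.366.

The optimal bandwidth from `rd` (0.183) differs from the one `rdrobust` selected (0.129) because the two commands use different bandwidth selectors. Even so, the 100% estimates are close (0.486 vs. 0.485). The estimate drifts upward as the bandwidth widens, reaching 0.574 at 200%. This likely reflects the curvature of the underlying income-performance relation, which biases a local linear fit more as the window grows.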

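``````
*-Bandwidth sensitivity: estimate the effect with bandwidths equal to
*-25%, 50%, ..., 400% of the MSE-optimal one, storing result i in row i
gen effect_est = .
label var effect_est "Estimated Effect"

gen band_scale = .
label var band_scale "Bandwidth as a Scale Factor of Bandwidth that Minimizes MSE"

forv i = 1/16 {
    qui rd perf2 nincome_center, mbw(100 `=`i'*25')
    * the 100% estimate is named lwald; the others lwald25, lwald50, ..., lwald400
    if `i' ~= 4 replace effect_est = _b[lwald`=`i'*25'] if _n==`i'
    if `i' == 4 replace effect_est = _b[lwald] if _n==`i'
    replace band_scale = `=`i'*25'     if _n==`i'
}

*-Compare the estimates with the true effect at the cutoff (0.5)
gen true_effect = .5
label var true_effect "True effect"

two (scatter effect_est band_scale) (line true_effect band_scale)
``````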
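A similar sensitivity check can be run with `rdrobust` itself by fixing the bandwidth through its `h()` option. The sketch below re-estimates the effect at 25%, 50%, ..., 400% of the MSE-optimal bandwidth. It collects the conventional estimates and their standard errors in a matrix; the seed and the matrix name `R` are our additions. The bias-correction bandwidth `b` does not enter the conventional estimate, so it is left at its default.

``````stata
* Rebuild the simulated data (the seed is an arbitrary addition, not from the post)
clear
set seed 12345
set obs 10000
gen income = 3^((runiform()-0.75)*4)
quietly sum income
gen perfo = ln(income) + sin((income-r(min))/r(max)*4*_pi)/3 + 3
gen perf1 = perfo + rnormal()*0.5
gen grant = income < 0.5
gen income_center = income - 0.5
gen perf2 = perf1 + 0.5*grant - 0.1*income_center*grant
gen nincome_center = -income_center

* Baseline: the MSE-optimal bandwidth selected by rdrobust
quietly rdrobust perf2 nincome_center
local h0 = e(h_l)

* Columns: bandwidth in % of h0, conventional estimate, its standard error
matrix R = J(16, 3, .)
forvalues i = 1/16 {
    local h = `i'*0.25*`h0'
    quietly rdrobust perf2 nincome_center, h(`h')
    matrix R[`i', 1] = `i'*25
    matrix R[`i', 2] = e(tau_cl)
    matrix R[`i', 3] = e(se_tau_cl)
}
matrix colnames R = bw_pct tau_cl se_cl
matrix list R
``````

The matrix can be converted to variables with `svmat` for plotting, in the same way as the `rd` loop above.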

## References

2. Calonico S, Cattaneo M D, Farrell M H, et al. rdrobust: Software for regression-discontinuity designs[J]. The Stata Journal, 2017, 17(2): 372-404.


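#### About Us

• Stata 连享会 (Lianxh) was founded by the team of Lian Yujun (连玉君) at Sun Yat-sen University and regularly shares experience in empirical analysis. Its live-stream channel offers many video courses that can be watched at any time.
• Lianxh homepage and Zhihu column: 300+ posts to keep empirical analysis from driving you crazy.
• Categories of posts on the WeChat official account: Econometrics topics | Categorized posts | Resources and tools. Posts fall into five groups (Endogeneity | Spatial econometrics | Time series and panel data | Exporting results | Interactions and moderation), giving an at-a-glance overview of mainstream methods: DID, RDD, IV, GMM, FE, Probit, and more.
• Keyword search and auto-reply are now available on the official account. Click the keyboard icon at the lower left of the account page and type a short keyword to pull up past posts, software tools, and data downloads. Common keywords: `课程, 直播, 视频, 客服, 模型设定, 研究设计, stata, plus, 绘图, 编程, 面板, 论文重现, 可视化, RDD, DID, PSM, 合成控制法`

✏ Lianxh study group, summary of frequently asked questions:
https://gitee.com/arlionn/WD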