# Stata: Three Ways to Generate a Unique ID Code

Stata 连享会

## 1. Generating the Data

``````
clear
set obs 25000
``````

``````
* Randomly generate the three ID components
gen household_ID = ceil(runiform()*50)
gen city_ID = ceil(runiform()*20)
gen state_ID = ceil(runiform()*50)
``````

``````
* Randomly generate two standard normal variables
gen x1 = rnormal()
gen x2 = rnormal()
``````
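The commands above set no random-number seed, so each run produces different values. A minimal sketch of a reproducible version follows; the seed value `12345` is an arbitrary assumption and not part of the original post, so the listings shown in this article will not be reproduced exactly.

```stata
* Minimal sketch: the same simulated data, made reproducible across runs.
* The seed 12345 is an arbitrary, hypothetical choice.
clear
set seed 12345
set obs 25000

gen household_ID = ceil(runiform()*50)   // integers 1..50
gen city_ID      = ceil(runiform()*20)   // integers 1..20
gen state_ID     = ceil(runiform()*50)   // integers 1..50

gen x1 = rnormal()
gen x2 = rnormal()

* Sanity check on the ranges of the three keys
summarize household_ID city_ID state_ID
```

With 50 x 20 x 50 = 50,000 possible key combinations and only 25,000 observations, some combinations will repeat, which is what the later sections deal with.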

``````
. list household_ID city_ID state_ID x1 x2 in 1/10

     +-------------------------------------------------------+
     | househ~D   city_ID   state_ID          x1          x2 |
     |-------------------------------------------------------|
  1. |       18        10         28    2.025588   -.1922264 |
  2. |       14        11          8    1.042631   -.0807038 |
  3. |        7        17         21    .2977124    .6150526 |
  4. |        2         4         28   -1.722132   -1.358765 |
  5. |       44         8         36   -.7291995    .0929139 |
     |-------------------------------------------------------|
  6. |       18        12         18    .8618261    .6687715 |
  7. |        4         4         29    -.239354    .6361541 |
  8. |       17        14          2     .516549   -.6399707 |
  9. |       28        15         19   -1.812016   -1.628398 |
 10. |       44         2         23   -1.015124   -.7855705 |
     +-------------------------------------------------------+
``````

## 2. Two Ways to Create a Unique ID

### 2.1 Creating a Unique ID with egen

``````
. egen ID = group(household_ID city_ID state_ID)
. list household_ID state_ID city_ID ID in 1/10

     +---------------------------------------+
     | househ~D   state_ID   city_ID      ID |
     |---------------------------------------|
  1. |       18         28        10    6890 |
  2. |       14          8        11    5296 |
  3. |        7         21        17    2724 |
  4. |        2         28         4     469 |
  5. |       44         36         8   17091 |
     |---------------------------------------|
  6. |       18         18        12    6925 |
  7. |        4         29         4    1253 |
  8. |       17          2        14    6553 |
  9. |       28         19        15   10945 |
 10. |       44         23         2   16966 |
     +---------------------------------------+
``````

``````
. sort household_ID city_ID state_ID   // sort by the three keys
. list household_ID city_ID state_ID ID in 1/10

     +------------------------------------+
     | househ~D   city_ID   state_ID   ID |
     |------------------------------------|
  1. |        1         1          2    1 |
  2. |        1         1          5    2 |
  3. |        1         1          7    3 |
  4. |        1         1         10    4 |
  5. |        1         1         10    4 |
     |------------------------------------|
  6. |        1         1         11    5 |
  7. |        1         1         15    6 |
  8. |        1         1         15    6 |
  9. |        1         1         20    7 |
 10. |        1         1         21    8 |
     +------------------------------------+
``````
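The sorted listing suggests how `egen ... group()` numbers the groups: in ascending sort order of the key variables, with identical combinations sharing one number. The sketch below rebuilds the same numbering by hand as a cross-check. The names `first_obs` and `ID_manual` are hypothetical, and the sketch assumes the three keys contain no missing values, as is the case in the simulated data.

```stata
* Assumes household_ID, city_ID, state_ID and ID (from egen) already exist.
sort household_ID city_ID state_ID

* Flag the first observation within each key combination
by household_ID city_ID state_ID: gen byte first_obs = (_n == 1)

* A running sum of the flags numbers the groups 1, 2, 3, ...
gen long ID_manual = sum(first_obs)

* Stops with an error if the two numberings ever disagree
assert ID_manual == ID
drop first_obs
```

If the `assert` passes silently, the hand-built counter and the `egen` result agree on every observation.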

### 2.2 Creating a String Unique ID

``````
. gen ID3 = "H" + string(household_ID, "%2.0f") + "C" + string(city_ID) + "S" + string(state_ID)
. list household_ID city_ID state_ID ID3 in 1/10

     +-------------------------------------------+
     | househ~D   city_ID   state_ID         ID3 |
     |-------------------------------------------|
  1. |       18        10         28   H18C10S28 |
  2. |       14        11          8    H14C11S8 |
  3. |        7        17         21    H7C17S21 |
  4. |        2         4         28     H2C4S28 |
  5. |       44         8         36    H44C8S36 |
     |-------------------------------------------|
  6. |       18        12         18   H18C12S18 |
  7. |        4         4         29     H4C4S29 |
  8. |       17        14          2    H17C14S2 |
  9. |       28        15         19   H28C15S19 |
 10. |       44         2         23    H44C2S23 |
     +-------------------------------------------+
``````
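The string IDs above vary in length (`H2C4S28` versus `H18C10S28`), so sorting them alphabetically does not follow the numeric order of the keys. One possible variant, sketched here, pads each component to two digits with the `%02.0f` format. The variable name `ID3_pad` is hypothetical, and the two-digit width assumes no component exceeds 99, which holds for the simulated data.

```stata
* Assumes household_ID, city_ID, state_ID and ID already exist.
* Fixed-width string ID: "H" + 2 digits + "C" + 2 digits + "S" + 2 digits
gen str9 ID3_pad = "H" + string(household_ID, "%02.0f") ///
                 + "C" + string(city_ID, "%02.0f")      ///
                 + "S" + string(state_ID, "%02.0f")

* Each numeric ID should map to exactly one string ID, and vice versa
bysort ID (ID3_pad): assert ID3_pad[1] == ID3_pad[_N]
bysort ID3_pad (ID): assert ID[1] == ID[_N]
```

Under this scheme the first row of the listing would read `H18C10S28` as before, while `H7C17S21` would become `H07C17S21`.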

## 3. Handling Duplicate Observations

``````
. sum ID

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          ID |     25,000    9890.821    5709.989          1      19757

. duplicates report ID

Duplicates in terms of ID

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        15298             0
        2 |         7528          3764
        3 |         1836          1224
        4 |          312           234
        5 |           20            16
        6 |            6             5
--------------------------------------
``````
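The report shows that 15,298 observations have an ID that appears once, while the rest share an ID with up to five other observations. Before aggregating, it can help to look at the repeated rows directly. The following is a minimal sketch using `duplicates tag`; the variable name `n_extra` is hypothetical.

```stata
* Assumes ID, the three keys, x1 and x2 exist and the data are not yet collapsed.
* n_extra = number of other observations sharing the same ID (0 if unique)
duplicates tag ID, generate(n_extra)
tabulate n_extra

* Inspect the most heavily duplicated IDs (5 or 6 copies)
sort ID
list ID household_ID city_ID state_ID x1 x2 if n_extra >= 4, sepby(ID)

drop n_extra
```

The `tabulate` output should mirror the "observations" column of the report, with `n_extra` equal to copies minus one.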

``````
collapse (mean) mean_x1=x1 mean_x2=x2 (median) ///
    med_x1=x1 med_x2=x2, by(ID)
``````

``````
. list in 1/10

     +----------------------------------------------------+
     | ID     mean_x1     mean_x2      med_x1      med_x2 |
     |----------------------------------------------------|
  1. |  1    .1792767    .7344462    .1792767    .7344462 |
  2. |  2    1.093224   -.1823799    1.093224   -.1823799 |
  3. |  3   -.3094974    2.653506   -.3094974    2.653506 |
  4. |  4    .4993488    .8985389    .4993488    .8985389 |
  5. |  5   -.7429997   -.0464208   -.7429997   -.0464208 |
     |----------------------------------------------------|
  6. |  6    .4437077   -.5161228    .4437077   -.5161228 |
  7. |  7   -.7664803     1.15589   -.7664803     1.15589 |
  8. |  8      .59837    .1051743      .59837    .1051743 |
  9. |  9    .9568613   -.7659643    .9568613   -.7659643 |
 10. | 10   -.8682789   -1.468759   -.8682789   -1.468759 |
     +----------------------------------------------------+
``````

``````
. duplicates report ID

Duplicates in terms of ID

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |        19757             0
--------------------------------------
``````
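After the collapse, each of the 19,757 IDs appears exactly once, but the three key variables are gone because they were not named in `by()`. A possible variant is sketched below: it would be run on the pre-collapse data in place of the `collapse` shown earlier, keeps the keys, and records how many observations went into each row. The name `n_obs` is hypothetical.

```stata
* Sketch: run on the pre-collapse data, in place of the collapse shown earlier.
* Adding the keys to by() does not change the grouping, because each ID
* corresponds to exactly one (household_ID, city_ID, state_ID) combination.
collapse (mean)   mean_x1=x1 mean_x2=x2 ///
         (median) med_x1=x1 med_x2=x2   ///
         (count)  n_obs=x1,             ///
         by(ID household_ID city_ID state_ID)

isid ID            // stops with an error if ID does not uniquely identify rows
summarize n_obs    // distribution of group sizes
```

`isid` gives a pass-or-fail uniqueness check that is easier to use in a do-file than reading the `duplicates report` table.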

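## Related Courses

http://lianxh.duanshu.com

### Course List

Stata Data Cleaning, You Wanhai (游万海), live stream, 2 hours, now available

Note: Materials for some courses (slides and so on) can be viewed and downloaded from the 连享会-直播课 home page.

#### About Us

• Stata 连享会 was founded by the team of Prof. Lian Yujun (连玉君) at Sun Yat-sen University and regularly shares experience in empirical analysis. The live-stream channel hosts many video courses that can be watched at any time.
• 连享会 home page and Zhihu column: 300+ posts to take the pain out of empirical analysis.
• Posts on the WeChat public account are organized as: econometrics topics | posts by category | resources and tools. They fall into five categories (endogeneity | spatial econometrics | time series and panel data | output of results | interactions and moderation), giving a quick overview of mainstream methods such as DID, RDD, IV, GMM, FE, and Probit.
• Keyword search and auto-reply are now available on the public account. Click the keyboard icon at the bottom left of the account page and enter a short keyword to pull up past posts and to get tools, software, and data downloads. Common keywords: `课程, 直播, 视频, 客服, 模型设定, 研究设计, stata, plus, 绘图, 编程, 面板, 论文重现, 可视化, RDD, DID, PSM, 合成控制法`

✏ 连享会 study group, collected answers to frequently asked questions: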
https://gitee.com/arlionn/WD
