How to handle date gaps in time series?

Published: 2020-02-04

Author: Ashish Rajbhandari, Senior Econometrician[1]
Title: Handling gaps in time series using business calendars
Source: Stata blogs
Reposted and partially translated by: Stata连享会

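Editor's note:

  • In time-series data such as stock returns, dates are often not consecutive because markets close on weekends and major holidays. With Stata's default date format, returns cannot be computed across these breaks. We should therefore adjust the dates appropriately rather than treat the gaps as missing values. In this post, the author uses Stata's business calendars to illustrate a simple way to handle irregularly spaced dates.
  • Because the original is clearly written and provides complete Stata data and commands, we did not translate the rest of the article.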

Introduction

Time-series data, such as financial data, often have known gaps because there are no observations on days such as weekends or holidays. Using regular Stata datetime formats with time-series data that have gaps can result in misleading analysis. Rather than treating these gaps as missing values, we should adjust our calculations appropriately. I illustrate a convenient way to work with irregularly spaced dates by using Stata's business calendars.

In nasdaq.dta, I have daily data on the NASDAQ index from February 5, 1971 to March 23, 2015 that I downloaded from the St. Louis Federal Reserve Economic Database (FRED).

. use http://www.stata.com/data/nasdaq
. describe

Contains data from http://www.stata.com/data/nasdaq.dta
  obs:        11,132
 vars:             2                          29 Jan 2016 16:21
 size:       155,848
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
date            str10   %10s                  Daily date
index           float   %9.0g                 NASDAQ Composite Index (1971=100)
-------------------------------------------------------------------------------
Sorted by:

date is the time variable in our data. It is stored as a string in year, month, day order. I use the date() function to convert the string daily date to a Stata numeric date and store the values in mydate. To find out more about converting string dates to numeric, you can read A tour of datetime in Stata.

. generate mydate = date(date,"YMD")
. format %td mydate

I tsset these data with mydate as the time variable and then list the first five observations, along with the first lag of index.

. tsset mydate
        time variable:  mydate, 05feb1971 to 23mar2015, but with gaps
                delta:  1 day

. list date mydate index l.index in 1/5

     +------------------------------------------+
     |                                       L. |
     |       date      mydate    index    index |
     |------------------------------------------|
  1. | 1971-02-05   05feb1971      100        . |
  2. | 1971-02-08   08feb1971   100.84        . |
  3. | 1971-02-09   09feb1971   100.76   100.84 |
  4. | 1971-02-10   10feb1971   100.69   100.76 |
  5. | 1971-02-11   11feb1971   101.45   100.69 |
     +------------------------------------------+

The first observation on l.index is missing; I expect this because there are no observations prior to the first observation on index. However, the second observation on l.index is also missing. As you may have already noticed, the dates are irregularly spaced in my dataset—the first observation corresponds to a Friday and the second observation to a Monday.

I get missing data in this case because mydate is a regular date, and tsset-ing by a regular date treats all weekends and holidays as dates that are missing from the dataset instead of skipping them in calculations. To avoid the problem of gaps inherent in business data, I can create a business calendar. Business calendars specify which dates are omitted. For daily financial data, a business calendar specifies the weekends and holidays for which the markets were closed.
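
As a rough check of how often this happens, one could measure the calendar-day distance between consecutive observations. This is only a sketch: it assumes nasdaq.dta is loaded and mydate has been created as above, and gapdays is a hypothetical variable name that does not appear in the original post.

```stata
* Sketch: calendar-day distance between consecutive observations.
* gapdays is a hypothetical name introduced for illustration.
sort mydate
generate gapdays = mydate - mydate[_n-1]

* Any value above 1 marks a weekend or holiday gap; those are the rows
* where L.index is missing when the data are tsset on mydate.
tabulate gapdays
```

A distance of 3 corresponds to an ordinary weekend, such as the step from Friday 05feb1971 to Monday 08feb1971.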


Creating business calendars

Business calendars are defined in files named calname.stbcal. You can create your own calendars, use the ones provided by StataCorp, or obtain them directly from other users or via the SSC. Calendars can also be created automatically from the current dataset using the bcal create command.
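
For the automatic route, a minimal sketch might look like the following. It assumes Stata 13 or later (where bcal create is available) and that mydate is a %td daily date; the calendar name nasdaq_auto and the variable name bdate_auto are hypothetical and chosen only for illustration.

```stata
* Sketch: build a calendar from the dates actually present in the data.
* nasdaq_auto (calendar) and bdate_auto (variable) are hypothetical names.
* Every date absent from mydate is treated as omitted in the new calendar.
bcal create nasdaq_auto, from(mydate) generate(bdate_auto) replace

* Show the range, center date, and number of omitted days.
bcal describe nasdaq_auto
```

A calendar built this way omits exactly the dates missing from the data, so it cannot distinguish a market holiday from a day that is absent because of a data error. Writing the file by hand, as done next, makes the omission rules explicit.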

  • Every stbcal-file requires you to specify the following four things:

    • the version of Stata being used
    • the range of the calendar
    • the center date of the calendar
    • the dates to be omitted

I begin by creating nasdaq.stbcal, which will omit Saturdays and Sundays of every month. I do this using the Do-file editor, but you can use any text editor.

version 14.1
purpose "Converting daily financial data into business calendar dates"
dateformat dmy
range 05feb1971 23mar2015
centerdate 05feb1971
omit dayofweek (Sa Su)

  • Line 1 specifies the current version of Stata I am using.
  • Line 2 is optional, but the text typed there will display if I type bcal describe nasdaq and is good for record keeping when I have multiple calendars.
  • Line 3 specifies the display date format and is also optional.
  • Line 4 specifies the range of dates in the dataset.
  • Line 5 sets the center date of the calendar to 05feb1971.

I picked the first date in the sample, but I could have picked any date in the range specified for the business calendar. centerdate does not mean choosing a date that is in fact the center of the sample. For example, Stata's default %td calendar uses 01jan1960 as its center.

The last statement omits every Saturday and Sunday. Later, I will show several variations of the omit command to omit other holidays. Once I have a business calendar, I can use this to convert regular dates to business dates, share this file with colleagues, and also make further changes to my calendar.


Using a business calendar

. bcal load nasdaq
loading ./nasdaq.stbcal ...

  1. version 14.1
  2. purpose "Converting daily financial data into business calendar dates"
  3. dateformat dmy
  4. range 05feb1971 23mar2015
  5. centerdate 05feb1971
  6. omit dayofweek (Sa Su)

(calendar loaded successfully)

. generate bcaldate = bofd("nasdaq",mydate)
. assert !missing(bcaldate) if !missing(mydate)

To create business dates using bofd(), I specified two arguments: the name of the business calendar and the name of the variable containing regular dates. The assert statement verifies that all dates recorded in mydate appear in the business calendar. This is a way of checking that I created my calendar for the complete date range—the bofd() function returns a missing value when mydate does not appear on the specified calendar.
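
A complementary check is to convert the business dates back to regular dates with dofb() and confirm the round trip reproduces mydate. The sketch below assumes the nasdaq calendar is loaded and bcaldate exists as above; backdate is a hypothetical variable name.

```stata
* Sketch: inspect the calendar and confirm the conversion is reversible.
bcal describe nasdaq

* dofb() maps a business date back to a regular %td date.
* backdate is a hypothetical name introduced for illustration.
generate backdate = dofb(bcaldate, "nasdaq")
format %td backdate
assert backdate == mydate if !missing(bcaldate)
```

Note the argument order: bofd() takes the calendar name first, whereas dofb() takes it second.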

Business dates have a specific display format, %tbcalname, which in my case is %tbnasdaq. In order to display business dates in a Stata date format I will apply this format to bcaldate just as I would for a regular date.

. format %tbnasdaq bcaldate
. list in 1/5

     +---------------------------------------------+
     |       date    index      mydate    bcaldate |
     |---------------------------------------------|
  1. | 1971-02-05      100   05feb1971   05feb1971 |
  2. | 1971-02-08   100.84   08feb1971   08feb1971 |
  3. | 1971-02-09   100.76   09feb1971   09feb1971 |
  4. | 1971-02-10   100.69   10feb1971   10feb1971 |
  5. | 1971-02-11   101.45   11feb1971   11feb1971 |
     +---------------------------------------------+

Although mydate and bcaldate look similar, they have different encodings. Now, I can tsset on the business date bcaldate and list the first five observations with the lag of index recalculated.

. tsset bcaldate
        time variable:  bcaldate, 05feb1971 to 23mar2015, but with gaps
                delta:  1 day

. list bcaldate index l.index in 1/5

     +-----------------------------+
     |                          L. |
     |  bcaldate    index    index |
     |-----------------------------|
  1. | 05feb1971      100        . |
  2. | 08feb1971   100.84      100 |
  3. | 09feb1971   100.76   100.84 |
  4. | 10feb1971   100.69   100.76 |
  5. | 11feb1971   101.45   100.69 |
     +-----------------------------+

As expected, the issue of gaps due to weekends is now resolved. Because I have a calendar that excludes Saturdays and Sundays, bcaldate skipped the weekend between 05feb1971 and 08feb1971 when calculating the lagged index value and will do the same for any subsequent weekends in the data.
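
With the lag now defined across weekends, daily returns can be computed directly with time-series operators. The following is a sketch rather than part of the original post: it assumes the data are tsset on bcaldate as above, and ret is a hypothetical variable name.

```stata
* Sketch: daily log returns using the business-calendar time variable.
* ret is a hypothetical name introduced for illustration.
generate double ret = ln(index) - ln(L.index)
summarize ret

* Remaining missing values: the first observation, plus any days that
* follow a holiday not yet omitted from the calendar.
count if missing(ret)
```

Because tsset still reports gaps at this stage, some returns remain missing around holidays until those dates are also omitted from the calendar.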


Excluding specific dates

So far I have not excluded gaps in the data due to other major holidays, such as Thanksgiving and Christmas. Stata has several variations on the omit command that let you exclude specific dates. For example, I use the omit command to omit the Thanksgiving holiday (the fourth Thursday of November in the U.S.) by adding the following statement in my business calendar.

omit dowinmonth +4 Th of Nov

dowinmonth stands for day of week in month and +4 Th of Nov refers to the fourth Thursday of November. This rule is applied to every year in the data.

Another major holiday is Christmas, with the NASDAQ closed on the 25th of December every year. I can omit this holiday in the calendar as

omit date 25dec*

The * in the statement above indicates that December 25 should be omitted for every year in my nasdaq calendar. This rule alone is incomplete, because the 25th may fall on a weekend, in which case the holiday is observed on the preceding Friday or the following Monday. To capture these cases, I add the following statements:

omit date 25dec* and (-1) if dow(Sa)
omit date 25dec* and (+1) if dow(Su)

The first statement omits December 24 if Christmas is on a Saturday, and the second statement omits December 26 if Christmas is on a Sunday.
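
Putting the pieces together, the contents of nasdaq.stbcal with these holiday rules added might look like the sketch below. This is file content, not a sequence of Stata commands, and it covers only the holidays discussed in this post; a real NASDAQ calendar would need further omit statements for other market closures. Lines beginning with * are comments.

```stata
* nasdaq.stbcal (sketch): weekends plus Thanksgiving and Christmas only
version 14.1
purpose "Converting daily financial data into business calendar dates"
dateformat dmy
range 05feb1971 23mar2015
centerdate 05feb1971

* weekends
omit dayofweek (Sa Su)

* Thanksgiving: fourth Thursday of November
omit dowinmonth +4 Th of Nov

* Christmas, including the observed day when the 25th falls on a weekend
omit date 25dec*
omit date 25dec* and (-1) if dow(Sa)
omit date 25dec* and (+1) if dow(Su)
```

After editing the file, bcal load nasdaq reloads it, and bcaldate should be regenerated with bofd(). Rerunning the earlier assert is a useful safeguard: it fails if the data contain an observation on a date that the new rules omit.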


Encodings

I mentioned earlier that the encodings of regular date mydate and business date bcaldate are different. To see the encodings of my date variables, I apply the numerical format and list the first five observations.

. format %8.0g mydate bcaldate

. list in 1/5

     +-----------------------------------------+
     |       date    index   mydate   bcaldate |
     |-----------------------------------------|
  1. | 1971-02-05      100     4053          0 |
  2. | 1971-02-08   100.84     4056          1 |
  3. | 1971-02-09   100.76     4057          2 |
  4. | 1971-02-10   100.69     4058          3 |
  5. | 1971-02-11   101.45     4059          4 |
     +-----------------------------------------+

The variable bcaldate starts with 0 because its first value, 05feb1971, is the centerdate in my calendar nasdaq.stbcal. The business date encoding is consecutive without gaps, which is why lags and other time-series operators now yield correct values.
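
The two encodings can also be inspected for individual dates without touching the dataset. This sketch assumes the weekend-only nasdaq calendar shown earlier is loaded; the values in the comments are taken from the listing above.

```stata
* Sketch: one calendar date under the two encodings.
display td(05feb1971)                  // regular encoding: 4053
display bofd("nasdaq", td(05feb1971))  // business encoding: 0 (the centerdate)
display bofd("nasdaq", td(08feb1971))  // next business day: 1

* 06feb1971 is a Saturday, which the calendar omits,
* so bofd() returns missing (.).
display bofd("nasdaq", td(06feb1971))
```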


Summary

Using regular dates instead of business dates with time-series data can be misleading when there are gaps in the data. In this post, I showed a convenient way to work with business dates by creating a business calendar. Once I loaded a calendar file into Stata, I created business dates using the bofd() function. I also showed some variations of the omit command used in business calendars to accommodate specific gaps due to different holidays.

References

[1] Posts by Ashish Rajbhandari, Senior Econometrician: http://blog.stata.com/author/arajbhandari/