Why are planning, organization, and documentation (PO&D) so important that they warrant an entire chapter?
Hit by a bus rule: Could another researcher pick up your project and finish it if you were hit by a bus?
Publication process may take years from the time a project is started – maintaining proper documentation allows a researcher to more easily re-familiarize themselves with a project.
Effective planning saves time and prevents errors.
Especially important during collaborative projects to avoid confusion over who is responsible for what.
Also especially important for long or complex research projects.
Makes working on multiple projects easier.
Start with general goals and publishing plans. This allows researchers to prioritize tasks so that the early stages of a project are not held up by data collection and/or data management.
Plan a timeline with target dates for key stages of the project. The targets may not all be met, but they serve as benchmarks for assessing your plan. A timeline also lets you plan around fixed dates such as conference submission deadlines.
A well thought out plan allows for a smooth division of labor. With no plan it is difficult to coordinate data management and use. A plan also lets collaborators reach an understanding about authorship early on. A failed collaboration certainly affects the success of a project, but also potentially affects interpersonal relationships.
Select an enforcer for collaborative projects – someone specifically designated to focus on PO&D. For example, the enforcer may be in charge of backing up the data or organizing files.
Simple but informative variable names help avoid confusion.
Planning for missing data – does it matter if the missing data are due to attrition, refusal, or a skip pattern in the survey? Do you need to account for these different forms of missing data with separate codes?
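One way to keep the different kinds of missing data distinct is to reserve a separate code for each. The sketch below is illustrative only: the code values (-1, -2, -3), the names MISSING_CODES and summarize_missing, and the sample responses are all assumptions, not codes prescribed by Long (2009).

```python
from collections import Counter

# Hypothetical codes; choose values that cannot occur as valid responses.
MISSING_CODES = {
    -1: "attrition",     # respondent left the study
    -2: "refusal",       # respondent declined to answer
    -3: "skip pattern",  # question not asked because of survey branching
}


def summarize_missing(values):
    """Count how often each type of missing data occurs in a list of values."""
    counts = Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
    return dict(counts)


# Made-up responses for one variable, purely for illustration.
responses = [3, -2, 5, -3, -3, 1, -1]
missing_summary = summarize_missing(responses)
print(missing_summary)
```

Deciding on such codes during planning means the distinction is still available later if the analysis turns out to need it.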
Consider the software being used to analyze the data. The type of software being used affects data formats, naming conventions, and data structures.
Successful organization determines what goes where, what to name a file, how you find a file.
Start early! Staying organized early in a project allows for easier organization towards the end of a project.
Simple but not too simple. Large and small projects require different levels of complexity in organization. Not every project is going to be organized in the same way.
Stay consistent. Using the same naming scheme for all projects reduces the amount of time you must spend thinking about organization.
Use written documentation of your organization procedures. This prevents you from changing conventions mid-project. This is especially important during collaborative projects.
Using dashes (-) or underscores (_) instead of spaces in file names allows researchers to avoid using quotation marks in their program commands. It also facilitates work on projects across operating systems (PC vs Mac).
Labeling files with a short mnemonic unique to each project allows a researcher to quickly identify or locate the files associated with that project.
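The two naming rules above can be combined in a small helper. This is a minimal sketch under assumed conventions: the function name build_filename, the mnemonic "abc", and the choice of an underscore after the mnemonic with hyphens inside the description are all hypothetical.

```python
import re


def build_filename(mnemonic, description, extension):
    """Return a name such as 'abc_clean-survey-data.do' with no spaces."""
    # Collapse any run of whitespace in the description into a single hyphen.
    slug = re.sub(r"\s+", "-", description.strip().lower())
    return f"{mnemonic.lower()}_{slug}.{extension}"


example_name = build_filename("abc", "Clean survey data", "do")
print(example_name)
```

Because the result contains no spaces, it can be passed to a command without quotation marks, and the leading mnemonic groups the project's files together in a sorted listing.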
“Work” and “Posted” directories are useful to keep straight which aspects of a project are finished and which are still in progress.
Using a dash (-) at the beginning of a directory name pushes it to the top of your directory list.
Mailbox directories can be set up on collaborative projects. For example, a directory may be named Researcher1toResearcher2 or Researcher2toResearcher1.
Private directories can be set up for files that aren’t intended to be shared with the entire research team.
A “- to clean” directory can be used for files that you are not yet sure where to put. These files stay in this folder until they can be transferred to their proper place.
A “- hold then delete” directory can be used as a sort of trash bin. It holds old files until it is clear they can be deleted.
Use directory names to highlight what the directory contains.
A “- history” directory can be used to keep track of critical information about files in the project.
Spreadsheets can be useful tools for organizing a directory. See Long (2009:29).
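The directory layout described above can be created in one step so that every project starts from the same skeleton. The sketch below is an assumption about how one might do this, not a procedure from Long (2009); the function name create_project_skeleton and the project name "abc-project" are hypothetical, and the example runs in a temporary directory so it touches no real files.

```python
from pathlib import Path
import tempfile

# Directory names follow the notes above; adapt them to your own conventions.
PROJECT_DIRECTORIES = [
    "- history",
    "- hold then delete",
    "- to clean",
    "Posted",
    "Private",
    "Work",
]


def create_project_skeleton(root):
    """Create the standard directories under root and return their sorted names."""
    root = Path(root)
    for name in PROJECT_DIRECTORIES:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Demonstrate in a throwaway temporary directory rather than a real project.
with tempfile.TemporaryDirectory() as tmp:
    created = create_project_skeleton(Path(tmp) / "abc-project")
    print(created)
```

In the sorted result the names that begin with a dash come before the others, which is the effect the leading dash is meant to achieve; the exact ordering shown by a file browser may depend on the operating system.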
Long’s Law of Documentation: It is always faster to document it today than tomorrow.
Documentation reminds a researcher of decisions made, work completed, and plans for future work.
Without documentation, replication is nearly impossible.
Document data sources so that you are aware of which release of a dataset you are using.
Document data decisions such as how variables were created and how cases were selected, and record why each decision was made.
Document the type of analysis used, the order it was used in, and why it was used. Also account for other analyses that you may have run but decided not to use.
Document the type of software used.
Keep a record of where the results are stored.
A research log can be used to chronicle what decisions were made, when, and how. It can be handwritten or kept in a word processor. See the research log template in Long (2009:41).
a. A good research log keeps work on track
b. Helps deal with interruptions to work
c. Facilitates replication
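A research log can also be kept as a plain text file that is appended to as work proceeds. The following is a minimal sketch, not Long's template: the entry layout, the function names, the file name "abc_research-log.txt", and the sample entry are all made up for illustration.

```python
from datetime import date
from pathlib import Path
import tempfile


def format_log_entry(entry_date, author, note):
    """Return one log line with a full ISO date and the author's full name."""
    return f"{entry_date.isoformat()} | {author} | {note}"


def append_to_log(log_path, entry):
    """Append an entry to the log file, creating the file if needed."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(entry + "\n")


# Hypothetical entry; the date, name, and note are placeholders.
entry = format_log_entry(
    date(2009, 3, 14),
    "Researcher One",
    "Recoded income into quartiles; see abc_recode-income.do",
)
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "abc_research-log.txt"
    append_to_log(log_path, entry)
    logged_text = log_path.read_text(encoding="utf-8")
print(logged_text)
```

Appending rather than rewriting keeps earlier entries intact, so the log preserves the order in which decisions were made.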
Include detailed comments in your do-files. In Stata, a line that begins with an asterisk (*) is treated as a comment.
Internally labeling documents with the author’s name, the date it was created, and the name of the document allows for better tracking of which document is the most recent. This information can be added at the end of a document.
Include full dates and names so that there is no confusion down the road.
Codebooks are also important for documenting variables. Good codebooks typically include:
a. Variable name and question number
b. Text of the original question and variable label used
c. If data were collected using a survey any branching information should be included
d. Descriptive statistics including value labels for categorical variables
e. Descriptions of how missing data can occur as well as codes for each type of missing data
f. Information on recoding or imputation. Include information about how missing data were handled if a variable was constructed from other variables.
g. An appendix that contains any abbreviations or conventions used.
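The codebook items listed above map naturally onto a structured record per variable. The sketch below shows one possible shape; the class CodebookEntry, its field names, and the example variable are assumptions for illustration, and descriptive statistics are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class CodebookEntry:
    name: str
    question_number: str
    question_text: str
    label: str
    value_labels: dict = field(default_factory=dict)
    missing_codes: dict = field(default_factory=dict)
    notes: str = ""

    def render(self):
        """Return the entry as plain text, one item per line."""
        lines = [
            f"Variable: {self.name} (question {self.question_number})",
            f"Label: {self.label}",
            f"Question text: {self.question_text}",
        ]
        for value, text in sorted(self.value_labels.items()):
            lines.append(f"  {value} = {text}")
        for code, reason in sorted(self.missing_codes.items()):
            lines.append(f"  {code} = missing ({reason})")
        if self.notes:
            lines.append(f"Notes: {self.notes}")
        return "\n".join(lines)


# Hypothetical variable; none of these values come from a real survey.
employed = CodebookEntry(
    name="employed",
    question_number="Q12",
    question_text="Are you currently employed?",
    label="Currently employed",
    value_labels={0: "Not employed", 1: "Employed"},
    missing_codes={-2: "refusal", -3: "skip pattern"},
    notes="Asked only of respondents aged 16 and over.",
)
rendered = employed.render()
print(rendered)
```

Keeping entries in a structured form like this makes it easier to check that every variable has the same information recorded.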
Manage datasets with a dataset registry. See Long (2009:45).
a. A dataset registry tracks your datasets as well as the do-files used to create them, which makes it easier to troubleshoot potential data problems.
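A dataset registry can be as simple as a table with one row per dataset. The sketch below stores it as CSV text; the column names, file names, and dates are hypothetical and are not taken from the registry in Long (2009:45).

```python
import csv
import io

# Assumed columns; extend them to suit the project.
REGISTRY_FIELDS = ["dataset", "created_by_dofile", "date_created", "description"]


def write_registry(rows, stream):
    """Write registry rows as CSV, with a header line, to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=REGISTRY_FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def find_dofile(rows, dataset_name):
    """Return the do-file that created dataset_name, or None if unregistered."""
    for row in rows:
        if row["dataset"] == dataset_name:
            return row["created_by_dofile"]
    return None


# Made-up registry entries for illustration.
registry = [
    {"dataset": "abc_analysis-01.dta", "created_by_dofile": "abc_build-01.do",
     "date_created": "2009-03-14", "description": "Merged raw survey waves"},
    {"dataset": "abc_analysis-02.dta", "created_by_dofile": "abc_build-02.do",
     "date_created": "2009-03-20", "description": "Added recoded income"},
]
buffer = io.StringIO()
write_registry(registry, buffer)
registry_csv = buffer.getvalue()
print(registry_csv)
print(find_dofile(registry, "abc_analysis-02.dta"))
```

When a problem appears in a dataset, looking up the do-file that created it is the first troubleshooting step the registry supports.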
Examples and templates cited from Long (2009) are also available in file format.