Data Package

A Data Package makes your research sustainable, facilitates reuse, and allows other researchers to replicate your study independently, based solely on the information it contains.

Your data package should be concise yet as complete as possible and include:

  • A README or instruction file that lists the files in the package, explains how they relate to each other, and gives step-by-step instructions on how to use the files to replicate the study.
  • Raw data files. If your study is based on a portion of the original dataset, include only the necessary data. Make sure the data in your data package are de-identified, and omit personal and sensitive data.
  • Processed data files. In many cases, the raw data will be transformed to a processed format that is suitable for further analysis.
  • A data appendix/codebook which provides information about every variable in your dataset (e.g. variable name, value labels, the type and format of the variable).
  • Command files/syntax: the code scripts used to transform the raw data into processed data, and the code scripts used to analyse the data and produce the results. The code should be accompanied by (inline) comments or other instructions needed for others to replicate your study. Do not include information or code in the package that you are not allowed to share (e.g. licensed software).
  • Protocols used during the study, for instance those describing the experiments performed.
  • Lab journals.
  • A reference to any publication which is based on the data.
  • Other metadata, e.g. the parameters used in your study.
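
The file listing for the README can be drafted automatically. The following is a minimal sketch in Python, not a prescribed tool: the function names build_manifest and format_manifest are hypothetical, and the idea of adding a size and SHA-256 checksum per file is an assumption that goes beyond the list above. The output still needs a hand-written description of each file and of how the files relate to each other.

```python
import hashlib
from pathlib import Path


def build_manifest(package_dir):
    """Return (relative_path, size_in_bytes, sha256_hex) for every file in the package."""
    root = Path(package_dir)
    entries = []
    for path in root.rglob("*"):
        if path.is_file():
            data = path.read_bytes()  # reads the whole file; fine for small packages
            digest = hashlib.sha256(data).hexdigest()
            rel_path = path.relative_to(root).as_posix()
            entries.append((rel_path, len(data), digest))
    # Sort by relative path so the listing is stable across runs and platforms.
    return sorted(entries)


def format_manifest(entries):
    """Format the manifest as plain text that can be pasted into a README."""
    lines = ["Files in this package:"]
    for rel_path, size, digest in entries:
        # Only the first 12 characters of the checksum are shown, to keep lines short.
        lines.append(f"  {rel_path}  ({size} bytes, sha256 {digest[:12]})")
    return "\n".join(lines)
```

Calling print(format_manifest(build_manifest("my_data_package"))) prints one line per file, where "my_data_package" stands for the folder that holds your package. The checksums let a later user confirm that the files they received match the ones you deposited.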