RDIM Terminology

Terminology

This section provides an alphabetical listing of some of the terminology used in managing research data and information along with the meaning and/or application of these terms.

Metadata

Metadata is ‘data about data’ – i.e., it defines and describes the data. Good metadata is an intrinsic element of the FAIR principles as it ensures data is discoverable and that others can interpret/validate, re-use and cite it correctly. Without metadata, reuse and reproducibility are impossible. Unlike documentation, metadata should be machine-readable.

Metadata serves three main purposes:

  1. It explains the provenance of the data, or how, when and where the data was created and by whom. This is necessary for others to know where the data came from.
  2. It helps other users to understand (the context of) your data. It summarises basic information about the data, which facilitates its reuse.
  3. It increases the findability of the data, for example because the metadata includes a unique persistent identifier, such as a DOI, that is assigned to the dataset. It may also contain keywords that can be indexed by search engines.

Metadata will need to be created if the researcher plans to publish or share the data or to archive the data in a repository such as Research Data JCU.

The following is an example of the information in a generic metadata file. However, there may be metadata standards that are specific to your domain. Enriching your data with additional domain-specific metadata will make it more useful and findable for others.

Metadata field name Example value
Name  
Author  
Keywords  
Version  
Dataset DOI  
Reference  
Description  
Date  
Location  
License  
Correspondence
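
Because metadata should be machine-readable, a generic record like the one above can be kept as a small structured file. The following is a minimal sketch in Python, not a prescribed format: the field names come from the table above, but every value, the helper name missing_fields and the choice of JSON are assumptions made for illustration only.

```python
import json

# Field names taken from the generic metadata table above.
REQUIRED_FIELDS = [
    "Name", "Author", "Keywords", "Version", "Dataset DOI", "Reference",
    "Description", "Date", "Location", "License", "Correspondence",
]

# All values below are hypothetical placeholders, not a real dataset.
metadata = {
    "Name": "Example reef temperature survey",
    "Author": "A. Researcher",
    "Keywords": ["sea temperature", "coral reef"],
    "Version": "1.0",
    "Dataset DOI": "10.0000/example-doi",  # placeholder, not a registered DOI
    "Reference": "Citation of the related publication, if any",
    "Description": "Hourly temperature readings from hypothetical logger sites.",
    "Date": "2024-01-31",
    "Location": "Example Bay",
    "License": "CC BY 4.0",
    "Correspondence": "a.researcher@example.org",
}


def missing_fields(record):
    """Return the required field names that are absent or empty in record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Serialise to JSON so that both people and software can read the record.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
print("Missing fields:", missing_fields(metadata))
```

A check such as missing_fields can be run before depositing a dataset to catch empty fields early; a domain-specific standard would normally replace this generic field list.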

Metadata includes data-level documentation as well as study-level documentation; it should not just describe the project or a publication.

Study-level documentation

Study-level documentation for data is often included in Research Data Management Plans (RDMPs) and provides a high-level overview and context for the data. It is an important component of the metadata and is key to enabling secondary users to make informed use of shared data. Some systems (like Research Data JCU) integrate RDMPs and metadata collection so that researchers don't have to re-enter this information.

Data-level documentation

While it may be tempting to stop at the study-level, metadata also needs to include data-level documentation as this is critical for validating, reproducing and re-using data. It could include:

  • Names, labels and descriptions for variables
  • Definitions of codes and classification schemes
  • Definitions of specialised terminology or acronyms
  • Codes and reasons for missing values (refer to Data Wrangling)
  • Code and scripts used to derive data after collection (simple derivations such as grouping by age levels can be explained in variable and value labels)
  • Weighting and grossing variables created

Data-level documentation may also be embedded in the data itself.
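
The items listed above can be captured in a simple codebook. The sketch below is illustrative only: the variable names, the codes, the missing-value codes (-9, -8) and the decode helper are all hypothetical and are not taken from any particular dataset or standard.

```python
# Hypothetical codebook: variable labels, code definitions and
# missing-value codes with the reason each one represents.
CODEBOOK = {
    "age_group": {
        "label": "Age group of respondent",
        "codes": {1: "18-34", 2: "35-54", 3: "55+"},
        "missing": {-9: "Not answered", -8: "Not applicable"},
    },
    "height_cm": {
        "label": "Height in centimetres",
        "codes": {},  # continuous variable, so no code list
        "missing": {-9: "Not measured"},
    },
}


def decode(variable, value):
    """Return (decoded value, reason missing); reason is None if present."""
    entry = CODEBOOK[variable]
    if value in entry["missing"]:
        return None, entry["missing"][value]
    # Coded values are translated; uncoded values pass through unchanged.
    return entry["codes"].get(value, value), None


print(decode("age_group", 2))
print(decode("age_group", -9))
print(decode("height_cm", 172.5))
```

Recording the reason for each missing value alongside its code lets a secondary user distinguish, for example, a skipped question from one that did not apply.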

The terms ‘data documentation’, ‘data provenance’ and ‘data lineage’ are often confused. Definitions vary, but they could be considered as a continuum, with data documentation at the broadest level. Provenance is concerned with questions of data origins, maintenance of identity through the data lifecycle, and how we account for data modification. This can be likened to the chain of custody in criminal investigations: previous owners have to be identified and held accountable for the processing and cleaning operations they have performed on the data. Technical data lineage relies on metadata that tracks data flows at the lowest level (tables, scripts, statements, etc.).

Storing Metadata

Metadata can be stored in local systems alongside the related data or, once it is complete, in data or metadata stores. Research Data JCU is an example of an institutional metadata store and contains records (Data Records and Data Publications) for datasets generated by JCU researchers and HDR candidates.

Data Publications in Research Data JCU are harvested regularly and published by Research Data Australia. The Research Data JCU platform also provides secure storage for datasets which (unless restricted) are accessed directly via the catalogue or by negotiation with the data manager.

Data-level documentation/metadata such as workflows, detailed methodologies, variable descriptions, codes and units are often stored with the data (embedded) or included in their own data file (e.g., codebook, README text etc. as supporting documentation).

Embedded documentation can be as simple as a key in an MS Excel spreadsheet (an additional worksheet) or it may be more complex (e.g., for software packages that include facilities for data annotation as variable attributions, table relationships, etc.). If possible, export this as a plain text file and include it with your supporting documentation, as this facilitates FAIR data.
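
As a rough illustration of exporting documentation to plain text, the sketch below renders variable descriptions as a text codebook and writes it to a file. It assumes the embedded documentation has already been read into a Python dictionary; the variable names, the layout of the text file and the function names render_codebook and write_codebook are hypothetical.

```python
from pathlib import Path


def render_codebook(variables):
    """Render variable documentation as plain text, one block per variable."""
    lines = ["CODEBOOK", ""]
    for name, info in variables.items():
        lines.append(f"Variable: {name}")
        lines.append(f"  Label: {info['label']}")
        if info.get("units"):
            lines.append(f"  Units: {info['units']}")
        for code, meaning in info.get("codes", {}).items():
            lines.append(f"  Code {code}: {meaning}")
        lines.append("")  # blank line between variables
    return "\n".join(lines)


def write_codebook(variables, path):
    """Write the rendered codebook to path as UTF-8 plain text."""
    Path(path).write_text(render_codebook(variables), encoding="utf-8")


# Hypothetical variable documentation, for illustration only.
example_variables = {
    "temp_c": {"label": "Water temperature", "units": "degrees Celsius"},
    "site": {"label": "Survey site", "codes": {1: "Inshore", 2: "Offshore"}},
}

codebook_text = render_codebook(example_variables)
print(codebook_text)
# To save alongside the data: write_codebook(example_variables, "codebook.txt")
```

Plain text is used here because it can be opened without the original software, which is the point of exporting embedded documentation in the first place.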

Moral Rights

Moral rights are personal legal rights belonging to the creator of copyright works and cannot be transferred, assigned or sold. They ensure that the creators of works are correctly attributed, that the works are not treated in a derogatory way, and that the integrity of the work is upheld.

By assigning ownership of the copyright of a work to someone else, the author transfers control over its future publication or reproduction to the new owner. However, the author almost always retains moral rights to their work regardless of who owns the copyright. This gives the creator of a work the right to be identified as the author and the right to take legal action against any change to its title or any derogatory treatment of the work itself.