RDIM Terminology

Terminology

This section provides an alphabetical list of terms used in managing research data and information, along with their meaning and/or application.


Metadata

High-quality documentation and metadata enhance the discoverability of your data and enable others to accurately interpret, validate, reuse and cite it.

Documentation is a comprehensive term that refers to any supporting material or information that provides context, explanation, or guidance regarding your research data. It includes a variety of components, such as codebooks, README files and, importantly, metadata.

Metadata or ‘data about data’ uses structured, standardized information to document data.

Supporting documentation (e.g. codebooks, README files, code and scripts):

Data-level documentation is critical for interpreting, validating and re-using your data.

Please follow these guidelines if you are archiving or publishing your data via Research Data JCU.

At a minimum, ensure you include variable codes, labels, descriptions and units with your data (embedded) and/or in a separate file.
Create and maintain codebooks, data dictionaries and README.txt files as required during your project and ensure you archive them with your data at completion.
Code and scripts used to derive or analyse your data should also be retained and published with your data as appropriate.
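
As a minimal sketch of the first two guidelines, the Python snippet below writes a data dictionary (one row per variable: code, label, description, units) as plain-text CSV and checks that every column in a data file has an entry. The variable names, the helper functions and the column layout are all hypothetical; Research Data JCU does not prescribe this format.

```python
import csv
import io

# Hypothetical variables for illustration only; replace with your own.
CODEBOOK = [
    {"code": "site_id", "label": "Site identifier",
     "description": "Unique code for each sampling site", "units": ""},
    {"code": "water_temp", "label": "Water temperature",
     "description": "Surface water temperature at time of sampling",
     "units": "degrees Celsius"},
    {"code": "sample_date", "label": "Sampling date",
     "description": "Date the sample was collected (ISO 8601, YYYY-MM-DD)",
     "units": ""},
]

FIELDS = ["code", "label", "description", "units"]


def codebook_to_csv(entries, fields=FIELDS):
    """Return the codebook as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(entries, data_columns):
    """Return data columns with no codebook entry (ideally an empty list)."""
    documented = {entry["code"] for entry in entries}
    return [col for col in data_columns if col not in documented]


if __name__ == "__main__":
    # Save this text as e.g. codebook.csv and archive it with your data.
    print(codebook_to_csv(CODEBOOK))
    print(undocumented_columns(CODEBOOK, ["site_id", "water_temp", "salinity"]))
```

Keeping the codebook as plain text means it can be read without the software used to create the data, and the coverage check flags any column (here the hypothetical "salinity") that still needs a description.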

You can read more about study-level vs. data-level documentation and how documentation is stored below.

Metadata standards, schemas, classifications, vocabularies and ontologies:

Metadata in the form of standards, schemas, classification codes, vocabularies or ontologies may be relevant for your research project, particularly if you are depositing your data in a discipline-specific repository. See our Repository Lists webpage for more information.

While these terms can be confusing, they all provide a structured framework for organizing and describing data. This ensures consistency, interoperability, and enhanced discoverability across domains and systems.

You can read more about machine vs. human-readable metadata and how metadata is stored below.

Here are some examples:

The Darwin Core metadata standard is an extension of Dublin Core (used for general resource description) specifically designed for biodiversity data. It is used by the Global Biodiversity Information Facility (GBIF) to aggregate and disseminate biodiversity data from various sources, promoting collaboration and advancing global insights into biodiversity. Darwin Core and Dublin Core are standards that include metadata schemas.

The ANZLIC metadata guidelines are widely used to document and describe spatial data in Australia and New Zealand. This profile of AS/NZS ISO 19115:2011 Geographic Information – Metadata was retired in 2015 in favour of the officially endorsed metadata standard AS/NZS ISO 19115.1:2015 Metadata (including the 2018 Amendment No.1). See the ANZLIC webpage for more information.

The Australian and New Zealand Standard Classification of Occupations (ANZSCO) is widely used to ensure consistency and comparability when dealing with occupational data. This is an example of a classification system that uses coding.

Controlled vocabularies make it easier for researchers to find or analyse data, or to aggregate it with other data. There are thousands of vocabularies available in the research domain. Research Vocabularies Australia can help you locate, access and reuse vocabularies for your research project.

The Gene Ontology (GO) is used in bioinformatics and genomics. It provides a structured vocabulary and standardized annotations that enable systematic, comprehensive analysis of gene functions across different species and biological contexts.
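
To illustrate what using a metadata standard such as Darwin Core can look like in practice, the sketch below writes biodiversity occurrence data as CSV, using a few Darwin Core term names (occurrenceID, basisOfRecord, scientificName, eventDate, decimalLatitude, decimalLongitude, countryCode) as column headers. The example record, the helper functions and the choice of which terms to treat as required are assumptions for illustration; check the Darwin Core term list and GBIF's publishing guidance for your own data.

```python
import csv
import io

# A small subset of Darwin Core term names, used as CSV column headers.
DWC_TERMS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate",
             "decimalLatitude", "decimalLongitude", "countryCode"]

# Terms this sketch refuses to leave blank (an assumption for illustration).
REQUIRED = {"occurrenceID", "basisOfRecord", "scientificName", "eventDate"}


def missing_terms(record):
    """Return the required terms that are absent or blank, sorted by name."""
    return sorted(t for t in REQUIRED if not str(record.get(t, "")).strip())


def to_dwc_csv(records):
    """Serialise occurrence records to CSV with Darwin Core column headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_TERMS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        problems = missing_terms(record)
        if problems:
            raise ValueError(f"{record.get('occurrenceID', '?')}: missing {problems}")
        # Keys not in DWC_TERMS raise ValueError (DictWriter's default behaviour).
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    example = {  # hypothetical observation for illustration only
        "occurrenceID": "example-occ-0001",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Chelonia mydas",
        "eventDate": "2023-05-14",
        "decimalLatitude": -19.26,
        "decimalLongitude": 146.82,
        "countryCode": "AU",
    }
    print(to_dwc_csv([example]))
```

Because the column names are shared standard terms rather than local abbreviations, an aggregator can map records from many sources onto the same fields without a custom translation for each dataset.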

Additional reading

Study-level and data-level documentation:

Project or study-level metadata is often included in Research Data Management Plans (RDMPs) and provides a high-level overview of and context for the data, e.g. the research project's aims, subject descriptions (keywords, FoR codes etc.), personnel, data collection and analysis methods, information about intellectual property (IP) rights, access, and plans for handling sensitive data.

In Research Data JCU, some of this study-level metadata will auto-fill the Data Records and Data Publications that you create from your RDMPs. While this information is important for context, these metadata records must describe the data itself, not just the project or a publication.

You also need to provide supporting documentation at the data-level. This ensures your data is not misinterpreted and is critical for validating, reproducing and reusing your data. Some examples (from the UK Data Archive) include:

variable codes, labels, descriptions and units
reasons for missing values
weighting and grossing variables created
code and scripts used to derive data after collection (simple derivations such as grouping by age levels can be explained in variable and value labels)
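
Reasons for missing values are easier to document when each reason has its own code. The sketch below assumes two hypothetical codes (-99 and -98) and a hypothetical helper, split_missing, and shows how such codes might be converted to empty values while tallying each reason so the counts can be reported in your data-level documentation.

```python
from collections import Counter

# Hypothetical missing-value codes for illustration; define your own in the codebook.
MISSING_CODES = {
    "-99": "Not recorded (instrument failure)",
    "-98": "Not applicable (site inaccessible)",
}


def split_missing(raw_values, codes=MISSING_CODES):
    """Return (cleaned values, Counter of missing-value reasons).

    Values matching a documented code become None and their reason is counted;
    all other values are converted to float.
    """
    cleaned, reasons = [], Counter()
    for value in raw_values:
        key = str(value).strip()
        if key in codes:
            cleaned.append(None)
            reasons[codes[key]] += 1
        else:
            cleaned.append(float(value))
    return cleaned, reasons


if __name__ == "__main__":
    values, why = split_missing(["21.5", "-99", "22.0", "-98", "-99"])
    print(values)
    for reason, count in why.items():
        print(f"{count} value(s): {reason}")
```

Keeping the codes and their meanings in one place (and in the codebook) means a reader never has to guess whether -99 is a real measurement or a gap.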

Storing metadata and documentation:

Research Data JCU includes many of the metadata fields you will need to comprehensively document your data.

Documentation can be stored with the data (embedded) and/or in separate files as supporting documentation, e.g. codebooks, README files and scripts.

Embedded documentation can be as simple as a key in an MS Excel spreadsheet, or more complex, e.g. annotations created with R packages or Python libraries that provide facilities for data annotation. If possible, export this documentation as plain text and include it with your supporting documentation.

Machine and human-readable metadata:

Including some machine-readable metadata elements improves automation and makes it easier for tools and systems to index, search and analyse datasets efficiently. This is particularly important in large-scale data applications and systems.

In Research Data JCU, we use machine-readable Digital Object Identifiers (DOIs) to identify your Data Publications and link them with your other outputs, together with machine-readable licences. DOIs and licences allow other researchers to discover your work, attribute it properly and understand the terms under which it can be used, whether via Research Data JCU, Research Data Australia or other services that harvest the metadata.

Human-readable metadata and descriptions provide context and insights into the background, methodology and nuances of a dataset, and improve its interpretability.

Combining machine and human-readable metadata enhances metadata quality, the integrity and utility of your datasets, and supports FAIR data.
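
As a rough illustration of combining both kinds of metadata, the sketch below builds a small JSON record keyed by Dublin Core element names: machine-readable fields (a DOI resolved via https://doi.org/ and a licence URL) alongside a human-readable title and description. The helper build_record, the DOI and the dataset details are hypothetical placeholders, the DOI pattern check is deliberately simplified, and this is not the record format Research Data JCU uses.

```python
import json
import re

# Simplified DOI shape check: "10." + registrant code + "/" + suffix.
# An assumption for illustration, not a full DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def build_record(title, creators, doi, licence_url, description):
    """Return a metadata record keyed by Dublin Core element names."""
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return {
        # Machine-readable elements: a resolvable identifier and a licence URL.
        "dc:identifier": f"https://doi.org/{doi}",
        "dc:rights": licence_url,
        # Human-readable elements: context for people reading the record.
        "dc:title": title,
        "dc:creator": creators,
        "dc:description": description,
    }


if __name__ == "__main__":
    record = build_record(
        title="Example water temperature dataset",  # hypothetical
        creators=["Researcher, A."],                 # hypothetical
        doi="10.0000/hypothetical-dataset",          # hypothetical placeholder DOI
        licence_url="https://creativecommons.org/licenses/by/4.0/",
        description="Hourly surface water temperatures from three sites.",  # hypothetical
    )
    print(json.dumps(record, indent=2))
```

A harvesting service can act on the identifier and licence fields automatically, while a person deciding whether to reuse the data relies on the title and description.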

Moral Rights

Moral rights are personal legal rights belonging to the creator of a copyright work; they cannot be transferred, assigned or sold. They ensure that creators are correctly attributed, that their works are not treated in a derogatory way, and that the integrity of each work is upheld.

By assigning ownership of the copyright in a work to someone else, the author transfers control over its future publication or reproduction to the new owner. However, the author almost always retains moral rights in their work, regardless of who owns the copyright. These rights give the creator the right to be identified as the author, and the right to take legal action against any change to the work's title or any derogatory treatment of the work itself.