RDIM Terminology

This section provides an alphabetical listing of some of the terminology used in managing research data and information along with the meaning and/or application of these terms.

Data

Refer to Research Data.

Data Creator

Data Creators capture or create the data for the research project.

Due to current system limitations, internal and external data creators (including Other Data Creators) are dealt with in two ways:

For data creators internal to JCU:
- A Data Creator is any JCU staff member or HDR candidate who is involved in the creation of research data.
For data creators external to JCU:
- An external data creator is referred to as a Collaborator.

Data Custodian

The Data Custodian is the person who has overall custody of, and is entrusted with, the research (data and information) assets throughout their lifecycle. Their overall responsibilities include:

Ensuring good records management practices are followed.
Managing and maintaining data and information, including metadata, to ensure that discovery mechanisms function.
Managing escalated risks related to research data and information through the University’s risk management processes.
Monitoring and responding to performance measures.
Prior to departure from JCU, the Data Custodian should ensure these responsibilities are assigned to an appropriate delegate to manage.

Key roles including Data Creator, Data Manager and Data Custodian are outlined in Appendix 2 - Custodianship Model for Research Data and Information, which forms part of the Management of Data and Information in Research Procedure.

Data Manager

The Data Manager provides advice and/or supervision to the researcher(s) throughout the research project lifecycle and reviews (spot checks) the veracity of the project’s data and information. Post project, the Data Manager is responsible for safeguarding the research (data and information) assets from unauthorised access and misuse, and in some cases, for archiving (record keeping) and disposal.

Because responsibilities for data and information extend past the life of the research project (e.g. after the HDR candidate has graduated), the role of the Data Manager must be assumed by a JCU staff member. As such, the Data Manager is either the:

Lead Investigator; or
For HDR projects ONLY: Primary Advisor.

On departure of a Lead Investigator or Primary Advisor from JCU, the role will transfer as follows:

For Colleges - the Associate Dean of Research (ADR) or an appropriate delegate as determined by the Data Custodian.
For Institutes and Centres - an appropriate Manager, as delegated by the Data Custodian.

Data Package

A Data Package makes your research sustainable and facilitates reuse. Other researchers should be able to replicate your study independently, based solely on the information in the package.

Your data package should be concise yet as complete as possible and include:

A README or instruction file which lists the files inside the package and explains their relation and includes a step-by-step instruction on how to use the files to replicate the study.
Raw data files. If your study is based on a portion of the original dataset, include only the necessary data. Make sure to include de-identified data in your data package and omit personal and sensitive data.
Processed data files. In many cases, the raw data will be transformed to a processed format that is suitable for further analysis.
A data appendix/codebook which provides information about every variable in your dataset (e.g. variable name, value labels, the type and format of the variable).
Command files/syntax which includes code scripts that were used to transform the raw data into processed data and code scripts which were used to analyse the data and produce the results. The code should be accompanied by (inline) comments or other instructions needed for others to replicate your study. You should not include information or code in the package that you are not allowed to share (e.g. licensed software).
Protocols which were used during the study, for instance about the performed experiments.
Lab journals.
A reference to any publication which is based on the data.
Other metadata e.g., the parameters used in your study.
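
As a minimal sketch of the first item above, the following Python function walks a package folder and produces README text that lists every file with its size and a short note. The folder name, file names and notes are hypothetical, and the step-by-step replication instructions still need to be written by hand.

```python
from pathlib import Path


def build_readme(package_dir, descriptions):
    """Return README text listing every file under package_dir.

    descriptions maps a relative path (POSIX style) to a one-line note.
    Files without a note are flagged so they can be documented later.
    """
    root = Path(package_dir)
    lines = ["Data package contents", ""]
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel == "README.txt":
            continue  # do not list the README inside itself
        note = descriptions.get(rel, "TODO: describe this file")
        lines.append(f"{rel} ({path.stat().st_size} bytes): {note}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical package folder and notes, for illustration only.
    notes = {
        "raw/survey.csv": "De-identified raw survey responses",
        "scripts/clean.py": "Transforms raw data into processed data",
    }
    package = Path("my_data_package")
    if package.is_dir():
        readme = build_readme(package, notes)
        (package / "README.txt").write_text(readme, encoding="utf-8")
```

Files that have no note are marked with a TODO so that gaps in the documentation are easy to spot before the package is archived.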

Data Papers

Data journals are publications whose primary purpose is to expose datasets; some more general journals also have sections dedicated to the publication of Data Papers. The publication of a Data Paper may be considered best practice for researchers whose primary output is data, or for whom the development of new, data-driven technologies is of particular importance. The publication of Data Papers includes an element of peer review, maximises opportunities for reuse and attracts academic accreditation for data scientists as well as front-line researchers. Such papers may also be eligible for assessment as research outputs under the Australian Research Council's Excellence in Research for Australia (ERA) guidelines.

Data Publication

A data publication:

Is recorded in Research Data JCU.
Is a public metadata record of the data and information generated as part of the research project.
Includes detailed descriptions of the data (confidential or sensitive information about the data should not be included in the metadata record).
May involve issuing a Digital Object Identifier (DOI) (refer below) ensuring that your data can be correctly cited, attributed, interpreted and reused.
Establishes the conditions for access and reuse, including the use of data licences (refer below):
- The data associated with a Data Publication may be Open Access or shared via Conditional Access (i.e. access to data is negotiated via the Data Manager).
- Metadata-only records may also be created, e.g. when access to data is restricted or data is held in another trusted repository. DOIs will not be assigned for these datasets.
Can be embargoed to allow for your research outputs to be published first.

You must have a Data Record in order to create one or more related Data Publications.

Data Publications are harvested by the Australian Research Data Commons, Google Dataset Search, JCU’s Research Portfolio and other systems. This increases the visibility of your work and could attract future collaboration and funding opportunities.

Data Publications can also be imported (via Research Data Australia) into your ORCID (Open Researcher and Contributor Identifier) profile. See this short video for instructions on importing them. Refer to the Researcher Profiles, Identifiers and Engagement LibGuide for more information on ORCID.

You can make some of your data available (e.g. supporting documentation such as interview guides or codebooks - or a limited number of files) while restricting access to other files if you wish. Simply select the files you would like to include instead of selecting the 'Publish metadata only (no data)' option.

NOTE: Once submitted, Data Publications become read-only and can only be edited by the reviewer (Data Librarian). The reviewer can update a Data Publication, e.g. by adding details for related publications that were pending when the Data Publication was deposited. However, significant changes cannot be made once the Data Publication is live and a DOI has been issued.

Data Record

A Data Record:

Is documented in Research Data JCU.
Is a non-public metadata record of the data and information associated with your research project.
Includes data attachments or the storage location of completed (not active) data.
Should also include any documentation necessary to understand or reproduce the research (such as survey questions, data dictionary, codebooks and R scripts).
- Research Data JCU can be used to store data up to 100 MB. However, if data is greater than 100 MB or is sensitive, contact researchdata@jcu.edu.au to organise appropriate storage options.
Can apply to a large funded research project, a smaller or less formal project, a thesis, data chapters in a thesis or a dataset that will later be made available with a paper.

Completing a Data Record will:

Enable you to find data even after long periods of time have elapsed.
Ensure the integrity of your research methods and findings.
Satisfy requirements under the Australian Code for the Responsible Conduct of Research.

Data Repositories

The two key data repositories are:

Research Data JCU (via Research Data Australia): JCU datasets registered in Research Data Australia (most recent first).
Research Data Australia (RDA): Australia's research data commons helps you find, access and reuse data for research from 100 Australian research organisations, government agencies and cultural institutions. RDA harvests data descriptions and links to data held with their data publishing partners. JCU has over 2,500 datasets in RDA.

Data repositories - whether institutional, national, international, generalist, or discipline-specific - exist to support and facilitate long-term access to research data.

Research funders or journals may mandate data deposition in a particular repository. For example, ‘Most journals require DNA and amino acid sequences that are cited in articles be submitted to a public sequence repository (DDBJ/ENA/Genbank - INSDC) as part of the publication process’ (https://www.ncbi.nlm.nih.gov/genbank/submit/).

Many journals integrate data deposition in a generalist repository (e.g. Dryad) with manuscript submission for the related research publication.

For a list of other significant data repositories, including major generalist and subject-specific repositories, refer to the Step 3: Archive page.

Data Retention

Preserving the data and information after your research project has been finalised is critical to:

Prevent data loss;
Enable long-term access, discovery and reuse; and
Ensure researchers and institutions can defend their research outcomes if they are challenged.

Preservation activities need to be planned and should take into account file formats and data quality, data ownership (refer to Copyright, Intellectual Property and Moral Rights), retention periods, preferred data repositories and ways to share data safely.

Retention rules are defined by the research funding body or the university. Key documents for JCU researchers include the guide 'Management of Data and Information in Research', which supports the 2018 Code, and the University Sector Retention and Disposal Schedule for Queensland universities.

In general, the minimum period for retention of data is five years from the end of the year of publication of the last refereed publication or other form of public release to an audience outside of the University that is based on the data. However, in any particular case the period for which data should be retained should be determined by the specific type of research e.g. for areas such as gene therapy, research data must be retained permanently.
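
The general five-year rule can be worked through with a short calculation. The Python sketch below assumes the period runs to 31 December of the fifth year after the year of the last publication or public release. It covers the general minimum only, so longer or permanent retention requirements for specific research types still need to be checked.

```python
from datetime import date

MINIMUM_RETENTION_YEARS = 5  # general minimum described above


def retain_until(last_publication, years=MINIMUM_RETENTION_YEARS):
    """Return the last day of the minimum retention period.

    The period is counted from the end of the year in which the last
    publication (or other public release) based on the data appeared.
    """
    return date(last_publication.year + years, 12, 31)


# A paper published on 15 March 2021 gives a minimum retention date
# of 31 December 2026.
print(retain_until(date(2021, 3, 15)))  # 2026-12-31
```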

For more information refer to the retention rules for specific data types.

Data Storage - Active Data or Working Data

Appropriate data storage is a critical aspect of good research data management.

Many factors can lead to data loss or misuse with devastating consequences for your research and research career. Safeguarding against these should be a priority.

Researchers may need different storage and collaboration solutions at different stages of the Research (Data and Information) Asset Lifecycle. The options below, also listed under ‘Active Storage and Collaboration Options’, are suitable for storing active (working) data, collaborating with other researchers, and/or creating backups.

Safe data is data stored in a “safe” data storage system:

The system operates with a low probability of failure
It is managed by JCU staff or an approved third-party provider
It is designed for mid to long-term data storage.

It is important to note that while you may consider the data on the system you are currently using for analysis as your primary or main data, these systems do not necessarily qualify as “safe” storage.

JCU approved storage options for safe active data include:

JCU Microsoft OneDrive | up to 5 TB – staff and students
JCU Microsoft Teams | up to 1 TB per team - available to JCU staff on request
JCU QCIF Research Data Storage (QRISCloud) | greater than 50 GB, up to many TB - access via JCU QRISCloud Cache or direct
Storage as dictated by Research Partner and/or Funding Agency
JCU research file shares - limited availability, based strictly on research need i.e. highly sensitive data

Note: Use Research Data JCU for completed data only | less than 100 MB - contact researchdata@jcu.edu.au for sensitive data or larger datasets.

Options that are suitable for backup storage include:

Shared university network drive (e.g. G, H etc.)
Desktop equipment (e.g. external drive/s, laptop, RAID systems, etc.)
External cloud storage/collaboration space (e.g., Dropbox, Google Drive).

Options that should NOT be used for storing research data include:

JCU High Performance Computing (HPC)
JCU research file shares - limited availability, based strictly on research need, i.e. temporary storage during analysis where the data needs to be local to the application for optimal performance (e.g. ArcGIS software and laptops)

While you may use many different systems for data analysis during your research (e.g., JCU HPC, Metashape and Galaxy), these should only serve as temporary storage during the active analysis phase. They are not replacements for a long-term data storage solution. Never store your only copy of crucial data on these systems.

JCU HPC may have been used for long-term storage in the past; however, this practice is no longer recommended.

For archiving completed data see Data Storage - Completed Data.

The basic rules for storing data and safeguarding against data loss are:

DO keep three copies in separate places, i.e. on at least two different types of media (physical device or cloud), with at least one copy in another location (physical location or cloud).
Ensure at least one copy is stored on a JCU approved option*.

DON'T keep the only copy of your research data on a physical device e.g., hard drive (PC, laptop or external HDD) or USB key. These can easily be lost, damaged or fail.
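
These rules can be expressed as a simple checklist. The Python sketch below is illustrative only: the Copy class, its field names and the example values are hypothetical, and whether an option counts as JCU approved has to be taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str           # e.g. "JCU Microsoft OneDrive"
    media: str          # e.g. "cloud", "external drive"
    location: str       # e.g. "cloud", "office", "home"
    jcu_approved: bool  # True if listed as a JCU approved option


def check_storage_plan(copies):
    """Return a list of problems; an empty list means the plan passes."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than two types of media")
    if len({c.location for c in copies}) < 2:
        problems.append("all copies are in one location")
    if not any(c.jcu_approved for c in copies):
        problems.append("no copy on a JCU approved option")
    return problems


if __name__ == "__main__":
    plan = [
        Copy("JCU Microsoft OneDrive", "cloud", "cloud", True),
        Copy("Laptop hard drive", "internal drive", "office", False),
        Copy("External hard drive", "external drive", "home", False),
    ]
    print(check_storage_plan(plan) or "plan passes")
```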

The optimal combination of storage solutions will depend on your specific workflow, the volume and sensitivity of your data, and your preferences for file access and collaboration. For instance, you may prefer to work on your PC’s hard drive and sync to JCU Microsoft OneDrive if factors such as internet access, performance, or application compatibility are important. On the other hand, syncing from OneDrive back to your PC (ensure you have sufficient space) facilitates collaboration and provides access to version history and a cloud backup if local storage fails. In practice, a combination of these approaches is likely to be helpful.

IMPORTANT: The following hypothetical research projects and storage options are provided for guidance only and are not prescriptive. To discuss a specific project and storage requirements in more detail please contact researchdata@jcu.edu.au

General:

JCU Microsoft OneDrive*
Synchronised with hard drive on personal computer or laptop;
Backed up to an external hard drive or cloud service

Field-work based (no internet access):

Mobile device (tablet or laptop) for offline data collection in the field;
Copied to an external hard drive to create local backups; and
Synchronised with JCU Microsoft OneDrive* on return from the field

Computational analysis:

Hard drive on personal computer or laptop for day-to-day work and analysis (ideally synchronised with JCU Microsoft OneDrive*)
JCU HPRC for large-scale processing and simulations; and/or JCU QCIF Research Data Storage (QRISCloud)* for large datasets (>~50 GB) and collaboration; and
External hard drive(s) for local backup and portability

Sensitive data:

Dedicated JCU “R share” drive* for highly sensitive data and collaboration within the research team;
JCU Microsoft OneDrive* for remote access and external collaboration via link (non-identifiable data)
Encrypted external hard drive stored onsite for offline backup

Data Storage - Completed Data

The main or "master" copy used in the active stage of your research project will generally be the version of your research data that you archive as completed data. It is important to retain (refer to Data Retention) completed data for a minimum period to enable the validation of your research findings or (in some cases) reproduction of the research.

Although it is generally only necessary to retain records and data sufficient to support validation (or recordkeeping requirements), some data has long-term value and should be retained beyond the minimum period.

Consider the following criteria when making your decision about retaining your research data and information:

Uniqueness and non-replicability;
Reliability, integrity, and usability;
Relevance to a known research initiative or collection;
Community, cultural or historical value; and
Economic benefit.

Refer to Retention Rules for Specific Data Types for more information.

Note that long-term or permanent storage of large datasets may require additional funding, so you should consider this when making funding applications, etc.

Storing completed data in Research Data JCU:

Please create a Data Record and organise storage as follows:

attach files/zip files if your data is less than 100 MB in total and is not sensitive;
enter URLs for data held in discipline/funder or other trusted repositories e.g. NCBI, public GitHub sites;
email researchdata@jcu.edu.au to assist with larger, sensitive or hard copy data and documentation (e.g. paper survey responses and signed consent forms)
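
The choice between attaching files and asking for help can be sketched as a size check. The Python example below assumes that 100 MB means 100,000,000 bytes; the function name and return strings are hypothetical. Data held in another trusted repository is recorded by URL instead and is not covered here.

```python
from pathlib import Path

# Assumption: 100 MB is treated as 100,000,000 bytes.
ATTACHMENT_LIMIT_BYTES = 100 * 1000 * 1000


def storage_route(paths, sensitive=False, limit=ATTACHMENT_LIMIT_BYTES):
    """Suggest how to store completed data, following the options above."""
    total = sum(Path(p).stat().st_size for p in paths)
    if sensitive or total >= limit:
        return "email researchdata@jcu.edu.au"
    return "attach to Data Record"
```

For example, storage_route(["results.csv", "codebook.txt"]) adds up the sizes of both files and suggests attaching them only if the total is under the limit and the data is not sensitive.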

Data Visualisation

Data visualisation uses statistical graphs, plots, information graphics and other tools to create visual representations of data. The goal is to summarise and communicate data clearly, precisely and efficiently so that it might promote new insights.

There are many types of visualisations and thousands of tools available, and they range greatly in complexity (e.g. from bar graphs to heat maps, networks, 3D models etc.) and specificity.

The following are some of the more popular tools and training resources. Keep in mind that data visualisation tools are often used for exploratory data analysis and not just for displaying results. Some of these tools are designed to do both.

From Data to Viz leads you to the most appropriate graph for your data. It links to the code (R, Python, D3.js) to build it and lists common caveats you should avoid.
Datavisualisation.ch Selected Tools is a curated collection of tools that the people behind Datavisualisation.ch recommend. View the entire list or filter by function (maps, charts, data or colour) and whether you are willing to write any code.
Data Visualisation Catalogue is a library of different visualisation types that can be searched by function (e.g. comparisons, hierarchy, processes & methods, analysing text etc.) or viewed as a list. Each entry includes an example, explains how the visualisation is used and links to tools.
Visualising Data - Resources List has categories (filters) that include data handling, charting, programming, multivariate, mapping, web-based, specialist and colour.

Data Wrangling (Cleaning)

Data Wrangling or Data Cleaning is the process of identifying and correcting errors and/or making formatting more consistent. It’s often required to prepare data for analysis and/or visualisation, and (where appropriate) when publishing and sharing data. Data also needs to be cleaned before archiving. This will ensure that it’s preserved correctly, is not misinterpreted by other users, and facilitates interoperability (one of the FAIR Principles).

White et al. (2013) published an excellent paper, ‘Nine simple ways to make it easier to (re)use your data’, in Ideas in Ecology and Evolution. The authors noted that much of the shared data in ecology and evolutionary biology is not easily reused because it doesn't follow best practices in terms of data structure, metadata and licences.

Their nine specific recommendations are:

Share your data.
Provide metadata.
Provide an unprocessed form of the data.
Use standard data formats.
Use good null values.
Make it easy to combine your data with other datasets.
Perform basic quality control.
Use an established repository.
Use an established and liberal license.
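
Two of these recommendations, using good null values and performing basic quality control, lend themselves to a short script. The Python sketch below is one possible approach and is not taken from the paper: the null markers, column names and valid ranges are assumptions that need to be adapted to your own data.

```python
import csv
import io

# Assumed null markers; adjust these to match your own data.
NULL_MARKERS = {"", "NA", "N/A", "null", "-999"}


def clean_rows(csv_text, numeric_ranges):
    """Parse CSV text, convert null markers to None and range-check numbers.

    numeric_ranges maps a column name to an inclusive (low, high) range.
    Returns (rows, issues), where issues lists values that need checking.
    """
    rows, issues = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        cleaned = {}
        for column, value in row.items():
            if column is None:  # more fields than the header has columns
                issues.append(f"line {line_no}: unexpected extra fields")
                continue
            value = (value or "").strip()
            if value in NULL_MARKERS:
                cleaned[column] = None
            elif column in numeric_ranges:
                low, high = numeric_ranges[column]
                try:
                    number = float(value)
                except ValueError:
                    issues.append(f"line {line_no}: {column}={value!r} is not numeric")
                    cleaned[column] = None
                    continue
                if not low <= number <= high:
                    issues.append(f"line {line_no}: {column}={number} outside {low} to {high}")
                cleaned[column] = number
            else:
                cleaned[column] = value
        rows.append(cleaned)
    return rows, issues


if __name__ == "__main__":
    sample = "site,temp_c\nA,21.5\nB,-999\nC,150\n"
    rows, issues = clean_rows(sample, {"temp_c": (-50, 60)})
    print(rows)
    print(issues)
```

Out-of-range values are reported but kept, so the decision to correct or exclude them stays with the researcher and can be documented.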

De-identifying Data

De-identifying data is the process used to prevent someone’s personal identity from being revealed. Data that has been de-identified no longer triggers the Privacy Act.

For example, data from the PALS (Pregnancy and Lifestyle Study) has been de-identified and is available for download. The risk of re-identification via triangulation has also been considered and managed.

Although the study contains highly sensitive data, several techniques have been used to de-identify the dataset, e.g. identifiers and dates of birth have been removed, ages have been aggregated into bands and postcodes have been excluded. Without these measures it would be possible to re-identify (triangulate) participants by combining, for example, a rural postcode with an occupation.

Think about de-identifying your data early as it can be time consuming and difficult later. The Australian Research Data Commons (ARDC) has some tips on de-identification, listed below and in their Identifiable Data guide. You should also seek discipline-specific advice as required.

plan de-identification early in the research as part of your data management planning
make sure the consent process includes the accepted level of anonymity required and clearly states what may and may not be recorded, transcribed, or shared
retain original unedited versions of data for use within the research team and for preservation
create a de-identification log of all replacements, aggregations or removals made
store the log separately from the de-identified data files
identify replacements in text in a meaningful way, e.g. in transcribed interviews indicate replaced text with [brackets] or use XML markup tags
for qualitative data (such as transcribed interviews or survey textual answers), use pseudonyms or generic descriptors rather than blanking out information
digitally manipulate audio and image files to remove identifying information
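
Several of these tips (removing identifiers, aggregating values, using pseudonyms and keeping a log) can be combined in a small script. The Python sketch below uses hypothetical column names and ten-year age bands. It illustrates the general technique only and does not replace discipline-specific advice or an assessment of re-identification risk.

```python
# Hypothetical column names that directly identify a participant.
DIRECT_IDENTIFIERS = {"name", "email", "date_of_birth", "postcode"}


def age_band(age, width=10):
    """Aggregate an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(records):
    """Return (clean_records, log).

    Each participant gets a pseudonym, direct identifiers are dropped and
    exact ages are replaced by bands. The log records every change so it
    can be stored separately from the de-identified data.
    """
    clean, log = [], []
    for index, record in enumerate(records, start=1):
        pseudonym = f"P{index:03d}"
        out = {"participant": pseudonym}
        for field, value in record.items():
            if field in DIRECT_IDENTIFIERS:
                log.append((pseudonym, field, "removed"))
            elif field == "age":
                out["age_band"] = age_band(value)
                log.append((pseudonym, field, "aggregated into band"))
            else:
                out[field] = value
        clean.append(out)
    return clean, log


if __name__ == "__main__":
    raw = [{"name": "Jane Citizen", "age": 34, "postcode": "4811", "response": "yes"}]
    clean, log = deidentify(raw)
    print(clean)
    print(log)
```

The original records are left unchanged, in line with the tip to retain unedited versions for use within the research team.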

Digital Object Identifier (DOI)

A Digital Object Identifier or DOI is a unique, persistent identifying number for a document published online. It appears on a document or in a bibliographic citation as an alphanumeric string of characters that links to the original digital object. The publisher assigns a DOI when a publication is made available electronically.

DOIs are not essential but are considered best practice for data citation.
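
A DOI can be turned into a citable link by adding it to the https://doi.org/ resolver address. The Python sketch below shows this, together with a simplified format check. The pattern is an approximation rather than the official DOI syntax, and the DOI in the example is a made-up placeholder.

```python
import re

# Simplified check: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Return the resolver URL for a DOI, accepting common prefixes."""
    bare = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if bare.lower().startswith(prefix):
            bare = bare[len(prefix):]
            break
    if not DOI_PATTERN.match(bare):
        raise ValueError(f"not a recognisable DOI: {doi!r}")
    return f"https://doi.org/{bare}"


# Placeholder DOI, for illustration only.
print(doi_to_url("doi:10.1234/example.5678"))
```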

DIKW Model

The DIKW Model or Pyramid shows the relationship between data and information and its eventual transformation into wisdom.

Each step moves towards a higher level. At the base (fourth level), data transforms into information (third level) when it is assigned meaning or context. The moment the information is processed, linked and stored, whether by a machine or a human being, it becomes knowledge (second level). Lastly, the application of knowledge becomes wisdom (top level).

Documentation

Quality documentation and metadata enhance the discoverability of your data and enable others to accurately interpret, validate, reuse, and cite it.

Documentation is a comprehensive term that refers to any supporting material or information that provides context, explanation, or guidance regarding your research data. It includes a variety of components, such as codebooks, README files and, importantly, metadata.

Metadata or ‘data about data’ uses structured, standardized information to document data.

Supporting documentation e.g., codebooks, README files, code and scripts:

Data-level documentation is critical for interpreting, validating and re-using your data.

Please follow these guidelines if you are archiving or publishing your data via Research Data JCU.

At a minimum, ensure you include variable codes, labels, descriptions and units with your data (embedded) and/or in a separate data file.
Create and maintain codebooks, data dictionaries and README.txt files as required during your project and ensure you archive them with your data at completion.
Code and scripts used to derive or analyse your data should also be retained and published with your data as appropriate.
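
A codebook does not need special software. The Python sketch below writes one row per variable to CSV text and refuses entries that lack a variable name, label or description. The column names and the example variable are hypothetical, so adapt them to your discipline's conventions.

```python
import csv
import io

FIELDS = ["variable", "label", "description", "units", "missing_code"]
REQUIRED = ("variable", "label", "description")


def codebook_csv(variables):
    """Return codebook text (CSV) with one row per variable."""
    for var in variables:
        missing = [f for f in REQUIRED if not var.get(f)]
        if missing:
            name = var.get("variable") or "?"
            raise ValueError(f"{name}: missing {', '.join(missing)}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(variables)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical variable, for illustration only.
    print(codebook_csv([
        {
            "variable": "temp_c",
            "label": "Temperature",
            "description": "Water temperature at 1 m depth",
            "units": "degrees Celsius",
            "missing_code": "NA",
        }
    ]))
```

Saving the result as plain text (e.g. codebook.csv) keeps it readable without specialist software when it is archived with the data.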

You can read more about study-level vs. data-level documentation and how documentation is stored below.

Metadata standards, schemas, classifications, vocabularies and ontologies:

Metadata in the form of standards, schemas, classification codes, vocabularies or ontologies may be relevant for your research project, particularly if you are depositing your data in a discipline specific repository. See our Repository Lists webpage for more information.

While these terms can be confusing, they all provide a structured framework for organizing and describing data. This ensures consistency, interoperability, and enhanced discoverability across domains and systems.

You can read more about machine vs. human-readable metadata and how metadata is stored below.

Here are some examples:

The Darwin Core metadata standard is an extension of Dublin Core (used for general resource description) specifically designed for biodiversity data. It is used by the Global Biodiversity Information Facility (GBIF) to aggregate and disseminate biodiversity data from various sources, promoting collaboration and advancing global insights into biodiversity. Darwin Core and Dublin Core are standards that include metadata schemas.

The ANZLIC metadata guidelines are widely used to document and describe spatial data in Australia and New Zealand. This profile of AS/NZS ISO 19115:2011 Geographic Information – Metadata has been retired (since 2015) in favour of the officially endorsed metadata standard AS/NZS ISO 19115.1:2015 Metadata (including the 2018 Amendment No.1). See the ANZLIC webpage for more information.

The Australian and New Zealand Standard Classification of Occupations (ANZSCO) is widely used to ensure consistency and comparability when dealing with occupational data. This is an example of a classification system that uses coding.

Controlled vocabularies make it easier for researchers to find or analyse data or to aggregate it with other data. There are literally thousands of vocabularies available in the research domain. Research Vocabularies Australia can help you locate, access and reuse vocabularies for your research project.

Gene ontology (GO) is used in bioinformatics and genomics. This ontology provides a structured vocabulary and standardized annotations to enable systematic and comprehensive analysis of gene functions across different species and biological contexts.

Additional reading

Study-level and data-level documentation:

Project or study-level metadata is often included in Research Data Management Plans (RDMPs) and provides a high-level overview and context for the data e.g. the research project’s aims, subject descriptions (keywords, FoR codes etc.), personnel, data collection and analysis methods, information about (IP) rights, access and plans for handling sensitive data.

In Research Data JCU some of this study-level metadata will auto-fill the Data Records and Data Publications that you create from your RDMPs. While this information is important for context, these metadata records must describe the data and not just the project or a publication.

You also need to provide supporting documentation at the data-level. This ensures your data is not misinterpreted and is critical for validating, reproducing and reusing your data. Some examples (from the UK Data Archive) include:

variable codes, labels, descriptions and units
reasons for missing values
weighting and grossing variables created
code and scripts used to derive data after collection (simple derivations such as grouping by age levels can be explained in variable and value labels)

Storing metadata and documentation:

Research Data JCU includes many of the metadata fields you will need to comprehensively document your data.

Documentation can be stored with the data (embedded) and/or in separate files, e.g. codebooks, README files and scripts included as supporting documentation.

Embedded documentation can be as simple as a key in a MS Excel spreadsheet or more complex e.g. for software packages such as R and Python libraries that include facilities for data annotation. If possible, export these as plain text and include them with your supporting documentation.

Machine and human-readable metadata:

Including some machine-readable metadata elements improves automation, and makes it easier for tools and systems to index, search, and analyse datasets efficiently. This is particularly important in large-scale data applications and systems.

In Research Data JCU, we use machine-readable Digital Object Identifiers (DOIs) to identify your Data Publications and to link them with your other outputs, and machine-readable licences. DOIs and licences allow other researchers to discover your work, attribute it properly and understand the terms under which it can be used – via Research Data JCU, Research Data Australia and other services that harvest the metadata.

Human-readable metadata and descriptions provide context and insights into the background, methodology and nuances of a dataset, and improve its interpretability.

Combining machine and human-readable metadata enhances metadata quality, the integrity and utility of your datasets, and supports FAIR data.
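
To illustrate the combination, the Python sketch below builds a small record in which the title and description are written for people, while the DOI link and licence URL can be processed by software. The field names are hypothetical and do not represent the Research Data JCU schema, and the DOI is a made-up placeholder.

```python
import json


def metadata_record(title, description, doi, licence_url, creators):
    """Combine human-readable and machine-readable elements in one record."""
    return {
        "title": title,                          # human-readable
        "description": description,              # human-readable
        "creators": list(creators),
        "identifier": f"https://doi.org/{doi}",  # machine-readable DOI link
        "license": licence_url,                  # machine-readable licence URL
    }


if __name__ == "__main__":
    record = metadata_record(
        title="Example reef temperature dataset",
        description="Hourly water temperature readings, with methods and caveats.",
        doi="10.1234/example",  # placeholder, not a real DOI
        licence_url="https://creativecommons.org/licenses/by/4.0/",
        creators=["Citizen, Jane"],
    )
    print(json.dumps(record, indent=2))
```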

DOI Minting Services

JCU is a member of the ANDS Cite My Data Service, which allows Australian research organisations to mint DOIs for datasets so they can be easily cited.

DOIs can be minted from Research Data JCU under the following conditions:

The data is open (available by direct download) or can be made available via conditional access, i.e. the data must be available in some way in order to be citable (and to mint a DOI).
It hasn't had a DOI minted elsewhere, i.e. Research Data JCU must be the primary point of publication.

We can review your data deposit and mint a DOI urgently if you need this for a manuscript submission. Private links for peer reviewers (if data is embargoed) are also available on request.

DOIs will not be minted for highly confidential or sensitive data where access is restricted. However, making the metadata about your dataset (not the data) public allows other researchers to discover your work and collaborate with you in future. Importantly, your data is securely stored and archived should your research ever be challenged.

JCU researchers may also deposit metadata records in Research Data JCU to describe data held in other repositories (such as Dryad, GenBank or PANGAEA) and link to these datasets. This increases their visibility and ensures they are harvested by Research Data Australia and the Research Portfolio site. We cannot mint DOIs for these ‘secondary’ datasets.

Datasets without DOIs can still be cited and all datasets are harvested by Research Data Australia.

For further information, contact researchdata@jcu.edu.au