RDIM Terminology

Terminology

This section provides an alphabetical listing of some of the terminology used in managing research data and information along with the meaning and/or application of these terms.

Data

Refer to Research Data.

Data Creator

Data Creators capture or create the data for the research project.

Due to current system limitations, internal and external data creators (including Other Data Creators) are dealt with in two ways:

  • For data creators internal to JCU:
    • A Data Creator is any JCU staff member or HDR candidate who is involved in the creation of research data.
  • For data creators external to JCU:
    • An external data creator is referred to as a Collaborator.
Data Custodian

The Data Custodian is the person who has overall custody of, and is entrusted with, the research (data and information) assets throughout their lifecycle. The Data Custodian's overall responsibilities include:

  • Ensuring good records management practices are followed.
  • Managing and maintaining data and information, including metadata, to ensure that discovery mechanisms function.
  • Managing escalated risks related to research data and information through the University’s risk management processes.
  • Monitoring and responding to performance measures.
  • Ensuring, prior to departure from JCU, that these responsibilities are assigned to an appropriate delegate.
Data Manager

The Data Manager provides advice and/or supervision to the researcher(s) throughout the research project lifecycle and reviews (spot checks) the veracity of the project’s data and information. Post project, the Data Manager is responsible for safeguarding the research (data and information) assets from unauthorised access and misuse, and for their archiving and disposal.

Because responsibilities for data and information extend past the life of the research project (e.g. after the HDR candidate has graduated), the role of the Data Manager must be assumed by a JCU staff member. As such, the Data Manager is either the:

On departure of a Lead Investigator or Primary Advisor from JCU, the role will transfer to the:

  • For Colleges: the Associate Dean, Research (ADR) or an appropriate delegate as determined by the Data Custodian.
  • For Institutes and Centres: an appropriate Manager delegated by the Data Custodian.
Data Package

A Data Package makes your research sustainable and facilitates reuse. It should allow other researchers to replicate your study independently, based solely on the information in the package.

Your data package should be concise yet as complete as possible and include:

  • A README or instruction file which lists the files inside the package and explains their relation and includes a step-by-step instruction on how to use the files to replicate the study.
  • Raw data files. If your study is based on a portion of the original dataset, include only the necessary data. Make sure to include de-identified data in your data package and omit personal and sensitive data.
  • Processed data files. In many cases, the raw data will be transformed to a processed format that is suitable for further analysis.
  • A data appendix/codebook which provides information about every variable in your dataset (e.g. variable name, value labels, the type and format of the variable).
  • Command files/syntax which includes code scripts that were used to transform the raw data into processed data and code scripts which were used to analyse the data and produce the results. The code should be accompanied by (inline) comments or other instructions needed for others to replicate your study. You should not include information or code in the package that you are not allowed to share (e.g. licensed software).
  • Protocols which were used during the study, for instance about the performed experiments.
  • Lab journals.
  • A reference to any publication which is based on the data.
  • Other metadata e.g., the parameters used in your study.
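
The README item above can be partly automated. The following is a minimal sketch, assuming a package laid out as an ordinary folder; the function name `build_readme`, the file names and the descriptions are hypothetical and only illustrate how a file listing might be generated.

```python
import tempfile
from pathlib import Path


def build_readme(package_dir, descriptions):
    """Return README text listing every file in the package with a short note."""
    root = Path(package_dir)
    lines = ["Files in this data package:", ""]
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        # Files without a description are flagged so they are not forgotten.
        note = descriptions.get(rel, "TODO: describe this file")
        lines.append(f"- {rel}: {note}")
    return "\n".join(lines) + "\n"


# Demo with a throwaway package; all file names are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "raw" / "survey_deidentified.csv").write_text("id,age_band\n1,30-39\n")
    (root / "codebook.txt").write_text("age_band: ten-year age band\n")
    readme_text = build_readme(
        root, {"codebook.txt": "Describes every variable in the dataset"}
    )
    print(readme_text)
```

A generated listing is only a starting point: the step-by-step replication instructions still need to be written by hand.
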
Data Papers

Data journals are publications whose primary purpose is to expose datasets; some more general journals also have sections dedicated to the publication of Data Papers. The publication of a Data Paper may be considered best practice for researchers whose primary output is data, or for whom the development of new, data-driven technologies is of particular importance. The publication of Data Papers includes an element of peer review, maximises opportunities for reuse and attracts academic accreditation for data scientists as well as front-line researchers. Such papers may also be eligible for assessment as research outputs under the Australian Research Council's Excellence in Research for Australia (ERA) guidelines.

Data Publication

A data publication:

  • Is recorded in Research Data JCU.
  • Is a public metadata record of the data and information generated as part of the research project.
  • Includes detailed descriptions of the data (confidential or sensitive information about the data should not be included in the metadata record).
  • May involve issuing a Digital Object Identifier (DOI) (refer below) ensuring that your data can be correctly cited, attributed, interpreted and reused.
  • Establishes the conditions for access and reuse including the use of data licences (refer below);
    • The data associated with a Data Publication may be Open Access or shared via Conditional Access (i.e. access to data is negotiated via the Data Manager);
    • Metadata only records may also be created e.g. when access to data is restricted or data is held in another trusted repository. DOIs will not be assigned for these datasets.
  • Can be embargoed to allow for your research outputs to be published first.

You must have a Data Record in order to create one or more related Data Publications.

Data Publications are harvested by the Australian Research Data Commons, Google Dataset Search, JCU’s Research Portfolio and other systems. This increases the visibility of your work and could attract future collaboration and funding opportunities.

Data Publications can also be imported (via Research Data Australia) into your ORCID (Open Researcher and Contributor Identifier) profile. See this short video for instructions on importing data into ORCID. Refer to the Researcher Profiles, Identifiers and Engagement LibGuide for more information on ORCID.

You can make some of your data available (e.g. supporting documentation such as interview guides or codebooks - or a limited number of files) while restricting access to other files if you wish. Simply select the files you would like to include instead of selecting the 'Publish metadata only (no data)' option.

NOTE: Once submitted, Data Publications become read-only for the depositor and can only be edited by the reviewer (Data Librarian). The reviewer can update a Data Publication, e.g. to add details of related publications that were pending when it was deposited. However, significant changes cannot be made once the Data Publication is live and a DOI has been issued.

Data Record

A Data Record:

  • Is documented in Research Data JCU.
  • Is a non-public metadata record of the data and information associated with your research project.
  • Includes data attachments or the storage location of completed (not active) data.
  • Should also include any documentation necessary to understand or reproduce the research (such as survey questions, data dictionary, codebooks and R scripts).
    • Research Data JCU can be used to store data up to 100 MB. If your data is larger than 100 MB or is sensitive, contact researchdata@jcu.edu.au to organise appropriate storage options.
  • Can apply to a large funded research project, a smaller or less formal project, a thesis, data chapters in a thesis or a dataset that will later be made available with a paper.

Completing a Data Record will:

  • Enable you to find data even after long periods of time have elapsed.
  • Ensure the integrity of your research methods and findings.
  • Satisfy requirements under the Australian Code for the Responsible Conduct of Research.
Data Repositories

The two key data repositories are:

  • Research Data JCU (via Research Data Australia): JCU datasets registered in Research Data Australia (most recent first).
  • Research Data Australia (RDA): Australia's research data commons helps you find, access and reuse data for research from 100 Australian research organisations, government agencies and cultural institutions. RDA harvests data descriptions and links to data held with their data publishing partners. JCU has over 2,500 datasets in RDA.

Data repositories - whether institutional, national, international, generalist, or discipline-specific - exist to support and facilitate long-term access to research data.

Research funders or journals may mandate data deposition in a particular repository. For example, ‘Most journals require DNA and amino acid sequences that are cited in articles be submitted to a public sequence repository (DDBJ/ENA/Genbank - INSDC) as part of the publication process’ (https://www.ncbi.nlm.nih.gov/genbank/submit/).

Many journals integrate data deposition in a generalist repository (e.g. Dryad) with the submission of the manuscript for the related research publication.

For a list of other significant data repositories, including major generalist and subject-specific repositories, refer to the Step 3: Archive page.

Data Retention

Preserving the data and information after your research project has been finalised is critical to:

  • Prevent data loss;
  • Enable long-term access, discovery and reuse; and
  • Ensure researchers and institutions can defend their research outcomes if they are challenged.

Preservation activities need to be planned and should take into account file formats and data quality, data ownership (refer to Copyright, Intellectual Property and Moral Rights), retention periods, preferred data repositories and ways to share data safely.

Retention rules are defined by the research funding body or the university. Key documents for JCU researchers include the guide 'Management of Data and Information in Research', which supports the Australian Code for the Responsible Conduct of Research (2018), and the University Sector Retention and Disposal Schedule for Queensland universities.

In general, the minimum period for retention of data is five years from the end of the year of publication of the last refereed publication or other form of public release to an audience outside of the University that is based on the data. However, in any particular case the period for which data should be retained should be determined by the specific type of research e.g. for areas such as gene therapy, research data must be retained permanently.
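
As a worked illustration of the general five-year rule, the sketch below computes the last day of the minimum retention period. It assumes one reading of the rule: the clock starts on 31 December of the year of the last publication or public release, so the period ends on 31 December five years later. The function name is hypothetical, and discipline-specific rules (such as permanent retention) override this calculation.

```python
from datetime import date


def minimum_retention_end(last_release: date, retention_years: int = 5) -> date:
    """Last day of the minimum retention period.

    Assumes the period is counted from the end (31 December) of the year
    in which the last publication or public release occurred.
    """
    return date(last_release.year + retention_years, 12, 31)


# A paper published on 14 March 2020 would need its data kept
# until at least 31 December 2025 under this reading of the rule.
print(minimum_retention_end(date(2020, 3, 14)))
```
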

For more information refer to the retention rules for specific data types.

Data Storage - Active Data or Working Data

Appropriate data storage is one of the most critical aspects of good research data management. There are many circumstances that can lead to data loss with potentially devastating consequences for your research project, and your research career, and safeguarding against these should be a top priority.

Researchers may need different storage and collaboration solutions at different stages of the Research (Data and Information) Asset Lifecycle. The options listed under ‘Active Storage and Collaboration Options’ are suitable for storing active (working) data, collaborating with other researchers, and/or creating backups.

For archiving completed data see Data Storage - Completed Data.

The basic rules for storing data and safeguarding against data loss are:

DO make three copies of the data and keep them in separate places

  1. A hard drive, external or on your computer;
  2. An external hard drive or cloud drive (JCU supports MS OneDrive and Teams);
  3. A network or other storage solution supported by JCU, e.g. HPRC.

DON'T keep the only copy of your research data on a hard drive, laptop, external drive or USB key; these devices do fail.
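
Keeping three copies only helps if the copies stay identical to the master. One common way to check this is to compare checksums. The following is a minimal sketch using SHA-256 from the Python standard library; the function names and file names are hypothetical, and this is not a JCU-provided tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(master, copies):
    """Map each copy's file name to True if its content matches the master."""
    expected = sha256_of(master)
    return {Path(c).name: sha256_of(c) == expected for c in copies}


# Demo: one intact copy and one truncated (damaged) copy.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    master = root / "master.csv"
    good_copy = root / "cloud_copy.csv"
    bad_copy = root / "usb_copy.csv"
    master.write_bytes(b"site,count\nA1,12\n")
    good_copy.write_bytes(master.read_bytes())
    bad_copy.write_bytes(b"site,count\nA1,1")
    report = verify_copies(master, [good_copy, bad_copy])
    print(report)
```
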

Compliant Data Storage Options include:

Non-Compliant Data Storage Options include:

  • Shared university network drive (e.g. G, H etc)
  • Personal equipment (e.g. external drive/s, own laptop, etc)
  • External cloud storage/collaboration space (e.g., Dropbox, Google Drive).

Non-compliant options should only be used for backups, never for primary storage.

You will also need to be familiar with the:

For information on active storage and collaboration options, refer to the During Project Phase - Step 2:  Manage | Organise Data | Data Storage - Active or Working Data.

Data Storage - Completed Data

The master copy used in the active stage of your research project will generally be the version of your research data that you archive as completed data. It is important to retain completed data for a minimum period (refer to Data Retention) so as to enable the results of your research to be validated or (possibly) replicated.

Generally, it is only necessary to retain the records of your data sufficient to support validation and/or replication of results. (NB. Long-term or permanent storage of large data sets is likely to require additional funding, and this should be considered when making funding applications, etc.) It is also possible in some cases to store completed data in a public and trusted generalist or subject-specific data repository (such as GenBank or PANGAEA) -- in these cases you should still submit a Data Record in Research Data JCU to distinguish and facilitate access to your work.

Consider the following criteria when making your decision about retaining your research data and information:

  • Uniqueness and non-replicability;
  • Reliability, integrity, and usability;
  • Relevance to a known research initiative or collection;
  • Community, cultural or historical value; and
  • Economic benefit.

Please contact us via researchdata@jcu.edu.au to discuss storage of your completed data and we will advise you of the most compliant JCU storage option available.

Data Visualisation

Data visualisation uses statistical graphs, plots, information graphics and other tools to create visual representations of data. The goal is to summarise and communicate data clearly, precisely and efficiently so that it might promote new insights.

There are many types of visualisations and thousands of tools available and they range greatly in complexity (e.g. from bar graphs to heat maps, networks, 3D models etc) and specificity.

The following are some of the more popular tools and training resources. Keep in mind that data visualisation tools are often used for exploratory data analysis and not just for displaying results. Some of these tools are designed to do both.

  • From Data to Viz leads you to the most appropriate graph for your data. It links to the code (R, Python, D3.js) to build it and lists common caveats you should avoid.
  • Datavisualisation.ch Selected Tools is a curated collection of tools that the people behind Datavisualisation.ch recommend. View the entire list or filter by function (maps, charts, data or colour) and whether you are willing to write any code.
  • Data Visualisation Catalogue is a library of different visualisation types that can be searched by function (e.g. comparisons, hierarchy, processes & methods, analysing text etc) or viewed as a list. Each entry includes an example, explains how the visualisation is used and links to tools.
  • Visualising Data - Resources List has categories (filters) that include data handling, charting, programming, multivariate, mapping, web-based, specialist and colour.
Data Wrangling (Cleaning)

Data Wrangling or Data Cleaning is the process of identifying and correcting errors and/or making formatting more consistent. It’s often required to prepare data for analysis and/or visualisation, and (where appropriate) when publishing and sharing data. Data also needs to be cleaned before archiving. This will ensure that it’s preserved correctly, is not misinterpreted by other users, and facilitates interoperability (one of the FAIR Principles).
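
As an illustration of the kind of corrections involved, the sketch below tidies a small CSV using only the Python standard library: it normalises column names, trims stray whitespace and converts inconsistent missing-value markers to a single null value. The set of markers in `NULL_MARKERS` and the sample data are assumptions for this example; the right choices depend on your dataset.

```python
import csv
import io

# Assumed spellings of "missing" in the raw file; adjust for your own data.
NULL_MARKERS = {"", "na", "n/a", "null", "none", "-999"}


def clean_rows(raw_csv_text):
    """Return rows with tidy column names, trimmed values and consistent nulls."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    cleaned = []
    for row in reader:
        tidy = {}
        for key, value in row.items():
            column = key.strip().lower().replace(" ", "_")
            value = (value or "").strip()
            # Represent every missing-value marker as None.
            tidy[column] = None if value.lower() in NULL_MARKERS else value
        cleaned.append(tidy)
    return cleaned


raw = "Site ID, Species ,Count\nA1, Acropora ,12\nA2,acropora,N/A\n"
cleaned = clean_rows(raw)
for row in cleaned:
    print(row)
```

Note that the sketch deliberately leaves the inconsistent capitalisation of the species name alone: whether "Acropora" and "acropora" are the same value is a judgement for the researcher, and any such correction should be documented.
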

White et al. (2013) published an excellent paper, ‘Nine simple ways to make it easier to (re)use your data’, in Ideas in Ecology and Evolution. The authors noted that much of the shared data in ecology and evolutionary biology is not easily reused because it does not follow best practices in terms of data structure, metadata and licences.

Their nine specific recommendations are:

  • Share your data.
  • Provide metadata.
  • Provide an unprocessed form of the data.
  • Use standard data formats.
  • Use good null values.
  • Make it easy to combine your data with other datasets.
  • Perform basic quality control.
  • Use an established repository.
  • Use an established and liberal license.
De-identifying Data

De-identifying data is the process used to prevent someone’s personal identity from being revealed. Data that has been de-identified no longer triggers the Privacy Act.

Here is an example of sensitive data that has been published as open data. In this example, the risk of re-identification via triangulation has been considered and managed and the de-identified dataset can be downloaded from Research Data Australia.

Although the study contains highly sensitive data, several techniques have been used to de-identify the dataset, e.g. identifiers and dates of birth have been removed, ages have been aggregated into bands and postcodes have been excluded. Without such measures, it would be possible to re-identify (triangulate) participants by combining, for example, a rural postcode with a rare occupation.

Think about de-identifying your data early as it can be time consuming and difficult later. The Australian Research Data Commons (ARDC) has some tips on de-identification, listed below and in their Identifiable Data guide. You should also seek discipline-specific advice as required.

  • plan de-identification early in the research as part of your data management planning
  • make sure the consent process includes the accepted level of anonymity required and clearly states what may and may not be recorded, transcribed, or shared
  • retain original unedited versions of data for use within the research team and for preservation
  • create a de-identification log of all replacements, aggregations or removals made
  • store the log separately from the de-identified data files
  • identify replacements in text in a meaningful way, e.g. in transcribed interviews indicate replaced text with [brackets] or use XML markup tags
  • for qualitative data (such as transcribed interviews or survey textual answers), use pseudonyms or generic descriptors rather than blanking out information
  • digitally manipulate audio and image files to remove identifying information
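
Several of the tips above (removing identifiers, aggregating ages into bands, using pseudonyms and keeping a separate log) can be sketched in a few lines. This is an illustrative example only: the field names, the ten-year band width and the pseudonym format are assumptions, and real projects should follow discipline-specific advice.

```python
def age_band(age, width=10):
    """Aggregate an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(records, drop_fields=("name", "date_of_birth", "postcode")):
    """Return (de-identified records, log of changes).

    The log should be stored separately from the de-identified files.
    """
    deidentified, log = [], []
    for index, record in enumerate(records, start=1):
        pseudonym = f"P{index:03d}"
        # Keep everything except direct identifiers and the exact age.
        out = {k: v for k, v in record.items() if k not in drop_fields and k != "age"}
        out["participant"] = pseudonym
        out["age_band"] = age_band(record["age"])
        deidentified.append(out)
        log.append({
            "participant": pseudonym,
            "removed": sorted(f for f in drop_fields if f in record),
            "aggregated": "age -> age_band",
        })
    return deidentified, log


# Fictional record for demonstration purposes.
records = [{"name": "Alex Example", "date_of_birth": "1985-02-01",
            "postcode": "4811", "age": 38, "response": "Agree"}]
clean, change_log = deidentify(records)
print(clean)
print(change_log)
```
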
Digital Object Identifier (DOI)

A Digital Object Identifier or DOI is a unique, persistent identifying number for a document published online. It appears on a document or in a bibliographic citation as an alphanumeric string of characters that links to the original digital object. The publisher assigns a DOI when a publication is made available electronically.

DOIs are not essential but are considered best practice for data citation.
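
To show how the alphanumeric string links to the digital object, the sketch below turns a DOI into a resolvable https://doi.org/ link. The pattern used is a loose sanity check (a prefix starting with "10.", then a slash and a suffix), not a full validation, and the DOI in the example is made up.

```python
import re

# Loose check only: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Return the https://doi.org/ link for a DOI, accepting common prefixes."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Does not look like a DOI: {doi!r}")
    return "https://doi.org/" + doi


# Hypothetical DOI used purely for illustration.
print(doi_to_url("doi:10.1234/example-dataset"))
```
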

DIKW Model

The DIKW Model or Pyramid shows the relationship between data and information and its eventual transformation into wisdom.

Each step moves towards a higher level. At the base (fourth) level, data transforms into information (third level) when it is assigned a meaning or context. Once the information is processed, linked and stored, whether by a machine or a human being, it becomes knowledge (second level). Lastly, the application of knowledge becomes wisdom (top level).

DOI Minting Services

JCU is a member of ANDS Cite My Data Service which allows Australian research organisations to mint DOIs for datasets so they can be easily cited.

DOIs can be minted from Research Data JCU under the following conditions:

  • The data is open (available by direct download) or can be made available via conditional access, i.e. the data must be available in some way in order to be citable (and to mint a DOI).
  • It hasn't had a DOI minted elsewhere, i.e. Research Data JCU must be the primary point of publication.
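
The two conditions above can be read as a simple decision rule. The sketch below is only an illustration of that rule: the access labels ('open', 'conditional', 'restricted') and the function name are assumptions for this example, not part of Research Data JCU.

```python
def can_mint_doi(access, has_doi_elsewhere):
    """Return True if a dataset meets both conditions for minting a DOI.

    access: 'open', 'conditional' or 'restricted' (hypothetical labels).
    has_doi_elsewhere: True if a DOI was already minted by another repository.
    """
    if access not in {"open", "conditional", "restricted"}:
        raise ValueError(f"Unknown access type: {access!r}")
    data_is_available = access in {"open", "conditional"}
    return data_is_available and not has_doi_elsewhere


print(can_mint_doi("open", has_doi_elsewhere=False))         # meets both conditions
print(can_mint_doi("conditional", has_doi_elsewhere=True))   # already has a DOI
print(can_mint_doi("restricted", has_doi_elsewhere=False))   # data not available
```
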

We can review your data deposit and mint a DOI urgently if you need this for a manuscript submission. Private links for peer reviewers (if data is embargoed) are also available on request.

DOIs will not be minted for highly confidential or sensitive data where access is restricted. However, making the metadata about your dataset public (not the data) allows other researchers to discover your work and collaborate with you in future. Importantly, your data is securely stored and archived should your research ever be challenged.

JCU researchers may also deposit metadata records in Research Data JCU to describe data held in other repositories (such as Dryad, GenBank or PANGAEA) and link to these datasets. This increases their visibility and ensures they are harvested by Research Data Australia and the Research Portfolio site. We cannot mint DOIs for these ‘secondary’ datasets.

Datasets without DOIs can still be cited and all datasets are harvested by Research Data Australia.

For further information, contact researchdata@jcu.edu.au.