RDIM Introduction The WHY - benefits of data management and sharing

The WHY - benefits of data management and sharing

Managing your data well and sharing it can help maximise the efficiency and integrity of your research, and increase your visibility and impact as a researcher.

More and more funders, publishers and institutions are mandating data storage and sharing, so developing your skills in Research Data Management is essential.

Seven reasons why you should manage and share your research data:

1. Visibility and Citation Advantage

If you publish your data, it is discoverable and can be formally cited in research publications and/or mentioned in news articles or on social media. Additionally:

There is evidence that making data available in a repository (such as Research Data JCU) can increase the citation rate for those publications (papers) that include DOI links to the data. This citation advantage (of up to 25.36% in one study) is higher than it is for data that is made available on request or included in the paper or supplementary materials. See e.g. Colavizza G, Hrynaszkiewicz I, Staden I, Whitaker K, McGillivray B (2020) The citation advantage of linking publications to research data. PLoS ONE 15(4): e0230416.
Data citations can be included in researcher profiles (e.g. ORCID iD) and curricula vitae alongside other research outputs such as journal articles or book chapters.
Published data with a DOI can be counted and tracked to measure impact (e.g. via the Web of Science Data Citation Index or Altmetrics).

2. Engagement and Impact

Making data available:

demonstrates the value of publicly funded research,
enables knowledge transfer and communication of discoveries to the public,
enhances citizen science and public engagement activities.

Research data can have real-life (and career enhancing) impact; the ANDS (Australian National Data Service) #dataimpact eBook brings together 16 stories collected during the #dataimpact campaign.

ANDS asked the research community to share stories about data-intensive projects that had national impact, e.g. projects that saved lives, protected our environment and wildlife, supported the economy or influenced public policy. A JCU example is included in the eBook: Mapping the impact of climate change on Australian wildlife, from the Centre for Tropical Biodiversity and Climate Change.

3. Collaboration and Innovation

New digital tools for text/data mining, visualisation and collaboration help researchers to deal with the "data tsunami" - the explosive growth in the size, complexity and rate of data.
A culture of collaboration and data sharing is critical for data-intensive and cross-disciplinary research to meet major challenges as it enables:

new discoveries from existing data
integration of datasets for new analyses

Collaboration and data sharing reduce the duplication of research and demonstrate the value of publicly funded research.
They also enhance the profile of researchers and may attract future funding opportunities.

4. Research Efficiency

Data management planning can inform your entire research activity, e.g. how the data is collected and managed:

Poor file naming practices and lack of version control waste research time and put data at risk.
Choice of file formats and software influences how your data can be analysed, stored and potentially re-used in the future.
Systematic documentation and description of data (metadata) during the project saves time later.
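
As a minimal sketch of what a consistent file naming practice could look like, the Python function below builds names from a project label, a short description, an ISO date and a two-digit version number. The pattern itself and the name `build_filename` are assumptions made for illustration, not a JCU standard; adapt them to whatever convention your research group agrees on.

```python
import re
from datetime import date


def build_filename(project: str, description: str, on: date, version: int, ext: str) -> str:
    """Return a name such as 'reefsurvey_site-counts_2024-03-05_v02.csv'.

    The pattern (project_description_date_version) is a hypothetical convention.
    """
    def slug(text: str) -> str:
        # Lowercase the text and collapse every run of other characters into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO dates (YYYY-MM-DD) sort chronologically when files are listed by name.
    # Zero-padded versions (v01, v02, ...) sort correctly up to v99.
    extension = ext.lstrip(".").lower()
    return f"{slug(project)}_{slug(description)}_{on.isoformat()}_v{version:02d}.{extension}"


print(build_filename("ReefSurvey", "Site counts", date(2024, 3, 5), 2, ".CSV"))
# prints: reefsurvey_site-counts_2024-03-05_v02.csv
```

Generating names with a function rather than typing them by hand keeps every file in the project sortable by date and version, and avoids spaces and special characters that some software handles poorly.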

Data management also helps you deal with the scope and scale of research projects - as they grow wider and bigger you need to ensure you have enough resources to cope.

Data sharing increases research efficiency by:

reducing the duplication of research,
reducing the burden on participants (for example, by avoiding over-sampling of small populations or people with rare diseases), and
enabling faster science and higher quality data which could be critical, for example during health emergencies.

5. Research Integrity

Data management activities such as documentation, version control and archiving (refer to Step 2 Manage) make it possible to:

validate published results, and
replicate or reproduce results.

Data management planning helps to ensure that the data collected is of high quality. Peer review of the data underpinning publications (or data papers) can improve the robustness of research results.

Reproducibility issues are discussed extensively in the literature. Take a look at the Retraction Watch faked-data archive if you have time.

6. Compliance

Effective research data management ensures that data generated as part of the University’s research activities is registered, stored, made accessible for use and reuse (if appropriate), and managed over time according to legal, ethical and funder requirements and good practice.
JCU HDR students and researchers will need to comply with the JCU Code for the Responsible Conduct of Research (based on the National Code) as well as:

funder requirements, e.g. ARC, NHMRC,
relevant privacy protocols and ethics obligations,
publisher requirements for making data available, e.g. PLoS, PeerJ, Springer Nature journals and many others,
requirements for Confirmation of Candidature and Thesis Submission (Higher Degree by Research students).

See the Governance and Compliance section.

7. Security

Appropriate storage and back-up arrangements will protect data against (potentially devastating) loss.
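
One simple way to check that a back-up copy is intact is to compare checksums of the original and the copy. The sketch below uses SHA-256 from Python's standard library; the function names `sha256_of` and `backup_matches` are hypothetical, and this is only an illustration of the idea, not a description of how any JCU storage service verifies data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Return True when both files have identical contents (same checksum)."""
    return sha256_of(original) == sha256_of(backup)
```

Recording the checksum alongside the data at the time of deposit also lets you confirm later that an archived file has not been altered or corrupted.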

Data management also prevents the unauthorised use of data by addressing confidentiality issues and ethical and legal (copyright, IP) compliance.

See the Data Storage - Active Data or Working data, Active Storage and Collaboration Options and Data Storage – Completed Data sections for more information.

Data Sharing: Overcoming the Barriers

Here are some common barriers to sharing data and some possible solutions:

1. My data isn't useful to others

Your dataset may well have value to future research! It is very hard to anticipate what data may be sought after by future researchers. Even so-called "niche" data can be interesting or useful to others, including researchers from other disciplines (e.g. see the visualisations in this Nature news feature for an analysis of interdisciplinary research). The many datasets collected before "climate change" became a critical research field -- that have since become invaluable -- are an obvious example.

2. Other researchers won't understand my data and might misuse it

Providing good documentation and contextual information for your project and data will help other researchers understand your data and use it correctly. Publishing your data can also be a good way to counter wilful misinterpretation, as you can quickly point to the real data on the web to refute it. If data are sensitive or likely to be misinterpreted, you also have options for controlling access (see 'My data is too sensitive to share').

3. I want to use my data in a research paper

You have a competitive advantage in that you understand your data better than others - even with the best metadata descriptions. Other researchers should cite your data, but if you are concerned about others analysing it before you publish, you can often embargo your data pending publication(s). Metadata/data repositories such as Research Data JCU can assist you with this.

4. My data is too sensitive to share

Sharing sensitive data can often be made possible with a combination of informed consent, anonymisation and controlling access to the data, as outlined in this website. Making anonymised data available via negotiated access can be a good option for sensitive data. It allows you to retain oversight: for example, you can make sure requestors are genuine researchers, confirm that they will maintain confidentiality and security, and discuss how they plan to use your data. You can also consider making some of your work public while restricting access to other data.
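
As a hedged illustration of one small step in preparing data for sharing, the sketch below drops direct identifiers from a record and replaces the participant ID with a keyed pseudonym (HMAC-SHA-256 from Python's standard library). The field names (`participant_id`, `name`, `email`) and function names are hypothetical. Note that this is pseudonymisation only: it does not by itself make data anonymous, because combinations of the remaining fields may still identify a person, so it complements rather than replaces consent and access controls.

```python
import hashlib
import hmac


def pseudonymise(participant_id: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable pseudonym for an ID; the key must be kept private and never shared."""
    mac = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:length]


def strip_identifiers(record: dict, secret_key: bytes,
                      direct_identifiers=("name", "email")) -> dict:
    """Return a copy of the record without direct identifiers, keyed by a pseudonym."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in direct_identifiers and field != "participant_id"
    }
    # The same participant always maps to the same pseudonym under the same key,
    # so records for one person can still be linked across files.
    cleaned["pseudonym"] = pseudonymise(record["participant_id"], secret_key)
    return cleaned
```

A keyed hash is used instead of a plain hash so that someone who knows or guesses the original IDs cannot recompute the pseudonyms without the key.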

5. My dataset includes data from other sources

Ideally, you should seek permission from the IP owners early in the research project and/or use data that is licensed for re-use. Sometimes it can be difficult to tell where data has come from, and this "taints" the whole dataset for sharing. If nobody really knows who owns the data, try contacting whoever manages the area the dataset belongs to and ask them to assign an owner or give permission. Making the data with clear ownership available while restricting other data can be an option, although in some cases this will destroy the integrity of the dataset and its re-use value.

6. I don't have the time or money to share my data

This is a valid concern, particularly when data is difficult and time-consuming to prepare, describe and/or share - and this varies across the disciplines. Planning and generating good documentation during the Research (Data and Information) Asset Lifecycle can help mitigate this. The eResearch Centre provides storage and data curation services through Research Data JCU and can assist.

7. Lack of professional incentive

The lack of reward for time invested in archiving and sharing data (see #6) is a recognised barrier to best-practice Research Data Management. As Couture et al. (2018) suggest, "personal incentives such as data citations should be more widely used to increase the impact of a particular dataset and provide recognition or credit for data creation." Assigning DOIs allows data to be tracked and cited in the same way as publications. See the DOIs and Data Citation section of this website for more information. As data citation becomes more routine, citations may be incorporated into research evaluation and reward practices; see, for example, DORA (the Declaration on Research Assessment). There can also be a citation advantage for publications associated with open data.

Adapted from:

Closed Data … Excuses, Excuses (blog post from the University of California Curation Centre (UC3), accessed 2 January 2018) and the UK Data Archive’s list of barriers and solutions to data sharing, available from the Digital Curation Centre’s PDF, RDM for Librarians, pp. 14-15.

Couture JL, Blake RE, McDonald G, Ward CL (2018) A funder-imposed data publication requirement seldom inspired data sharing. PLoS ONE 13(7): e0199789. https://doi.org/10.1371/journal.pone.0199789.


Seven reasons why you should manage and share your research data:

Visibility and Citation Advantage

Engagement and Impact

Collaboration and Innovation

Research Efficiency

Research Integrity

Compliance

Security

Data Sharing: Overcoming the Barriers

My data isn't useful to others

Other researchers won't understand my data and might misuse it

I want to use my data in a research paper

My data is too sensitive to share

My dataset includes data from other sources

I don't have the time or money to share my data

Lack of professional incentive