The data lifecycle
More often than not, the lifespan of data extends well beyond the research project that generated them. Indeed, the researchers behind the project may wish to continue working with the data after the project has ended, or the research project funding has ceased. Other research colleagues may also want to reuse these data.
Furthermore, the function and value of the data are not the same at the beginning and end of a research project. They change from one phase to another during the data's lifecycle. This concept of the lifecycle of research data is important. It allows all the essential stages in managing your data to be anticipated.
The data lifecycle generally comprises 5 or 6 stages. At each stage of the data lifecycle, different actions are required for appropriate management and these are shown in the lifecycle diagram.
1/ Plan the research
This stage defines the research project and anticipates the next steps in the data lifecycle. It is at this stage that the needs and resources required to carry out the project (partnerships, financing, techniques, etc.) are identified. This step is primarily used to anticipate how data will be obtained and stored, to facilitate upstream traceability and to enable data reuse. The main questions to answer at this stage are: what data will be collected, where, when, how and by whom?
2/ Collect/create data
The data used can have several origins: they can be created, modified or reused. Experimental or observation data are collected according to the protocol predefined in step 1. It is also possible to reuse data already created by others, by requesting them from the authors or retrieving them from various open access data repositories.
3/ Organise and analyse data
Organising your data during the project is an important step, as it makes the data easier to manage throughout their lifecycle. It is essential to guarantee that the data can be identified, located, protected and accessed, now and in the future, not only by the data owners but also by other users who may wish to use them. Next, the data are processed (verification, validation and cleaning) and analysed using suitable methods or tools to answer the research question.
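One possible way to support identification and location of files is to keep a simple inventory of the project's data directory. The sketch below is only an illustration, not a prescribed method: the function name `build_manifest` and the record fields (`path`, `bytes`, `sha256`) are assumptions made for this example. It lists every file under a directory with its relative path, size and SHA-256 checksum, so that files can be found later and checked for unintended changes.

```python
import hashlib
from pathlib import Path


def build_manifest(data_dir):
    """Return one record per file found under data_dir (hypothetical helper)."""
    root = Path(data_dir)
    records = []
    # Walk the directory tree and keep regular files only, in a stable order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Reads the whole file into memory; fine for a sketch, not for huge files.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        records.append({
            "path": path.relative_to(root).as_posix(),  # location inside the project
            "bytes": path.stat().st_size,               # file size in bytes
            "sha256": digest,                           # checksum to detect changes
        })
    return records
```

Calling `build_manifest("my_project/data")` (a hypothetical path) returns a list of dictionaries that could be saved alongside the data, for example as a CSV or JSON file.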
4/ Protect/store data
This stage consists of ensuring the security and safety of the processed data. Secure protection and regular backups are essential throughout the research project, ideally following the "3-2-1" rule: 3 copies, on 2 different media, with 1 copy off-site (for example, in cloud storage). It would be a shame to lose the fruits of your labour.
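The backup rule described above can be expressed as a small check. This is a minimal sketch under stated assumptions: the `BackupCopy` class, its fields and the `satisfies_3_2_1` function are hypothetical names introduced for illustration, and the rule is read here as "at least" 3 copies, 2 media and 1 off-site copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str      # hypothetical name for the copy
    medium: str     # e.g. "internal disk", "external disk", "cloud"
    off_site: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies):
    """Return True if there are at least 3 copies, 2 media and 1 off-site copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_off_site = any(c.off_site for c in copies)
    return enough_copies and enough_media and has_off_site


# Illustrative backup plan (invented for this example).
copies = [
    BackupCopy("working copy", "internal disk", False),
    BackupCopy("lab backup", "external disk", False),
    BackupCopy("remote backup", "cloud", True),
]
print(satisfies_3_2_1(copies))  # True
```

Dropping the remote copy from this plan makes the check fail, since only two copies would remain and none would be off-site.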
5/ Share and publish data
In general, data sharing is primarily internal, between different departments within the same organisation, or between project partners. Once a project's data have been cleaned and stabilised, it is time to think about publishing them. Research data can be published via a disciplinary, institutional or more general repository, such as the national Recherche Data Gouv repository. It is recommended that data sets be published in a secure repository that automatically generates a DOI (Digital Object Identifier). Certain restrictions may apply to the dissemination of data, particularly in the case of personal or sensitive data. The most important thing is to ask yourself the right questions: what data should be shared or published? How? When? What licence will be associated with the data?
If in doubt, contact: donnnees-recherche@universite-paris-saclay.fr
6/ Reuse data
The data can be reused to validate the model or experiment. They can also be used in other scientific work to advance or test new hypotheses. In this case, the data begin a new cycle as raw data and the data lifecycle resumes.
Respecting these different stages in the data lifecycle contributes to opening up data in line with the FAIR principles.