DatASaclay : Research Data Management Cluster at Université Paris-Saclay
« Promote the FAIR principles and the opening of research data produced within Université Paris-Saclay » (Objective 2 of the Single Document for Open Science)
About us
DatASaclay is a consortium of experts responsible for helping the entire scientific community of the Université Paris-Saclay to manage and open up their research data.
The cluster offers support throughout the data lifecycle. Its aim is to make data FAIR: Findable, Accessible, Interoperable and Reusable.
Since October 2023, we have been accredited as an "Atelier de la donnée" (data workshop) within the Recherche Data Gouv ecosystem.
The network
- Université Paris-Saclay and its four major schools: AgroParisTech, CentraleSupélec, ENS Paris-Saclay and IOGS;
- Two associate member universities: Université Évry Paris-Saclay and Université de Versailles Saint-Quentin-en-Yvelines;
- Six national research organisations: IHES, INRAE, CEA, INRIA, ONERA and CNRS;
- Two disciplinary structures: MSH Paris-Saclay and DIM/PAMIR.
Our services
What is a DMP?
A Data Management Plan (DMP) is a document that describes the stages of the data lifecycle and specifies how data will be managed, from production to publication. It helps researchers organize and anticipate each stage by asking the right questions about the management of their data. The DMP generally takes the form of a structured questionnaire: the answers make up the plan and explain what is intended for each dataset at each stage of the lifecycle.
The DMP addresses legal, ethical and budgetary issues, as well as aspects relating to responsibility and data security.
Why write a DMP?
The DMP is a deliverable requested by funding bodies (ANR, Horizon Europe, etc.) as part of the evaluation of research projects.
It is a guarantee of research quality. It supports the FAIRization of data by facilitating data discovery, accessibility, interoperability and reuse. The aim of the DMP is to show that data are produced and managed according to good practices, from collection to publication, within an ethical and legal framework consistent with the FAIR principles (Findable, Accessible, Interoperable and Reusable). However, it is important to note that the principle of "as open as possible, as closed as necessary" prevails: not all data are necessarily intended for publication.
Who can help me draw up a DMP?
If you have any questions about drafting a DMP, please contact donnees-recherche@universite-paris-saclay.fr (you can send us your DMP for expert review).
Filling in your DMP: what questions should you ask yourself?
The DMP is a living document that goes through several versions during the course of a project. It can also be presented in different forms or templates; some funding agencies offer their own DMP template (see the templates in DMP-OPIDoR). Whatever the template used, the DMP must address the following aspects:
- Data description: Describe the data used in the project, whatever their origin. What data (type, nature, volume, etc.) will be produced or collected? How will pre-existing data be reused? Which data are worth keeping for the long term? Which data processing software will be used (preferably open source)?
- Documentation and quality of data and metadata: Metadata are data that describe other data and are an essential part of their architecture. This involves a detailed description of the data (context of acquisition, type of data, unit of measurement, file format, language, contributors, etc.). What procedures will be used to control data quality? Which metadata standards will be used: disciplinary (Darwin Core, EML, etc.) or general (Dublin Core, DataCite, etc.)? A minimal metadata sketch is given after this list.
- Data storage and backup during the project: Storage consists of depositing data on a digital medium to make it accessible; backing up consists of duplicating the data on a medium other than the one on which it is stored. How and where will data and metadata be stored and backed up during the research project? How will data be secured, and how will personal, sensitive or strategic data be protected? From the outset, it is essential to think about storage requirements (volume of data) and backup resources (available infrastructures). It is also important to anticipate unforeseen events (computer system failure, virus, theft, loss, etc.) by planning secure data recovery methods. The 3-2-1 rule is highly recommended: 3 copies on 2 different media, including 1 at a remote location.
- Ethical and legal aspects: The DMP must address the main legal and ethical issues raised by the processing of personal and sensitive data (voice, medical status, sexual orientation, etc.). How are personal data handled? How are aspects relating to data ownership or intellectual property handled? What is the relevant legislation? Is there a confidentiality clause? How will data be affected by ethical and deontological issues? How will data be processed (data anonymization, ethics committee approval, formal consent of data subjects, etc.)?
- Sharing and publishing: Opening up data promotes the transparency and reproducibility of research work. This stage defines how data will be exchanged between partners during the project, and how data will be accessed, published and shared during and after it. Questions to ask include: What data will be published (see the decision aid here)? Are there any restrictions on data sharing, or reasons for embargoes? How and when will the data be shared, and through which mechanism (on demand or another process)? Which repository (disciplinary or general) should be chosen? It is advisable to share data in a FAIR, non-commercial repository.
- Responsibilities and resources: Identify all the people who will be responsible for managing the data, from production to publication, both during and after the project. Questions to ask include: What human, material and financial resources are in place to ensure proper data management? Who is responsible for each data management activity? What are the requirements in terms of expertise and training?
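To make the documentation questions above more concrete, here is a minimal sketch of a dataset description record, written in Python and loosely based on the mandatory properties of the DataCite Metadata Schema (identifier, creators, title, publisher, publication year, resource type). All names, values and the DOI are placeholders; the exact fields required should always be checked against the chosen repository's or funder's guidelines.

```python
# Minimal sketch of a dataset description record, loosely following the
# mandatory properties of the DataCite Metadata Schema. Field names and
# values are illustrative placeholders, not a validated DataCite record.
import json

metadata = {
    "identifier": {"identifier": "10.5281/zenodo.0000000",  # placeholder DOI
                   "identifierType": "DOI"},
    "creators": [{"name": "Doe, Jane", "affiliation": "Université Paris-Saclay"}],
    "titles": [{"title": "Soil moisture measurements, Saclay plateau, 2024"}],
    "publisher": "Université Paris-Saclay",
    "publicationYear": "2025",
    "resourceType": {"resourceTypeGeneral": "Dataset"},
    # Recommended extras that make discovery and reuse easier:
    "descriptions": [{"description": "Hourly soil moisture readings (placeholder).",
                      "descriptionType": "Abstract"}],
    "subjects": [{"subject": "soil science"}],
    "rightsList": [{"rights": "Creative Commons Attribution 4.0",
                    "rightsIdentifier": "CC-BY-4.0"}],
}

# Serialise the record alongside the data files so the description
# travels with the dataset from collection to deposit.
with open("dataset_metadata.json", "w", encoding="utf-8") as fh:
    json.dump(metadata, fh, ensure_ascii=False, indent=2)
```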
Project leaders should make their scientific results publicly accessible as quickly as possible, provided this does not contravene intellectual property, security or legitimate-interest rules. To do this, Open Science requirements need to be understood throughout the project's lifecycle.
Experts in the library, IT and Open Science departments, alongside the Direction de la recherche et de la valorisation, can provide support for drafting Call for Proposal submissions and can help successful projects implement these requirements. Project teams receiving Open Science support may use the following services:
For Call for Proposal Submissions
- Support and advice for Open Science themes,
- Support in drafting a preliminary data management plan (for example, for European Cofund Calls for Proposals: Part B of the Standard Application Form, p. 7),
- Support in drafting a strategic plan for Open Science practices (for example, for European RIA and IA Calls for Proposals: Part B of the Standard Application Form, pp. 8–9),
- Support in drafting a plan for publishing and disseminating research results (for example, for European EIC Pathfinder Challenges Calls for Proposals: Part B of the Standard Application Form, pp. 9–10).
Once the Project Has Begun
- Presentation of the ‘Open Science start-up kit’ during the kick-off meeting,
- Support in drafting and revising the data management plan,
- Support in making produced data FAIR.
At the End of the Project
- Support in disseminating data,
- Support for open-access publishing,
- Support in archiving data.
A data paper is a scientific article that presents a detailed and precise description of a dataset and the context in which it was produced. It informs the scientific community that the dataset is available in a data repository. This new form of publication is being developed with the aim of opening up data. For example, Scientific Data, an open access journal created in 2014, publishes only data papers, which it calls Data Descriptors. These papers describe datasets in the life sciences, biomedicine and the environment, and direct readers to the repositories where the data can be accessed. Data papers can be published either in traditional journals or in dedicated data journals.
DatASaclay helps you to write your data paper properly.
Depositing your data in a data repository makes it as visible, accessible and citable as scientific publications. There are different types of repository: thematic, multidisciplinary, institutional, or specific to a publisher or a research project. For example:
- Zenodo, created by OpenAIRE and CERN to host research data from all disciplines;
- Dryad (DataDryad), a repository originally created for the life sciences and now open to all disciplines, which hosts data associated with journal articles.
The choice of repository depends on the nature of the data, the research project for which it was produced and the objectives of the depositor. Beware: some repositories impose conditions on re-use that are not consistent with open access...
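As an illustration of how such a deposit can be scripted, the sketch below uses Zenodo's REST API with the Python requests library: it creates a deposition, uploads a file and attaches metadata. The token, file name and metadata values are placeholders, and the endpoints and field names should be verified against the current Zenodo API documentation; the Zenodo sandbox is used here so that nothing is published by accident.

```python
# Sketch of a scripted dataset deposit to Zenodo via its REST API, assuming a
# personal access token with the "deposit" scope. Verify endpoints and metadata
# fields against the current Zenodo API documentation before relying on this.
import requests

BASE = "https://sandbox.zenodo.org/api"   # use https://zenodo.org/api for real deposits
TOKEN = "YOUR-ACCESS-TOKEN"               # placeholder: never commit real tokens
params = {"access_token": TOKEN}

# 1. Create an empty deposition.
r = requests.post(f"{BASE}/deposit/depositions", params=params, json={})
r.raise_for_status()
deposition = r.json()
bucket_url = deposition["links"]["bucket"]

# 2. Upload a data file into the deposition's bucket.
with open("measurements.csv", "rb") as fh:        # placeholder file name
    r = requests.put(f"{bucket_url}/measurements.csv", data=fh, params=params)
    r.raise_for_status()

# 3. Attach descriptive metadata (title, authors, licence, etc.).
metadata = {
    "metadata": {
        "title": "Soil moisture measurements, Saclay plateau, 2024",
        "upload_type": "dataset",
        "description": "Hourly soil moisture readings (placeholder).",
        "creators": [{"name": "Doe, Jane", "affiliation": "Université Paris-Saclay"}],
        "license": "cc-by-4.0",           # check the licence identifiers Zenodo accepts
    }
}
r = requests.put(f"{BASE}/deposit/depositions/{deposition['id']}",
                 params=params, json=metadata)
r.raise_for_status()

# 4. Publishing mints the DOI and makes the record public; leave this step
#    commented out while testing.
# requests.post(f"{BASE}/deposit/depositions/{deposition['id']}/actions/publish",
#               params=params).raise_for_status()
```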
The guide Ouverture des données de recherche - Guide d'analyse du cadre juridique en France (V2, December 2017), produced by an inter-agency working group led by INRAE and supported by the Committee for Open Science, is a good starting point for understanding the legal framework. It proposes a data communicability flowchart, which is also available online as a visualisation tool.
Publicly funded research data are public data, and the principle of openness applies to them. However, legal and ethical restrictions apply, particularly to personal data, sensitive data and data protected by copyright. Moreover, the communication of data relating to professional secrecy, national defence, State security and public safety is prohibited as a matter of principle.
It is therefore necessary to remain vigilant about the data disseminated in certain situations, particularly when an establishment produces sensitive data. In addition, 'particular care should be taken when a scientific publication is involved and the publisher requires the data to be deposited in a specific repository (...) decisions to open up data are taken at the level of the institution and not at the level of the agent' (Opening up research data - Guide to analysing the legal framework in France).