Session 6

Session 6: Data Management: Dealing with data challenges

Thursday 7 July – 9:45 – 11:15

Chair: Simone Kortekaas, Wageningen University & Research – Library, The Netherlands

6.1 French national data management policy: the key role of libraries in the strategic issue of data management, Cecile Swiatek, Université Paris Nanterre, France

Since the beginning of the health crisis, data has been at the heart of decision-making and the management of our actions for opening up scientific knowledge. The structuring, circulation and opening up of data mobilises all the professions in our ecosystem: research support staff, computer scientists, librarians and documentalists, legal experts, researchers. 
 
The European Commission’s report Cost of not having FAIR research data and the French report by MP Eric Bothorel on the opening up of national data show the crucial issues of data sovereignty and the economic issues linked to their management. The French Ministry of Higher Education, Research and Innovation has published a roadmap to serve a national policy on data, algorithms and source code. *As announced in its National Open Science Plan 2021-2024, France is opening “Recherche data gouv”, a national research data platform, in spring 2022. This national multidisciplinary platform has been conceived in close collaboration with a limited number of research institutions and organisations, including library services*. The ambition is to make it available to all researchers who have no trusted repository solution for their data and are using their publishers’ repositories, and to provide a customisable data repository for universities and research institutions. Fitting into an evolving international landscape, this initiative aims at avoiding the proliferation of insecure, non-sovereign, non-interoperable repositories that are far from FAIR practices. 
 
Libraries play an important role at two levels: in the design of “Recherche data gouv”, and in the implementation of “Data Workshops” (“Ateliers de la donnée”) – a single-window service collaboratively designed between institutions and their local partners, which aims to build research data management and curation skills for researchers. 
 
This paper presents how libraries participate in designing the platform itself, leading to the implementation of “Recherche data gouv” in France, and highlights their added value, their expertise and their decisive contributions. It presents the place occupied by library representatives in the decision-making mechanisms and for each of the platform's three components: a data deposit and dissemination service (repository); a catalogue of French research data, which indicates the data deposited in national or international thematic and disciplinary repositories; and data support services. 
 
Regarding training and data support services, this paper presents *how institutions have relied on their library services to respond to the national call for the implementation of “Data initiation, training and support workshops” (“Ateliers de la donnée”)* by and for partnering institutions in order to deepen the knowledge and build the skills of researchers and staff in data management, with the aim of depositing datasets in trusted repositories and making them citable. This focus gives examples of libraries’ contributions to the design and coordination of services for the preparation and dissemination of data, delivered close to researchers. It shows *how libraries participate in setting up collaboration and partnerships between institutions, how they are positioned in the research data management field, in the training of researchers and in the preparation of their data management plans (DMPs) through to data deposit*. 

6.2 Automating subject indexing at ZBW – the costs of the digital transformation and why we need less projects, Anna Kasprzik, ZBW – Leibniz Information Centre for Economics, Germany

Subject indexing, i.e., the enrichment of metadata records for textual resources with descriptors from a controlled vocabulary, is one of the core activities of libraries. Due to the proliferation of digital documents it is no longer possible to annotate every single document intellectually, which is why we need to explore the potentials of automation on every level. 
 
At ZBW, efforts to partially or completely automate the subject indexing process started as early as 2000 with experiments involving external partners and commercial software. In 2014 the decision was made to start doing the necessary applied research in-house, which was successfully implemented by establishing a PhD position. However, the prototypical machine learning solutions developed over the following years had yet to be integrated into productive operations at the library. Therefore, in 2020 an additional position for a software engineer was established and a pilot phase was initiated (planned to last until 2024) with the goal of completing the transfer of our solutions into practice by building a suitable software architecture that allows for real-time subject indexing with our trained models and its integration into the other metadata workflows at ZBW. 
 
In this talk we report on the milestones we have reached so far and on those that are yet to be reached on an operative level. We also discuss the challenges we were facing on a strategic level, the measures and resources (hardware, software, personnel) that were needed in order to be able to effect the transfer, and those that will be necessary in order to subsequently ensure the continued availability of the architecture and to enable a continuous development during running operations. 
 
We argue that in general, the format of “project” and the mindset that goes with it may not suffice to secure the commitment that an institution and its decision-makers and the library community as a whole will have to bring to the table in order to face the monumental task of the digital transformation and automatization in the long run. 

6.3 Responding to local data support challenges through a global student DataSquad, Deborah Wiltshire, GESIS Institute for Social Sciences, Germany, Paula Lackie, Carleton College, Tim Dennis, UCLA, Elizabeth Parke, University of Toronto

The demand for expert data management and manipulation in the research community has never been greater. Luckily, among our students, both at the undergraduate and graduate level, there is also a steady stream of budding data scientists eager for real data experience. Matching their enthusiasm with the campus-wide data wrangling needs is the challenge! 
 
Since the 1990s, Carleton College (a 4-year liberal arts institution) has provided opportunities for its students to get real-life research experience while also addressing the challenges of providing high-quality data support services across the institution. By 2014, the DataSquad model was born. Since then, many students have joined the DataSquad, providing cutting-edge data support services across the institution. 
 
The DataSquad model is designed both to assist libraries and data services in providing data support and to give students practical data experience, a valuable first step in their future careers as data professionals. In 2020 we formed the DataSquad International Working Group to look at how the DataSquad model could be adapted and expanded across different countries and settings. 
 
In early 2021 we conducted a survey of libraries and data archives across the globe, investigating the challenges that they face in delivering data support services with student labor and what barriers exist. We followed up with a lively panel discussion at the 2021 IASSIST Conference. 
 
In this presentation we will begin by discussing the initial findings of this survey, drawing out key trends in data support and the barriers to it. We will then outline the early success of two work-in-progress initiatives currently running at Carleton College and UCLA, each a sustainable start-up style programme based on the original DataSquad model. We are also considering whether the DataSquad model can be further adapted by institutions in Canada and by data archives, which are also tasked with providing data support and face similar challenges to libraries. 

51st LIBER Annual Conference