Parallel Session 4 – Trustworthy Automation
Date: Wednesday, 1 July 2026, from 11:00 to 12:30
Moderator: TBC
Location: R5
4.1) Supporting Dataset Curation through Automation at KU Leuven
Presenter: Dieuwertje Bloemen, KU Leuven, Belgium
KU Leuven RDR is the CoreTrustSeal-certified institutional data repository of KU Leuven, where curation plays an important role in FAIRifying data and ensuring the quality of published datasets. The curation phase is crucial not only for quality control of the data's FAIRness (ensuring correct metadata input, the presence of documentation, and the choice of a license), but also for ensuring that researchers are fully informed and supported in their efforts to publish their data.
After the repository’s launch in 2022, the monthly number of published datasets slowly increased over time, and with it the number of dataset reviews to be carried out. As these numbers grew, it became clear that there was a need to better track the reviews and who picks each one up, as well as to streamline the process in general. This would not only prevent unnecessary duplication of work, but could also free up more time for supporting researchers rather than for the evaluation itself. To streamline the curation process, the RDR team developed an open-source review dashboard that plugs into a Dataverse instance and automates different parts of the review process.
In the initial iteration of the dashboard, the automation focused on the administrative side of the reviews. For example, reviewers can easily track who is reviewing which dataset, add notes to any review, and look back at a dataset's review history. In addition, the effort to streamline the feedback process resulted in a simple checklist in the review dashboard that reviewers can use to auto-generate feedback. This ensures uniformity in reviews while still allowing for customization, and saves reviewers from typing the same feedback over and over again. This initial version of the dashboard was key to getting more datasets ready for publication, and it enabled reviewers to focus on the reviews themselves rather than on the administrative overhead that previously came with them.
A second version of the review dashboard goes a step further in its automation efforts. As reviews were carried out, certain frequently made mistakes were flagged as candidates for automatic detection. With this idea in mind, an initial exploration began of which curation elements could be checked automatically and how. This exploration revealed a lot of potential, such as indicating when a README file is likely missing, or present but empty. The list of potential automated checks was longer than expected, and the checks were easier to implement than we had anticipated. A bigger challenge, however, was balancing this automation with the human effort and input that are key in data curation. Some brainstorming on how to visualize the automation, and how to always allow for human overrides, was necessary to ensure that the dashboard supports human curation through automation rather than replacing it.
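To make the kind of check described above concrete, a minimal sketch of a README check is shown below. This is not the RDR team's actual code: the function name, the file-listing input, and the result structure are illustrative assumptions; the point is that each check returns a suggestion a human reviewer can still override.

```python
# Hypothetical sketch of one automated curation check: flag datasets
# where a README file is likely missing, or present but empty.
# Input maps file names to their size in bytes (an assumed structure).

def check_readme(files: dict[str, int]) -> dict:
    """Return a review suggestion for the README check."""
    readmes = [name for name in files if name.lower().startswith("readme")]
    if not readmes:
        return {"check": "readme", "status": "flag",
                "reason": "no README file found"}
    if all(files[name] == 0 for name in readmes):
        return {"check": "readme", "status": "flag",
                "reason": "README present but empty"}
    return {"check": "readme", "status": "pass", "reason": None}
```

In a dashboard like the one described, such results would surface as pre-filled checklist items rather than verdicts, keeping the final decision with the reviewer.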
In this presentation, we will share the road to the creation of the review dashboard and a look at our UI, but also provide insight into the logic of the automated checks. We hope to spark conversation on how to further support the human task of curation through tools and technology without losing the important human touch and interpretation that is so valuable in making a dataset as FAIR as possible.
4.2) Building Applications Without Code: How AI Enables Library Professionals to Develop Tools
Presenter: Piotr Krajewski, Gdańsk University of Technology, Poland
This presentation demonstrates the creation of DMP ART (Data Management Plan Assessment and Response Tool, https://github.com/gammaro85/DMP-ART) developed entirely without programming knowledge using artificial intelligence. The presentation shows the complete development journey from initial concept to working application, illustrating new possibilities that AI offers to library professionals and open science practitioners.
One of the main tasks of data stewards working at Polish universities is the assessment of Data Management Plans from National Science Centre grant applications. These plans arrive in various formats: text copied into DOCX files, PDF files containing entire proposals, screenshots, and other variations. This diversity makes systematic processing difficult. DMP ART automates the extraction and structuring of these plans according to the 14-section Science Europe framework, provides a template-based feedback system, and supports bilingual processing with OCR capabilities. While open for modification, the application is heavily adapted to Polish realities, which is both an advantage and a limitation. The application serves primarily as an example demonstrating new possibilities rather than being the main focus of this presentation.
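The structuring step described above can be sketched in a few lines. This is not DMP ART's actual code: the heading pattern and the returned structure are assumptions, standing in for whatever section titles the Science Europe framework prescribes; the sketch only shows the general idea of mapping extracted free text onto numbered sections.

```python
import re

# Illustrative sketch: split extracted DMP text into numbered sections
# based on headings like "1. Data description". The regex and the
# dict-of-sections output are assumptions, not DMP ART's real format.
SECTION_RE = re.compile(r"^(\d{1,2})\.\s+(.*)$", re.MULTILINE)

def split_sections(text: str) -> dict[int, str]:
    """Map section number -> heading plus body text."""
    matches = list(SECTION_RE.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        start = m.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[int(m.group(1))] = m.group(2) + "\n" + text[start:end].strip()
    return sections
```

In a real pipeline, text would first be pulled from DOCX, PDF, or OCR output before a step like this, and unmatched material would need to be flagged for manual review.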
The presentation primarily traces the creator’s journey from non-programmer to tool developer. The progression began with simple prompts in a browser-based service, evolved through iterative conversations, and culminated in a professional development environment using Visual Studio Code with the Claude Code agent and a synchronized GitHub repository. This illustrates how techniques for cooperating with AI evolved from basic conversations to sophisticated workflows, coinciding with rapid advances in AI services. Concrete examples demonstrate how each stage built on previous learning.
The primary aim is to illustrate emerging AI possibilities. DMP-ART serves as an initial instance of what this transformation might look like: library professionals with strong expertise in librarianship or open science could gain independence from programmer support. They might autonomously create tools for their daily work. This suggests a potential paradigm shift in how libraries tackle technological challenges – combining domain knowledge with AI assistance has the potential to transform information professionals into builders rather than merely adopters of technology solutions.
The presentation addresses critical risks associated with AI-assisted development: sharing sensitive information with AI systems, proliferation of applications with duplicate functionalities, sustainability concerns as projects may be abandoned without support, and lack of understanding about what shared code actually does. It discusses how libraries can balance opportunities with responsibilities and provides practical recommendations for institutions exploring AI-assisted tool development. The presentation offers frameworks for evaluating when to build custom solutions versus adopting existing tools. The development experience, concrete examples from the learning journey, and ethical considerations presented can serve as a model for other library professionals and institutions beginning to explore AI-enabled development approaches.
4.3) Advancing Digital Preservation of Research Data with EOSC EDEN’s Integrated Framework, Services, Guidelines, and Network
Presenter: Roxanne Wyns, KU Leuven, Belgium
Many digital data repositories and archives face growing preservation challenges, driven by diverse data formats, rapidly increasing data volumes, and the overall complexity of long-term data management. The introduction of FAIR, CARE, and TRUST principles, alongside the need for long-term digital preservation (LTDP), has major implications for digital repositories and archives. The EU-funded EOSC EDEN (2025–2027) [1] project aims to address these challenges by standardizing preservation and curation practices to ensure digital objects remain FAIR and usable over time. Several key achievements of EOSC EDEN were released in 2025 to benefit the communities.
Core Preservation Processes – The framework comprises 30 core preservation processes (CPPs), each representing a specific action a Trustworthy Digital Archive should take – whether directly or through its affiliates or service providers – to fulfil its digital preservation mission (as outlined in its preservation policy) [2]. These CPPs were developed by a group of digital preservation practitioners in EOSC EDEN and focus on the operational activities required for maintaining the authenticity, integrity, and usability of digital objects. Each CPP is expressed as a series of implementable steps that may be executed manually or automatically.
Expert Curation and Digital Preservation Network – EOSC EDEN is establishing a network representing repositories (both generalist and specialist), archives, and organizations responsible for research data curation and preservation. Through this network, clear roles and responsibilities for curation and digital preservation tasks will be defined. The first EOSC EDEN Curation Workshop [3], held in October 2025 in Leuven (Belgium), brought together over thirty European experts to exchange knowledge and lay the foundations for this network. By building a coordinated, multinational network, EOSC EDEN will strengthen knowledge sharing, harmonise approaches, and enhance the long-term sustainability of digital preservation efforts.
System requirements, repository attributes, and service specifications – EOSC EDEN consolidates system requirements, repository attributes, and service specifications to support trusted, interoperable, and scalable preservation infrastructures across scientific domains, in line with standards such as the Open Archival Information System (OAIS) [4]. The recent analysis [5] highlights persistent interoperability challenges – technical (e.g., format fragmentation, inconsistent APIs), semantic (e.g., varying metadata models and vocabularies), and organisational (e.g., divergent policies and workflows). The project working group further examines interoperability, research output quality assurance, and rights and ethics, supported by concrete examples. Together, these insights provide a solid basis for developing a structured inventory of services relevant to digital preservation and curation.
Discipline Requirements and Needs – EOSC EDEN investigates discipline-specific requirements, gaps, and emerging needs related to LTDP and digital object quality (DOQ) [6]. Initial insights are provided from seven early-adopter disciplines: Climate Simulations, Earth & Environmental Sciences, Food Sciences, High-Energy Physics, Life Sciences & Bioinformatics, Linguistics, and Social Sciences. Smaller, scholar-led repositories often lack well-documented LTDP and DOQ policies, leading to challenges in contextual, technical, and metadata quality. Across these domains, the most critical need is ensuring robust data documentation that supports both reliable preservation and sustained data utility.
Outcome of the presentation – The presentation will showcase how EOSC EDEN’s outcomes collectively advance long-term digital preservation and knowledge security for research data. Attendees will gain actionable insights into harmonizing preservation practices.
References
[1] https://cordis.europa.eu/project/id/101188015
[2] Lindlar, M., et al. (2025). Report on Identification of Core Preservation Processes (M1.1). Zenodo. https://doi.org/10.5281/zenodo.16992452
[3] https://eden-fidelis.eu/blog/1st-eosc-eden-curation-workshop-leuven-belgium-gathering-experts-build-european-curation-and
[4] Middelbos, W., et al. (2025). EOSC EDEN Specifications and Architecture (D2.1, Iteration 1). Zenodo. https://doi.org/10.5281/zenodo.17232536
[5] CCSDS. (2012). Reference Model for an Open Archival Information System (OAIS). ISO 14721. https://www.iso.org/standard/87471.html
[6] Andreassen, H. N., et al. (2025). Report on Discipline Requirements and Needs (D3.1). Zenodo. https://doi.org/10.5281/zenodo.15789261