
Parallel Session 8 – AI and Collection Development 

Date: Thursday, 2 July 2026, from 09:00 to 10:30

Moderator: TBC
Location: R8

8.1) SubjectSeeker AI: No-Code AI for Subject Matching Rare Books at KU Leuven Libraries Special Collections

Presenter: An Smets, KU Leuven Libraries, Belgium 

KU Leuven Libraries’ Special Collections manages a unique collection of approximately 50,000 early printed books. While accurate subject assignment is essential for the discoverability and scholarly use of the collection, categorising such a large body of works manually is resource-intensive. This paper details the development and impact of SubjectSeeker AI, a no-code AI agent for matching subjects to rare books. It compares SubjectSeeker’s performance to that of other AI solutions and shares practical insights for other institutions looking for accessible ways to use AI to enhance their library collections. 

SubjectSeeker AI was not the first experiment with AI at KU Leuven Libraries Special Collections. As early as 2020, we started exploring AI-based methods to support subject matching, with the help of a company that specialises in strategies for digital heritage and data services. An initial AI model, trained on preset rules and title information, aimed to predict relevant subjects. However, the model’s effectiveness was limited: the often unique, multilingual, and archaically spelt titles posed challenges for accurate matching, resulting in substantial time spent on corrections. 

The emergence of ChatGPT and other Large Language Models (LLMs) with direct access to internet data marked a turning point. Quick experiments revealed that simply providing title and author information to ChatGPT produced better results than our previous professional AI model. Importantly, adopting a no-code AI solution empowered staff members to perform subject matching themselves, eliminating the need to rely on external companies for both development and execution. Building on these insights, we collaborated with KU Leuven Libraries’ AI Librarian to develop and test SubjectSeeker AI. 

Our process involved developing, testing, and implementing the AI agent, with iterative adjustments based on initial results. As KU Leuven has a contractual agreement with Microsoft for Copilot, which ensures that inputs and outputs remain private and cannot be used to train the model further, the agent was developed under an M365 Copilot license. Throughout the process, we leveraged supporting documents such as the complete subject list, previously matched titles from the database, and in-house matching guides based on staff expertise, including lists of authors and keywords linked to specific subjects, as well as book codes mapped to the subject list. Our subject list is based on the STCV project (the Bibliography of the Hand Press Book in Flanders, stcv.be/en), an online database managed by the Flemish Heritage Libraries and enriched with a controlled vocabulary of subject keywords. SubjectSeeker AI demonstrated notable improvements in matching appropriate subjects to rare books, outperforming our earlier AI model in both accuracy and usability. 
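To illustrate the general technique of grounding an LLM in a controlled vocabulary (and not the Copilot agent itself, which is configured without code), the following minimal Python sketch shows how a subject list and in-house keyword mappings might be assembled into an instructional prompt for subject matching. The file-free data, field labels, and example record are hypothetical, not material from the KU Leuven workflow.

```python
# Minimal sketch: building an instructional prompt for LLM-based subject
# matching from grounding documents. The subjects, keyword hints and the
# example record below are illustrative assumptions.

def build_subject_prompt(subject_list, keyword_map, title, author):
    """Combine a controlled subject vocabulary and in-house keyword
    mappings with one bibliographic record into a single prompt."""
    subjects = "\n".join(f"- {s}" for s in subject_list)
    hints = "\n".join(f"- '{kw}' usually indicates: {subj}"
                      for kw, subj in keyword_map.items())
    return (
        "You assign subjects to early printed books.\n"
        "Choose ONLY from this controlled subject list:\n"
        f"{subjects}\n\n"
        "In-house matching hints:\n"
        f"{hints}\n\n"
        f"Title: {title}\nAuthor: {author}\n"
        "Answer with the single best-matching subject."
    )

# Hypothetical grounding data standing in for an STCV-style subject list
subject_list = ["Theology", "Law", "Medicine", "History"]
keyword_map = {"corpus iuris": "Law", "sermones": "Theology"}

prompt = build_subject_prompt(subject_list, keyword_map,
                              title="Sermones de sanctis",
                              author="Johannes Herolt")
print(prompt)  # the prompt is then submitted to the chosen LLM or agent
```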

This paper offers a comprehensive overview of our process, covering the development, testing, and implementation of the AI agent, along with the adjustments made to our initial plans. We will present SubjectSeeker AI’s performance in matching relevant subjects, compare it with our previous AI solution, and showcase our agent’s instructional prompt. Since our goal is not only to describe our specific use case but also to demonstrate how other institutions can apply AI agents for topic matching, we will conclude with lessons learned and recommendations for library institutions. 

 

8.2) What is the Impact of AI on Digital Collections? Lessons Learned from a Research Project in Oxford 

Presenter: Megan Gooch, University of Oxford, United Kingdom 

In this paper I will present work undertaken by myself and colleagues as part of a Bodleian Libraries, University of Oxford research project funded by the AI company OpenAI. The overall research aim was to understand the scope and scale of the Bodleian’s collections suitable for digitisation at scale. To reach this aim we undertook experiments in several areas to understand which items in our collections could be digitised and how this digitisation could be achieved, and we looked more widely at the library sector to see whether peers in other institutions had solved some of the common challenges. 

To understand the volume of collections suitable for digitisation, we adapted a museum collection audit methodology (Dun and Das, 2012) and worked with subject librarians across our many library sites, with data specialists, and with our special collections curators. Given the limitations of copyright, conservator and curator hesitancy around digitising books predating 1800, and books already committed to other digitisation projects, we concluded that the number of objects suitable for digitisation was about 250,000. 

The next step was to understand how books could be digitised at scale. To do this we scaled up digitisation in our existing imaging studio and started a new digitisation facility at our book storage facility. At both we tested new equipment and workflows and employed new staff. From these two experiments we learned that whilst there were ways to raise imaging throughput, trade-offs had to be made, for example in the level of quality assurance and in image size, to ensure digitisation could be scaled across the workflow. 

However, imaging alone is not sufficient to digitise an object. The largest cost and time commitment in digitisation lies in cataloguing and resource description. We systematically tested different approaches to metadata creation and augmentation from full texts, title pages, and card indexes. Our conclusions were promising: smaller (and therefore environmentally and financially cheaper) AI models performed better, and the role of the expert librarian, curator, or cataloguer is more important than ever in the age of AI. We reject the term ‘humans in the loop’ and have started using ‘AI in the loop’, as computation is part of our workflow, not us in theirs. 
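By way of illustration only (the Bodleian experiments are not described in code in this abstract), a minimal sketch of title-page metadata extraction with a language model might look as follows; the field names, prompt wording, and sample model reply are assumptions, and the ‘AI in the loop’ point is reflected in the fact that every parsed record still goes to an expert cataloguer for review.

```python
# Minimal sketch: prompting a model to turn OCR'd title-page text into
# structured metadata, then validating the reply before human review.
# The fields, prompt and sample output are illustrative assumptions.
import json

REQUIRED_FIELDS = ["title", "author", "place_of_publication", "date"]

def build_extraction_prompt(title_page_text: str) -> str:
    """Ask the model for a fixed JSON structure so replies are comparable."""
    return (
        "Extract bibliographic metadata from this title page.\n"
        f"Return JSON with exactly these keys: {', '.join(REQUIRED_FIELDS)}.\n"
        "Use null for anything that is not present.\n\n"
        f"Title page text:\n{title_page_text}"
    )

def validate_reply(raw: str) -> dict:
    """Parse the model reply and check every required field is present,
    so cataloguers only review records that are structurally complete."""
    record = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    if missing:
        raise ValueError(f"Model reply missing fields: {missing}")
    return record

# Hypothetical model reply used here in place of a live API call
sample_reply = ('{"title": "The Anatomy of Melancholy", "author": "Robert Burton", '
                '"place_of_publication": "Oxford", "date": "1621"}')
print(validate_reply(sample_reply))
```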

Finally, we created and circulated a survey to ascertain the state of AI implementation in libraries, and received over 400 responses from all inhabited continents, including many European research libraries. We are in the early phases of analysing the data, but can already see that libraries use AI most in user-facing chatbots and in metadata augmentation. We also uncovered some strong sector attitudes to AI, both positive and negative, that we wish to share with LIBER colleagues. 

We will conclude that whilst AI experimentation formed a relatively small part of our research, the impact of AI is what drove the project, along with many conversations and new projects around the wider implications of AI for creating, using, publishing, and preserving our digital collections. The new context of AI drove the need for us to understand both the demand for digital collections from AI companies and other users, and the processes we would need to put in place as a library to meet these emerging needs and make our wealth of content accessible as digital collections. 

 

8.3) AI and Archives: Utilizing AI to Make Retroactive Finding Aid Conversion Simpler 

Presenter: Alyssa Hyduk, University of Regina, Canada 

The University of Regina Archives and Special Collections has been working over the past few years to establish a database for its archival holdings that provides a more user-friendly archival experience. AtoM (Access to Memory) is open-source software which uses archival descriptive standards to encode and catalogue archival collections, and it requires a specific template, in CSV or XML format, for imports. 

While this is not an issue for processing and cataloguing archival collections on a go-forward basis, the UofR Archives also has over 2,000 finding aids that were created in Word/PDF format and do not conform to AtoM’s mandatory upload formats. 

With budget and staffing constraints, an innovative and flexible approach to converting these finding aids was necessary to ensure full collection representation for researchers. Using Gemini and Google AI Studio, we have created a method to convert these legacy finding aids into workable CSV/XML documents, with the appropriate metadata mapped correctly. 
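As a minimal sketch of this kind of conversion, assuming the google-generativeai Python client, a Gemini model name, and illustrative AtoM-style CSV headers (an institution’s actual import template and prompt will differ), the conversion step might look roughly as follows; staff then review and correct the rows, as outlined in the learning outcomes below.

```python
# Minimal sketch: asking Gemini to restructure legacy finding-aid text into
# CSV rows for an AtoM import template. The model name, column headers and
# prompt are illustrative assumptions, not the University of Regina workflow.
import csv
import io
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # placeholder credential
model = genai.GenerativeModel("gemini-1.5-flash")  # assumed model name

# Illustrative AtoM-style fields; match them to your own import template
CSV_HEADERS = ["legacyId", "title", "levelOfDescription",
               "eventDates", "scopeAndContent"]

def convert_finding_aid(finding_aid_text: str) -> list[dict]:
    """Prompt the model for raw CSV and parse it into rows for review."""
    prompt = (
        "Convert this archival finding aid into CSV rows.\n"
        f"Use exactly these column headers, in this order: {', '.join(CSV_HEADERS)}.\n"
        "Output raw CSV only, one row per described unit, no commentary.\n\n"
        f"{finding_aid_text}"
    )
    response = model.generate_content(prompt)
    rows = list(csv.DictReader(io.StringIO(response.text)))
    # Every row is still checked by archives staff before upload to AtoM.
    return rows
```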

This presentation will provide the audience with a demonstration of this AI method, with the following learning outcomes: 

1: Ability to identify, based on the home institution’s cataloguing needs, the appropriate metadata fields necessary for generating a working AI prompt compatible with the institution’s database. 

2: Generation of a working AI prompt which is flexible and customizable. 

3: Techniques for reviewing and correcting data outputs, including editing prompts for improved accuracy. 

Like all other tools at our disposal, AI is not a blanket fix, but a method to maximize limited institutional resources. Instead of staff spending hundreds of hours converting finding aids “by hand” into the acceptable format, AI can drastically reduce the hours needed, allowing materials to become accessible and searchable by researchers. Human intervention is still required for accuracy and oversight, as well as for creating standardized language for AI use; however, if used effectively and responsibly, AI can be a powerful tool in the information professional’s toolbox. 
