Session 11: Sustainable Infrastructures: challenges and innovative approaches
Friday 7th July – 9:00 – 10:30
Chair: Sofie Wennström, Stockholm University Library, Sweden
11.1: It’s Not All About the Money: The Challenges of the Austrian Datahub to Become a Sustainable Open Access Service
Presenter: Patrick Danowski, Institute of Science and Technology Austria, Austria
Since 2017, the “Austrian Transition to Open Access” (AT2OA) project has aimed to support the large-scale transformation of scientific publications from Closed to Open Access and to implement measures supporting this initiative. One key element of the second project phase is the establishment of a national hub for publication data that must meet several requirements: providing data for reliable OA monitoring, supporting publisher negotiations, and generating added value by enriching the publication data made available.
For this purpose, the AT2OA² partner organizations (all Austrian universities as well as two research institutions) will deliver the publication data from their institutional research information systems (CRIS) to the national data hub using different supply methods, ranging from proprietary APIs and OAI-PMH to a simple upload of an Excel file. This diverse approach allows us to build a national community involving libraries and other university services at different levels of technical development.
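As an illustration of the OAI-PMH supply route mentioned above, the following is a minimal sketch of how a harvester might build a `ListRecords` request and read one page of a response. The base URL `https://cris.example.org/oai`, the function names, and the sample response are assumptions made for this example; they do not describe the data hub's actual import code.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Hypothetical response page, used here instead of a live network call.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:cris.example.org:1</identifier></header></record>
    <record><header><identifier>oai:cris.example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://cris.example.org/oai"))
    print(parse_page(SAMPLE))
```

A harvester would repeat the request with the returned token until no token is left. Institutions without an OAI-PMH endpoint would need a separate path, for example the Excel upload.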
Following the institutional data import, the CRIS data will be enriched with additional information (e.g., APCs, license and embargo information). For the data enrichment, interfaces to various databases and platforms, such as OpenAPC, DOAJ, Unpaywall, OpenAlex and Sherpa Romeo, will be set up. A key feature of the data hub is a reproducible OA labelling of the publication data. To allow the Open Access status to be determined automatically, the process is realised with the Open Access Classification Tuple (COAT), an in-house development from the first project phase. The results (e.g., the national OA shares) will be displayed via a web tool that is currently available as a beta version. The institutions will be able to reuse the enriched data and to use the hub's reporting and export options.
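The following sketch illustrates how enrichment and reproducible labelling can work in principle: records are merged with extra fields keyed by DOI, and a pure function derives a label from those fields, so the same input always gives the same label. The field names (`license`, `embargo_months`, `journal_in_doaj`), the label names, and the rules are hypothetical. They are not the COAT scheme and not the data hub's implementation.

```python
def enrich(record, sources):
    """Merge enrichment fields into a copy of a CRIS record, keyed by DOI.

    Each source is a dict mapping a lower-cased DOI to extra fields
    (hypothetical stand-ins for data from external services).
    """
    doi = record["doi"].lower()
    enriched = dict(record)  # leave the original record untouched
    for source in sources:
        enriched.update(source.get(doi, {}))
    return enriched


def oa_label(rec):
    """Derive an OA label from enriched fields.

    Illustrative rules only; NOT the COAT classification.
    """
    if not rec.get("license"):
        return "closed"
    if rec.get("embargo_months", 0) > 0:
        return "embargoed"
    if rec.get("journal_in_doaj"):
        return "open_fully_oa_journal"
    return "open_other"


if __name__ == "__main__":
    record = {"doi": "10.1234/ABC", "title": "Example"}
    licenses = {"10.1234/abc": {"license": "cc-by", "embargo_months": 0}}
    journals = {"10.1234/abc": {"journal_in_doaj": True}}
    print(oa_label(enrich(record, [licenses, journals])))
```

Because the label depends only on the enriched fields, rerunning the classification on the same snapshot of the data reproduces the same OA shares.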
We plan a community-driven, sustainable service that will run beyond the duration of the project. We have therefore decided to take a more holistic view of sustainability and to develop a concept that takes multiple aspects into account.
Beyond the costs for hosting, we need to address further challenges like:
- the code, its documentation and the knowledge transfer for further development;
- the sustainability of the open services we rely on;
- the sustainability of our new services (e.g., the standardization of publishers);
- the individual responsibilities of the project partners, even after the end of the project.
We are convinced that the added value for the participating institutions, such as the data enrichment or the usability of the generated data for required internal and external reports, will help ensure the sustainability of the service. Embedding our service in a collaborative community will also enable us to make it a fixed component of the Austrian academic landscape.
During our presentation, we plan to highlight these sustainability challenges. Based on the experience we have gained while implementing our relatively small Open Access infrastructure, we will share our findings, which will hopefully serve as an inspiration for others facing similar problems.
11.2: Collection and corpus: the case of the REAL repository
Presenter: András Holl, Library and Information Centre of the Hungarian Academy of Sciences, Hungary
REAL is the repository of the Hungarian Academy of Sciences, operated by the Library of the Academy. It is the largest true Open Access repository (in the Open Archives Initiative sense) in Hungary, consisting mainly of scholarly literature written by Hungarian researchers or published in Hungary. (Other collections of the repository, containing digitized manuscripts, old books and images, are not discussed in this contribution.)
In a project recently launched within the Programme of the Academy entitled “Sciences for the Hungarian Language”, a thematic language corpus will first be compiled from the modern Hungarian textual data collection within the REAL repository. The corpus, estimated to contain at least one billion words, will be used to refine the AI language models developed at the Hungarian Research Centre for Linguistics. The AI language models trained on the text corpus of the REAL repository will be used to recognise and annotate standard elements typically found in scientific communications (title, authors, affiliation, keywords, abstract, acknowledgements, and bibliographic references) as well as named entities classified into standard categories (person, location, institution, etc.). Furthermore, an attempt will be made to classify the content of the documents into broad categories (topic modelling). The results will be used to enrich and correct the data and metadata in REAL, and to expand the content of the national CRIS system, the MTMT.
The authors will discuss the issues of building a corpus from a digital library collection, the benefits expected from the repository perspective, and some possible connections with the Online Public Access Catalog of the library. Questions about the accessibility of the text layer of the repository documents and the possibility of mining it will also be addressed, together with an assessment of the data and metadata quality. The software used to process the digital documents for ingestion into the corpus will be open-source.
The research project is expected to finish in 2026.
11.3: It takes a community: a participatory approach to sustaining an open infrastructure
Peter Kraker, Open Knowledge Maps, Austria
Chris Schubert, TU Vienna University Library, Austria
For many years, the market for academic discovery has been dominated by a few proprietary systems. In the shadows of these giants, however, an alternative discovery infrastructure has been created, built on thousands of public and private archives, repositories and aggregators, and championed by libraries, non-profit organisations and open-source software developers.
Unlike the commercial players, these systems make their (meta-)data openly available. As a result, the open infrastructure has become the strongest driver of innovation in discovery, enabling the quick development of a variety of discovery tools. Technologies such as semantic search, recommendation systems and visualisations are increasingly available to researchers, libraries and repositories alike.
The question then arises: how to sustain an infrastructure that is giving everything away for free?
In this contribution, we present the participatory approach of Open Knowledge Maps as an answer to this question. Open Knowledge Maps (https://openknowledgemaps.org) is a visual discovery infrastructure. All services offered by the non-profit organisation are free of charge, including its search and discovery services, its training activities, and its community support and engagement programs. In addition, data, source code, and content, including extensive training materials, are distributed under an open licence.
Funding for Open Knowledge Maps is provided by projects and increasingly by supporting members. Supporting members are an integral part of the organisation: they become part of the governance and are directly involved in decision-making processes around the technical roadmap. With this governance system, the infrastructure enables co-creation, as it relinquishes control of the technical roadmap in order to achieve an equitable balancing of stakeholder needs.
One of the main outcomes of this participatory approach is a line of institutional services, the so-called Custom Services. With Custom Services, institutions can expand their discovery offerings with visual search components from Open Knowledge Maps. At the University of Vienna, for example, Open Knowledge Maps was integrated into the data archive of AUSSDA – The Austrian Social Science Data Archive as part of a pilot project. The integration shows that the visual representation can also be used for research data. It gives an overview of the diverse topics in the AUSSDA Dataverse and offers simple and intuitive interaction options for zooming in and navigating to individual studies.
At the TU Wien Bibliothek, Open Knowledge Maps was integrated into the library catalogue “Catalog PLUS” as an advanced search option. Users can send their PRIMO query to Open Knowledge Maps as an external resource. Two visualisation types were integrated: “knowledge map”, which is a clustering according to similar terms, and “streamgraph”, which gives an overview of the development of topics over time.
In conclusion, this approach can create a win-win-win scenario for researchers, libraries, and the open infrastructure itself. However, a major challenge remains: to overcome the initial phase of attracting enough supporting members to provide basic funding for the maintenance and further development of the infrastructure. The authors will conclude the presentation with several approaches to address this challenge.