AI's role in metadata enrichment for digital collections

What we learned from bringing together some of the UK's leading cultural collection professionals to discuss AI's role in metadata enrichment. 

AI has a role to play in digital collections, but how can it be deployed responsibly, sustainably, and in a way that actually serves collections and the people who use them? We discussed these questions with collections leaders from across the UK in our recent roundtable event.

The UK’s museums, galleries, libraries and archives hold hundreds of millions of collection items between them. Many of these items have incomplete, inconsistent or entirely absent metadata. The sheer scale of the challenge means that traditional approaches to cataloguing and enrichment simply cannot keep pace. AI offers a way to close this gap, not by replacing curatorial expertise but by handling the repetitive, time-consuming tasks that prevent collections from being discoverable, accessible and useful. To understand how the sector is really approaching this challenge, we brought together digital collections leaders from some of the UK’s most important cultural institutions for a candid roundtable discussion on March 20th at the Museum of London.

Representatives from the Imperial War Museum, the British Museum, the British Library, the Science Museum Group, the National Gallery, Royal Collection Trust, Art UK, Oxford University Museum of Natural History, Collections Trust and the Museum Data Service gathered for two hours to share what’s working, what isn’t, and what the sector needs to move forward.

What emerged was a conversation rich in hard-won practical insight and refreshingly honest about the barriers that remain. Here we share the key themes and potential next steps for the sector to get the most from this new technology as it evolves.

The messy reality of collection data

Before you can enrich metadata with AI, you need metadata to work with. The Museum Data Service, which ingests collection data from hundreds of UK museums, kicked off the conversation by explaining the challenge it faces. It has no minimum standard for the data it receives. Some museums provide richly structured records; others provide nothing more than an object name and a brief description. As Arran Rees, Museum Data Manager for the Museum Data Service, explained: “Sometimes we’ll have artwork coming through that’s got all the information, but it’s all in a brief description.” When downstream systems expect structured data, items with everything provided in prose simply fall through the cracks. It’s the problem of “semi-structured messy data”.

There is no nationwide standard for how even basic information like an artist’s name should be recorded. Different institutions use different spellings, different levels of detail, and different conventions for handling uncertainty. David Saywell, Director of Digital Assets at Art UK, highlighted the scale of the problem: around 95% of the collections Art UK brings together don’t have their own digital collection. Many are small galleries where the curator may also be the building manager, the person who puts on exhibitions, and the person who does the database entry. The sector needs to consider how these smaller museums and galleries can be helped, and whilst AI can play a role, it needs to be simple and open. Complex, expensive projects are impossible for smaller organisations.  

What's working

The Imperial War Museum’s oral history transcription project offered a compelling case study. The museum holds the world’s largest collection of oral histories: 48,000 audio files totalling 21,000 hours of content, none of it searchable. Transcribing it manually would have taken a single person twenty-two years (without sleep!). Using Gemini, the museum transcribed the entire collection in three weeks, with spot checks indicating around 95% accuracy, higher than a human transcriber would typically achieve.

Curators can now ask questions like “Was he ever scared?” and receive answers drawn directly from the transcript, with reasoning and references to specific passages. Perhaps most remarkably, the project surfaced content the museum’s own curators didn’t know they had, including a first-hand account of the famous 1914 Christmas truce football match. As Edward Kay, Head of Platforms of Digital Development at Imperial War Museums, explained, the purpose isn’t to replace curatorial interpretation; it’s to make content findable so that curators can then do something with it.

Royal Collection Trust made a similarly pragmatic case for AI-generated alt text. Hannah Boulton, Communications and Audience Director, said: “without automating alt-text there is absolutely not a chance that we would ever be able to say that the website is compliant with accessibility guidelines. It just will not happen.” 

Start with the problem, not the technology

A recurring theme was the danger of treating AI as a solution in search of a problem.  

The British Museum’s experience illustrated this. Early thinking had focused on using AI to let audiences interact with objects in exciting new ways. But the work that’s actually proving most valuable is far less glamorous: small improvements to records that save people working with them an enormous amount of time. As Julia Stribblehill, Research Services Product Manager at the British Museum, reflected, people can interact with records in their own way if the information is there. The important thing is getting it to them.

Mia Ridge, Digital Curator at the British Library, captured this sentiment with a phrase that stuck: “I'm obsessed with the idea of tiny, boring tools”. The real value of AI in collections isn’t necessarily in shiny public-facing chatbots. It’s in small, focused tools that do one thing well, can be built into existing workflows, and can have their underlying models swapped out as the technology improves. Art UK provided a perfectly mundane example: an AI-trained solution that automatically crops colour calibration charts from collection images, freeing up significant volunteer time. Nothing flashy, but genuinely high impact for those teams.  
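One way to picture a tool whose underlying model can be swapped out is a small function that depends only on an interface, not on any particular model. The sketch below is purely illustrative and was not described at the roundtable: the `Captioner` protocol, the `StubCaptioner` stand-in, the `add_alt_text` function and the record fields (`image`, `alt_text`) are all hypothetical names chosen for the example.

```python
from typing import Protocol


class Captioner(Protocol):
    """Anything that can describe an image; the tool only relies on this."""

    def caption(self, image_path: str) -> str: ...


class StubCaptioner:
    """Hypothetical stand-in; a real tool would call an actual model here."""

    def caption(self, image_path: str) -> str:
        return f"Image file {image_path}"


def add_alt_text(record: dict, model: Captioner) -> dict:
    """Fill in missing alt text, leaving existing human-written text alone."""
    if record.get("alt_text"):
        return record
    # Return a copy so the original record is never modified in place.
    return {**record, "alt_text": model.caption(record["image"])}


record = {"id": "obj-1", "image": "obj-1.jpg"}
enriched = add_alt_text(record, StubCaptioner())
print(enriched["alt_text"])
```

Because `add_alt_text` only asks for something with a `caption` method, replacing the model later means writing one new class; the workflow around it stays the same.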

Trust, transparency and the language we use

Cultural institutions occupy a unique position in the information landscape. When a museum publishes something, audiences quite rightly expect it to be authoritative.  

The British Museum’s recent experience with AI-generated marketing images, which attracted significant public criticism, was raised as a cautionary tale, sitting at the intersection of broader public anxieties about AI and existing sensitivities about the museum’s history.

This highlighted an important distinction: generative AI carries far more reputational risk than machine learning applied to constrained tasks like entity extraction or image classification. Several participants felt the sector would benefit from being more precise in its language rather than using the catch-all term “AI.” There was broad agreement that transparency is essential. AI-generated content should be clearly labelled with the model used, the date it was produced, and ideally the confidence level. As Mia Ridge noted, telling a user something was generated by a specific version of an AI tool in 2026 gives them meaningful context if they’re reading it in 2028 and the technology has moved on. 

What the sector needs next

The roundtable surfaced several shared priorities worth pursuing together. First, there is a need for shared standards around how AI-generated metadata is recorded and quality-assured. The Collections Trust’s Spectrum standard already provides a framework for documenting collection activities, and participants saw an opportunity to extend this to cover AI-generated enrichment — recording which model was used, when, at what confidence level, and what human review took place.
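As a rough illustration of what such a record might hold, the sketch below captures the four things participants mentioned: the model, the date, the confidence level and any human review. It is an assumption for discussion, not part of Spectrum or any existing standard; the `EnrichmentProvenance` class, its field names and the model name `example-model-v1` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EnrichmentProvenance:
    """Hypothetical provenance note for one AI-generated metadata value."""

    field_name: str                      # which catalogue field was enriched
    value: str                           # the AI-generated content itself
    model: str                           # model name and version used
    generated_on: date                   # when the content was produced
    confidence: Optional[float] = None   # 0.0 to 1.0, if the tool reports one
    reviewed_by: Optional[str] = None    # who checked it, if anyone has

    def label(self) -> str:
        """Build a short label to display alongside the generated value."""
        parts = [f"Generated by {self.model} on {self.generated_on.isoformat()}"]
        if self.confidence is not None:
            parts.append(f"confidence {self.confidence:.0%}")
        if self.reviewed_by:
            parts.append(f"reviewed by {self.reviewed_by}")
        else:
            parts.append("not yet reviewed")
        return "; ".join(parts)


note = EnrichmentProvenance(
    field_name="alt_text",
    value="Oil painting of a harbour at dusk",
    model="example-model-v1",  # placeholder, not a real model name
    generated_on=date(2026, 3, 20),
    confidence=0.87,
)
print(note.label())
```

Keeping the label separate from the value means a reader in a later year can still see which tool produced the text and whether a person has checked it.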

Second, the case for shared infrastructure is compelling. The cost of running AI at scale is prohibitive for most institutions, and the skills required are scarce. A centralised service providing AI enrichment capabilities to the sector, accessible to institutions of all sizes, would be transformative. The Museum Data Service could be a natural candidate to play this role, and perhaps to act as an interface to the global-scale AI providers.

Third, the sector needs to share practical knowledge more effectively. Many institutions are running similar experiments in parallel, learning the same lessons, and arriving at similar conclusions independently. Sharing AI guidelines, training materials and governance frameworks would be an immediate, low-cost way to accelerate progress.

Finally, the conversation reinforced that the most successful AI projects in collections start with the problem, not the technology. Institutions that begin by asking their cataloguers and curators what slows them down are the ones producing work that sticks. AI should be “doing the dishes” so that the people with the expertise can focus on the work that truly matters. 
