Why the UK must lead on data to unlock AI's full potential
www.computerweekly.com
Opinion

Unless the data silos in government are addressed, the UK risks falling short of the Action Plan's ambitious goals to lead in AI adoption

By Elena Simperl and Neil Majithia

Published: 10 Feb 2025

The UK government holds some of the world's most valuable datasets, including official statistics, cultural heritage records and NHS health data. These datasets have powered scientific breakthroughs, business innovation, and improvements in public services.

With the publication of the much-anticipated AI Opportunities Action Plan, the transformative potential of government data for AI has never been more apparent. However, recent research by the Open Data Institute (ODI) reveals critical shortcomings in how government datasets are prepared and published for AI.

Government data and AI's reliability challenge

Foundation models (FMs), such as ChatGPT and Gemini, are increasingly used to provide information on public policies and services. Yet the ODI's research highlights that while these models scrape government data repositories, they often fail to deliver accurate outputs based on them. Instead, models draw on secondary or unreliable sources, such as social media posts or opinion articles, or simply fabricate answers.

The consequences are significant. Citizens using AI tools to understand benefit entitlements, for example, may receive misleading or incomplete advice, undermining public trust in both AI and government services. This is particularly concerning given the UK government's emphasis on improving public service delivery through AI innovation.

Data deficits in the AI ecosystem

The AI Opportunities Action Plan, authored by Matt Clifford, rightly emphasises the role of the National Data Library (NDL) as a means to unlock government data for AI innovators.
Yet the current state of government datasets presents significant barriers to achieving this vision.

ODI analysis of CommonCrawl, a key dataset repository for AI models, found that it had scraped 13,556 pages from data.gov.uk as of April 2024. However, these pages rarely contributed to accurate model outputs. Across 195 test queries, models correctly referenced data.gov.uk statistics in only five cases, or roughly 2.6% of the time.

This issue arises because government data is often not published in AI-ready formats. While technologies such as DCAT are used to make datasets discoverable, scraping infrastructure like CommonCrawl does not fully support these technologies. As a result, AI models rely on less authoritative sources, perpetuating misinformation. The ODI's findings suggest that the UK's ambition to lead in AI innovation could falter unless this disconnect is addressed.

Evidence from ODI experiments

The ODI conducted two experiments to examine how government data supports AI models and, in turn, how AI models are enabled to support residents of the UK.

The first experiment analysed how important UK government websites are for AI. Researchers conducted an ablation study using a machine unlearning technique to remove gov.uk websites from a selection of FMs' training data.

The results revealed a 42.6% increase in model inaccuracy when models were deprived of gov.uk content, leading to fundamental errors. For example, one test found that models without access to government websites misinformed users about their eligibility for Child Benefit.

In contrast, the second experiment found that government datasets are currently unknown to AI models.
This experiment, a study of models' ability to recall specific statistics from data.gov.uk, found that out of 195 queries, models accurately referenced official government statistics releases just five times.

The conclusion from these experiments was that while government websites are vital for AI accuracy, government statistics datasets are underutilised despite their enormous value and potential in delivering public services. If we want to realise the potential of AI to deliver benefits such as improving care quality, safety, and cost-effectiveness in the NHS, the government must prioritise improving the quality, accessibility, and usability of its data.

The path forward

The adoption of FAIR principles - ensuring data is findable, accessible, interoperable, and reusable - has long been championed by data.gov.uk and remains a strong foundation. Emerging tools like Croissant, a machine-readable metadata format designed for machine learning, can further enhance discoverability and integration into developers' workflows. Better dataset descriptions will make datasets more usable for both human and machine users.

The government must incentivise responsible data sharing to ensure equitable access to high-quality data. This could include tax incentives for private-sector data sharing, mandates for publicly funded projects to make their data open where appropriate, or even a levy on AI-generated content to fund trusted information sources.

We must also use privacy-enhancing technologies such as Solid, which offer individuals direct access to and control of their data - for example, their well-being and health data - so that sensitive data can be accessed without compromising personal privacy, commercial sensitivity, or national security. This could provide important benefits, such as using machine learning to identify personal risk factors for health conditions, enabling preventative action. Data Trusts can be built on top of Solid to aggregate data.
This aggregated data can be collated into datasets with Croissant metadata to prepare it for research use.

Aligning with the Action Plan

The AI Opportunities Action Plan's emphasis on high-quality data and strong governance aligns with the ODI's longstanding commitment to socio-technical solutions that integrate advanced data infrastructure with public trust. To support the development of interoperable systems, AI-ready datasets, and privacy-enhancing technologies, the ODI is advocating for a ten-year National Data Infrastructure Roadmap. This roadmap would support the Action Plan's focus on driving AI innovation through investing in long-term data infrastructure.

However, the Action Plan leaves several gaps unaddressed. It does not fully detail how the National Data Library will incorporate user input or engage diverse stakeholders to ensure it delivers public benefit. There is limited detail about formal standards for data quality and provenance, which are critical for ensuring AI-ready datasets. Furthermore, while the Action Plan highlights the need to support AI innovators, it could more explicitly foster data-centric startups specialising in data preparation and governance tools. We hope these gaps are addressed as the government rolls out the recommendations.

International leadership through collaboration

The ODI's research highlights the global importance of data-centric approaches to AI governance. However, few nations prioritise this focus, which risks undermining the broader adoption of open and shared data practices.
Without robust data-centric governance, the foundations of transparent and accountable AI systems could weaken.

The ODI has launched the Global AI Policy Data Observatory to address this. This initiative provides practical resources to support policymakers in developing data-centric AI governance. By offering insights into machine-readable metadata, toolkits for responsible data use, and best practices for transparency, the Observatory aims to strengthen the global evidence base for data-centric AI.

Realising the UK's AI potential

Access to high-quality government data is essential for realising AI's potential in public service delivery. By improving data publication practices and investing in long-term infrastructure, the UK can position itself as a global leader in data provision for AI. This leadership will unlock transformative economic and social benefits, aligning with the ambitions of the AI Opportunities Action Plan.

The full report is available to download at ODI Report: The UK Government as a Data Provider for AI.

Elena Simperl is the director of research at the ODI. Neil Majithia is a researcher at the ODI.