Confidential health information from UK Biobank project leaked online
What's the story
A recent investigation by The Guardian has revealed that confidential health data from the UK Biobank project has been exposed online multiple times. The UK Biobank, a major medical research initiative, holds the health records of 500,000 British volunteers and is considered one of the world's largest repositories of health data. It has contributed significantly to research in cancer, dementia, and diabetes. However, concerns have been raised over how well patient records are being protected within this project.
Security concerns
Leaks caused by researchers with access to sensitive information
The data leaks appear to have been caused by researchers who were given access to Biobank's sensitive information. These files, while not containing names or addresses, still pose a threat to privacy. One dataset uncovered by The Guardian contained millions of hospital diagnoses and their dates for over 400,000 participants. This raises serious questions about the security measures in place for protecting such sensitive data.
Official stance
No identifying information shared with researchers, says UK Biobank
In response to the data exposure concerns, UK Biobank has maintained that no identifying information was shared with researchers. Professor Sir Rory Collins, the CEO of UK Biobank, said there is no evidence of any participant being re-identified by others. The statement is intended to reassure participants amid growing worries over the security of personal health data in research projects like this one.
Data repository
What is UK Biobank?
Established in 2003, UK Biobank is a massive repository of genome sequences, scans, blood samples, and lifestyle data from 500,000 volunteers. Last month, the government extended Biobank's access to volunteers' GP records. Scientists from universities and private companies worldwide can apply for access to this data. Until late 2024, approved researchers were allowed to download it directly onto their own computer systems.
Platform misuse
Researchers unintentionally uploaded sensitive data onto GitHub
The leaks stem from the fact that journals and funders increasingly require researchers to publish the code they use to analyze large datasets. In doing so, some researchers have unintentionally uploaded partial or complete Biobank datasets to GitHub, a popular online code-sharing platform. UK Biobank prohibits this practice and says it has implemented additional training for all researchers to prevent such incidents.
Legal measures
GitHub complied with requests to remove leaked data
Between July and December 2025, UK Biobank issued 80 legal notices to GitHub over these data leaks. The platform has complied with requests to remove the leaked data. However, much of the data remains accessible online, which highlights UK Biobank's ongoing struggle to contain the issue and protect its sensitive health data from further exposure.
Expert opinion
Expert shocked by level of detail in dataset
A data expert who reviewed the online dataset, which contained hospital diagnoses and the associated diagnosis dates for about 413,000 participants, was shocked by its level of detail. They said that even glancing at it felt like a gross invasion of privacy.
Re-identification risk
Test conducted by The Guardian on volunteers
To test the risk of re-identification, The Guardian approached several Biobank volunteers. One volunteer, who offered treatment dates for a fracture and a seizure, could not be located in the dataset. However, another volunteer shared her month and year of birth, and the month and year she had a hysterectomy. Only one person in the dataset matched those details, and the match was corroborated by five other diagnoses that had not initially been disclosed.
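The matching logic behind a test like this can be sketched in a few lines of Python. The records below are entirely synthetic, and the field names and the helper `matching_records` are hypothetical. This is not The Guardian's method or UK Biobank's data format, only an illustration of how combining a few month-and-year details can shrink the set of candidate records to one.

```python
# Entirely synthetic toy records; none of these values come from UK Biobank
# or from The Guardian's reporting. Dates are (year, month) pairs.
records = [
    {"id": "r1", "birth": (1950, 3),
     "events": {("hysterectomy", (1998, 7)), ("asthma", (2001, 2))}},
    {"id": "r2", "birth": (1950, 3),
     "events": {("fracture", (2005, 1))}},
    {"id": "r3", "birth": (1950, 3),
     "events": {("hysterectomy", (2003, 11))}},
    {"id": "r4", "birth": (1962, 9),
     "events": {("hysterectomy", (1998, 7))}},
]


def matching_records(records, birth, events):
    """Return records with the given birth (year, month) that contain
    every (diagnosis, (year, month)) pair in `events`."""
    return [
        r for r in records
        if r["birth"] == birth and events <= r["events"]
    ]


# Birth month and year alone leave several candidates.
by_birth = matching_records(records, (1950, 3), set())

# Adding one dated procedure narrows the candidates to a single record.
by_both = matching_records(
    records, (1950, 3), {("hysterectomy", (1998, 7))}
)

print(len(by_birth), [r["id"] for r in by_both])
```

In this toy example, neither detail is unique on its own, but the combination is. That is why files without names or addresses can still pose a privacy risk.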