The challenges of building up China's biobank
People line up for a Covid-19 test in Beijing on April 27, 2022.
PHOTO: EPA-EFE
Di Lihui
BEIJING (CAIXIN GLOBAL) - With a population of 1.4 billion across 56 ethnic groups, China has an abundance of human genetic resources.
Thus, six years ago, when China proposed in the "13th Five-Year Plan on Scientific and Technological Innovation" to "establish a prospective cohort of one million healthy people and patients with key diseases," there were high hopes that China would one day establish a biobank like the UK Biobank and the "All of Us" program in the US.
Since 2016, several regional million-person cohort studies have been funded and carried out as part of the Precision Medicine Research key project under China's Ministry of Science and Technology.
"The establishment of these million-person cohorts is groundbreaking for biomedical research in China and provides an excellent basis for standardizing the collection and use of valuable clinical resources," said Guo Tiannan, assistant professor at the School of Life Sciences, Westlake University.
However, some experts who wish to remain anonymous have shared their various areas of concern.
Concerns over data remittance
The first concern is about data remittance. Under the Measures for the Management of Scientific Data, issued by the General Office of the State Council in March 2018, data generated by major special projects must be remitted to one of 20 national science data centers.
One expert, who has participated in several cohort closing reviews and did not wish to be named, said that research teams must commit to submitting the data, but which of the 20 centers should receive it is not stipulated. "It depends entirely on the convenience of the project leader," the expert said.
Although the data will most likely be handed over to the National Population Health Science Data Center under the National Health Commission, or the National Genome Science Data Center under the Chinese Academy of Sciences, it could still end up in any other center on the list.
"In theory, the data can end up being submitted to the National Seismic Science Data Center, however absurd it might sound," said the source.
Another expert suggested designating a single data remittance center, to avoid the unnecessary and duplicated construction of specialized genome data review and retrieval systems.
Funding challenges
The second concern is about the correlation of cohort data. Cohort studies usually involve two types of data: one is genomic, proteomic and other genetic data, and the other is clinical, phenotypic and macroscopic data.
When the two are combined and correlated, they can drive meaningful results for scientific research and industrial development, such as new drugs. Cohorts lacking genetic data have limited practical use.
Unfortunately, most of the million-person cohorts collect only the second type of data, the macroscopic phenotypic data. Genetic sequencing is rarely conducted after blood and urine samples are collected.
"Sequencing and histology are important, but they are costly too. The main reason (they are not done) is the lack of funding," the expert said.
Based on his calculation of the per capita funding of sub-topic groups under the "Precision Medicine" cohort, each participant is allocated between 100 yuan and 300 yuan (S$21 to S$63). "It's not even enough for proper baseline tests," he added.
As the expense of processing and storing high-quality biological samples is often seen as a financial burden by the project team, these biological samples and phenotype data, obtained at great cost, are often forgotten in dusty corners instead of being shared, added to and correlated, as they are at the UK Biobank.
"Let's say one lab has got the genome and the transcriptome of 100,000 people, and then there are organizations putting in resources to measure the proteome. After some time, more investment pours in to measure this group's response to drugs, the progression of disease, changes in the cohort's health status and so on. Over the years, the value of this data would snowball," Guo said.
"Relying on national funding and hospitals' input is not sustainable. Ideally, if we benchmark against Europe and the US, there should be room for commercial entities, non-profit foundations and professional managers. This isn't just research, it's an opportunity for business. And for medical research to have a real impact for patients, there must be commercial applications. This funding channel has not been realized to the fullest," added Guo.
Without many funding options, it is difficult to maintain a research team, and members move on once a project is completed. "It is difficult to maintain the standard of data from different teams, and there is a constant duplication of effort at the primary level," observed Zhang Li, director of the Genomics Center and HPC Core Facility at the Chinese Institute for Brain Research, Beijing.
Unlike other research projects, studies of large cohorts - for areas such as chronic diseases - often last for decades, with follow-ups every few years, and only then can the long-term effects of environmental factors and lifestyle habits be clarified. The longer the cohort study, the more valuable it is, but with it comes the challenge of sustained funding.
Even for a high-profile project like the China Kadoorie Biobank (CKB), a corporate endowment of US$10 million (S$13.8 million) pledged three years ago is understood to remain undisbursed because of China's tightened control of human genetic resources.
In 2019, the State Council, China's cabinet, issued the Regulations of the People's Republic of China on Human Genetic Resources Management, which took effect on July 1 that year.
Barriers between physicians and researchers
Beyond funding, administrative barriers also worry genome scientists. Several researchers said that because clinical practice and scientific research are run as separate entities, there is a lack of effective collaboration between the physician community and the research community.
Human genetic samples, especially those involving specific diseases, are mainly produced in hospitals, which are often reluctant to share them. As a side note, Rao Yi, a renowned neurobiologist, wrote in a blog post about how Chinese medical education can play a role in coordination.
New biobanks in China
With an increase in cohort studies, more Chinese biobanks have sprung up. Since 2021, 90 units have been granted permission to conserve biological samples, according to the Ministry of Science and Technology.
But from a utility perspective, the samples have not always been placed in the right hands. "A lot of biobanks are managed manually. While the samples are there, nobody uses them. Ironically, the researchers who understand full well how they can use the samples for their work aren't able to access the samples," Zhang Li said.
Herein lies the issue: Data collected from samples and clinics must be accessible if it is to deliver value, such as driving the development of new drugs. But data sharing is not attainable at the current stage because of competition, according to Zhang.
A more realistic approach would be to share information on who owns what. Subsequently, state departments should take the initiative to coordinate hospitals and researchers, setting standards for the processing and sharing of biological samples, Zhang suggested.
Next, how data is stored determines whether it can be shared. At present, traditional methods such as hard drives and downloading to a local server are becoming increasingly unsustainable, and cloud storage and cloud computing will become inevitable.
Zhang explained that the volume of data is staggering, with the entire genome of a single individual needing 100 gigabytes of data. It would be impossible for any organization to allow multiple downloads of the data at the same time - the cost of commercial network bandwidth would be prohibitively expensive, Zhang added.
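To put Zhang's figure in perspective, here is a rough back-of-the-envelope sketch. Only the 100 gigabytes per genome comes from Zhang; the assumptions that every participant in a million-person cohort is sequenced and that the data moves over a dedicated 10 Gbit/s link are hypothetical and chosen purely for illustration.

```python
# Back-of-the-envelope estimate of genome data volume for a large cohort.
# Only GB_PER_GENOME comes from the article; the other inputs are assumptions.
GB_PER_GENOME = 100        # figure quoted by Zhang
COHORT_SIZE = 1_000_000    # assumes every participant is fully sequenced
LINK_GBIT_PER_S = 10       # hypothetical dedicated network link


def total_storage_pb(genomes: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Total raw storage in petabytes (decimal units: 1 PB = 1,000,000 GB)."""
    return genomes * gb_per_genome / 1_000_000


def transfer_days(total_gb: float, link_gbit_per_s: float) -> float:
    """Days needed to copy total_gb once over a link running at full speed."""
    seconds = total_gb * 8 / link_gbit_per_s  # 8 bits per byte
    return seconds / 86_400                   # seconds per day


storage_pb = total_storage_pb(COHORT_SIZE)
days = transfer_days(COHORT_SIZE * GB_PER_GENOME, LINK_GBIT_PER_S)
print(f"{storage_pb:.0f} PB of raw genome data")
print(f"{days:.0f} days for one full download at {LINK_GBIT_PER_S} Gbit/s")
```

Under these assumptions, the cohort would hold about 100 petabytes, and a single full copy would take roughly 926 days to transfer, which illustrates why analyzing the data where it is stored may be more practical than letting each user download it.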
It is no surprise that there is no unified data storage and sharing platform for the million-person cohorts in the "Precision Medicine" project. Even though the million-person cohorts have been up and running for six years, and the number of patients and the total amount of data in China are comparable to those in the UK and the US, few papers report important discoveries based on Chinese data, and few pharmaceutical companies are paying for Chinese data.
"We need to join hands with relevant stakeholders with an open mind and tread a path with Chinese characteristics. Once our vast resources of dusty biological samples can be utilized, it will certainly bring about a radical upgrade to biomedicine and people's health in China and the world," Guo Tiannan said.
- This story was originally published by Caixin Global.

