EUROPEAN RARE DISEASE REGISTRY
WHITE PAPER
Written by Riku Rinta-Jouppi (MA, MSc), Attorney-at-Law, Partner
Abundant data that is findable, accessible, interoperable and reusable (FAIR), together with evidence-based medicine (EBM), forms the cornerstone of modern data-driven medical research. The rapid pace of technological progress (Moore’s law) has enabled the collection, governance and analysis of health data at scale. Researchers formulate precise research questions, design clinical trials to answer them and gain insights from the resulting data. Medical research runs on data. Machine learning is only as good as the quality and quantity of the data fed into it. This is a major problem in fields of medicine that have only small data sets or no data at all. Some research fields are lucky to have oceans of data, enabling wide-scale use of the latest data-hungry research methods. Other research groups may develop the best algorithms in the world, but these are of no use if the relevant data is simply not available to them. Many healthcare institutions are struggling with data governance: how best to use the data they hold in the interests of patients. For rare disease patients, improved access to aggregated big data sets for research use is a human rights issue, so essential have such data sets become for faster, more accurate diagnosis and better treatment.
As the European Rare Disease Registry (ERDR) generates rare disease data and makes it more available for research, the data models informing diagnosis will become more accurate. The ability to search ERDR for data, and to recruit patients from it for further research, at European level will remove many of the barriers to data access that exist at national level. Essentially, ERDR will create a fully liquid and fair market for fit-for-purpose rare disease research data. This can be done without a loss of privacy, as data donors will be able to decide what level of access to their data they want to provide, and they will be compensated for their efforts, perhaps not in fiat money but through state-of-the-art token economics. A new economic model needs to be built to support the data-altruistic rare disease research effort and to give individual patients and the patient community new revenue streams.
Data is sometimes compared to gold because of its rarity and value. Yet health data cannot be found in nature. It is generated by humans, in a specific format and usually for a specific purpose. Unlike gold, data as a digital asset is non-rivalrous: it can be copied at practically zero cost, used repeatedly for a variety of purposes and used to build a wide range of high-value data products. Value can be created from data in particular through aggregation and integration across data types and between different types of organisations and stakeholders.
In practice, health data is also subject to various costs and liabilities because of the strict regulatory and legal requirements placed on it in the European Union. In particular, personally identifiable information (PII) is to be minimised by design and by default, as it is considered a particularly sensitive class of information. This has led to a “better safe than sorry” culture in public health institutions, where data tends to be locked in fragmented, high-security silos with a very low rate of use for research. Thus, the promise of the data economy remains unrealised in the rare disease area. What ERDR sets out to do is to “break the silos”.
The planned increase in data sharing and data donation requires trust on the part of data donors that their data will be protected from abuse. Yet data breaches and cybersecurity attacks happen every day.
Why is ERDR needed? Because for the 95% of rare diseases that have no treatment, and for the rare diseases whose diagnostic tests are unreliable, there is no hope unless the data problem is solved first.
Big data based analytics, or “omics”, require vast quantities of data to provide reliable results. A single study can involve as many as a million variables, and the contribution of any single variable is very small. Multiomics refers to the combination of:
Genomics
Transcriptomics
Proteomics
Metabolomics
Lipidomics
Multivariate analytics are particularly important for polygenic diseases. Hundreds or even thousands of gene variants can contribute to an individual’s genetic risk profile. However, an estimated 70% of rare diseases are caused by a single genetic defect, such as an insertion, deletion or substitution within a single gene. For rare diseases it is therefore not necessary to have data sets of hundreds of thousands of participants, including tens of thousands of patients, as in some common diseases. Less data will do, but not no data at all. Broad international co-operation is vital in collecting samples and in federating research results.
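To illustrate how many small contributions add up to a polygenic risk profile, the following is a minimal sketch of an additive polygenic risk score. It assumes the common additive model: for each variant, the number of risk alleles a person carries (0, 1 or 2) is multiplied by a per-allele effect weight estimated in an earlier association study, and the products are summed. The variant names and weights below are hypothetical placeholders, not real data.

```python
from typing import Dict

# Hypothetical per-allele effect weights (e.g. log odds ratios).
# Real scores may combine thousands of variants; these values are illustrative only.
EFFECT_WEIGHTS: Dict[str, float] = {
    "variant_a": 0.12,
    "variant_b": -0.05,
    "variant_c": 0.30,
}


def polygenic_score(genotype: Dict[str, int], weights: Dict[str, float]) -> float:
    """Additive score: sum of risk-allele counts (0, 1 or 2) times effect weights.

    Variants absent from the genotype are treated as carrying no risk alleles,
    which is a simplifying assumption made for this sketch.
    """
    score = 0.0
    for variant, weight in weights.items():
        dosage = genotype.get(variant, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid allele count {dosage} for {variant}")
        score += dosage * weight
    return score


if __name__ == "__main__":
    person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
    print(round(polygenic_score(person, EFFECT_WEIGHTS), 2))
```

The contrast with rare diseases is the point: in a monogenic condition a single defective gene, rather than a weighted sum over many variants, drives the disease, which is why smaller data sets can still be informative.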
Finland has 11 biobanks that store over 500 000 blood or DNA samples and millions of tissue samples. The biobanks have been established to serve university hospitals, universities, the Finnish Institute for Health and Welfare (THL), the Finnish Red Cross Blood Service and Terveystalo. They cooperate under the name FINBB.
Biobanks are an essential part of modern biomedical research. If the national research infrastructure is well maintained, biobanks are attractive partners for pharmaceutical companies looking for clinical trial participants with particular biomarkers.
UK Biobank is the first large-scale biobank to operate internationally; it has been in operation for over 20 years and stores samples from 500 000 UK participants. The samples have been further processed to provide detailed genomic, proteomic and metabolomic insights. Individual-level data is made available to researchers around the world, enabling an extensive range of analyses. In the EU, this spirit of open science and data sharing has recently been inhibited by European and national data protection regulations.
The “All of Us” Research Program aims to collect genomic and health data from people living in the United States. There is also the Million Veteran Program for US military veterans.
Biobank Japan has collected hospital patient samples and health records from 260 000 Japanese citizens.
The Estonian Biobank is one of the pioneers in giving actionable information back to donors and in researching the processes and impacts of this feedback mechanism.
Almost all biobanks are publicly funded. A notable exception is the Icelandic company deCODE genetics, which has been a commercial success and has had an outsized influence on scientific progress, having sequenced over half of the population of Iceland.
The biggest recent Finnish project is the FinnGen study. It is a cooperation between all Finnish biobanks, their founding universities and 13 international pharmaceutical companies. It is hosted by the University of Helsinki, and within the university by the Institute for Molecular Medicine Finland (FIMM). It is funded by Business Finland. The study aims to collect 500 000 samples from Finnish citizens by 2023, with the goal of better understanding the genetic basis of diseases and their biological origins. This is expected to lead to better diagnostics, treatment and prevention.
A particular advantage of the Nordic infrastructure is its extensive healthcare registries, in which all relevant health data from public health institutions is recorded. They store information from cradle to grave and have been instrumental in epidemiological research in the Nordic countries. The FinnGen study is underpinned by the integration of genetic data with this pre-existing registry data.
Of particular interest are gene variants that alter the amino acid sequence of a protein and thereby block its normal function. Some of these “broken genes” can have a protective effect and can be of interest in the development of novel therapies. If the defective gene does not seem to increase the risk of other diseases, the hypothesis can be made that a drug mimicking the defect, by inhibiting the same protein, would not have serious side effects.
Biobanks form only part of the biomedical research infrastructure. Genome-wide association studies (GWAS) can identify gene variants that are more prevalent in a particular patient group. A genetic association does not necessarily imply causation or reveal the biological mechanism through which the gene increases disease risk; some “broken gene” variants may even have a protective effect. Working out the biological mechanism requires so-called functional laboratory studies involving cell and animal models.
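The statistical core of an association study can be sketched for a single variant: compare how often an allele occurs among patients (cases) and among controls. The minimal Python example below computes an allelic odds ratio and a Pearson chi-square test with one degree of freedom from a 2×2 table of allele counts. The counts are hypothetical. A real GWAS repeats such a test for hundreds of thousands or millions of variants, corrects for multiple testing and adjusts for confounders such as population structure; and, as the text notes, a significant result shows association, not a biological mechanism.

```python
import math
from typing import Tuple


def allelic_association(case_alt: int, case_ref: int,
                        control_alt: int, control_ref: int) -> Tuple[float, float, float]:
    """Allelic association test for one variant from a 2x2 table of allele counts.

    Returns (odds ratio, Pearson chi-square statistic, two-sided p-value).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    if min(a, b, c, d) == 0:
        # A zero cell makes the odds ratio undefined; an exact test would be used instead.
        raise ValueError("zero cell count")
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom chi2 = z**2, so P(chi2 >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: 200 case alleles and 200 control alleles.
    or_, chi2, p = allelic_association(60, 140, 40, 160)
    print(f"OR={or_:.2f} chi2={chi2:.2f} p={p:.3f}")
```

An odds ratio above 1 means the allele is more common among cases; an odds ratio below 1 is what a protective variant would look like.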
The biological mechanisms can be explored with additional samples from participating donors. These can also be blood or tissue samples, but they must be frozen as soon as possible after collection. Biomarker levels also depend strongly on the time of day, age and life circumstances. In future, it would therefore be important to collect samples from the same participants at different, standardized time points.
Biobanks do not usually request samples from a particular patient community but from all voluntary participants, so their data sets cover a multitude of diseases. This opens up the possibility of studying comorbidity, i.e. which diseases occur together. The basic FinnGen analyses cover over 2000 different diagnoses or combinations of diagnoses.
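One simple way to quantify which diseases go together is to compare how often two diagnoses co-occur with how often they would co-occur by chance. The Python sketch below computes this observed-to-expected ratio (sometimes called lift) for every pair of diagnoses in a small, hypothetical cohort. It is an illustration under that assumption only, not the method used in the FinnGen analyses; a real comorbidity study would also test statistical significance and adjust for factors such as age and sex.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def comorbidity_ratios(patients: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Observed-to-expected ratio for each pair of co-occurring diagnoses.

    A ratio above 1 means the two diagnoses occur together more often than
    they would if they were statistically independent in this cohort.
    """
    n = len(patients)
    # How many patients carry each single diagnosis.
    single = Counter(dx for diagnoses in patients for dx in diagnoses)
    # How many patients carry each (sorted) pair of diagnoses.
    pairs = Counter(
        pair for diagnoses in patients for pair in combinations(sorted(diagnoses), 2)
    )
    ratios = {}
    for (a, b), together in pairs.items():
        observed = together / n
        expected = (single[a] / n) * (single[b] / n)
        ratios[(a, b)] = observed / expected
    return ratios


if __name__ == "__main__":
    cohort = [{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]  # hypothetical diagnosis codes
    print(comorbidity_ratios(cohort))
```

Because a general biobank cohort is not selected for any single disease, such pairwise comparisons can be run across many diagnoses at once.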
By combining biobank samples with registry data, it is possible to carry out longitudinal studies, in which researchers repeatedly examine the same individuals to detect changes over time. Longitudinal studies pave the way for personalized medicine. The same individual can be contacted for additional lifestyle surveys or further samples. This broad range of data can give a more accurate picture of a particular person’s elevated or diminished disease risk.
As large volumes of data are being collected on citizens it is vital that the technical solutions to data protection and cyber security are up to the required standards and offer a high level of protection from a wide range of possible risks.
Biobanks work on the basis of biobank consent. The signatory consents to the use of their samples for biomedical research and for research and development. For the most part, samples and associated data are provided for research use in pseudonymised form, without names or other personally identifying information. Researchers do not have access to the identities of the sample donors.
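A common way to implement this kind of pseudonymisation is a keyed hash: the biobank holds a secret key and derives a stable pseudonym from each donor’s identifier, so that researchers can link records belonging to the same donor without learning who the donor is. The Python sketch below uses HMAC-SHA-256 from the standard library. The key handling, identifier format and record fields are hypothetical, and the sketch does not describe how any particular biobank operates. Under the GDPR, pseudonymised data remains personal data, because whoever holds the key can re-link pseudonyms to identities.

```python
import hashlib
import hmac
import secrets


def pseudonymise(donor_id: str, key: bytes) -> str:
    """Derive a stable pseudonym from a donor identifier with HMAC-SHA-256.

    The same donor_id and key always give the same pseudonym; without the key,
    the pseudonym cannot feasibly be linked back to the identifier.
    """
    return hmac.new(key, donor_id.encode("utf-8"), hashlib.sha256).hexdigest()


def release_for_research(record: dict, key: bytes) -> dict:
    """Replace direct identifiers in a (hypothetical) record with a pseudonym."""
    direct_identifiers = {"donor_id", "name", "address"}
    released = {k: v for k, v in record.items() if k not in direct_identifiers}
    released["pseudonym"] = pseudonymise(record["donor_id"], key)
    return released


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # in practice held only by the biobank
    record = {"donor_id": "D-0001", "name": "Jane Doe", "address": "Example Street 1",
              "diagnosis": "example-code", "sample_type": "blood"}
    print(release_for_research(record, key))
```

Using a secret key rather than a plain hash matters: an unkeyed hash of a short identifier could be reversed simply by hashing every possible identifier.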
Biobanks and registries can work together, provided the opportunities for cooperation are not lost to excessive bureaucracy, over-complicated regulation and high costs. In future, biobanks holding a broad range of samples given by the same person at standardized intervals will have the advantage.