PARADIGM

The last decade has seen astonishing advances in understanding the genetic causes of rare diseases. Many novel monogenic disease genes have been discovered and whole exome/genome sequencing (WES/WGS) is now becoming available across many health services, including the UK NHS Genomic Medicine Service (GMS).

However, more than half of rare disease patients remain undiagnosed even after WGS, and the vast majority of their genomic data remains unexplored. Failures in functional genome annotation are preventing robust variant classification, exploration of the non-coding genome, and discovery of new disease-causing loci, resulting in apparently gene-elusive rare diseases. Moreover, most monogenic disease mechanisms remain poorly understood, inhibiting the development of novel therapies.

The overarching research question we aim to answer is:
what is the genetic basis of gene-elusive rare disease?

We hypothesise that most unexplained rare disease is likely to be monogenic and caused by biological processes that are poorly captured by our current gene models. We will examine this hypothesis through a number of questions:

  • Can we develop machine learning approaches to improve disease-gene models?
  • Can we define active regulatory regions around clinically relevant genomic loci?
  • Can we detect novel, clinically relevant tissue-specific isoform expression?
  • Can we discover new non-canonical and non-coding causes of monogenic disease?

We will use machine learning and expert curation to provide new literature-derived disease models and tissue-specific gene expression maps. Focusing on two contrasting monogenic disease areas (paediatric developmental disorders and adult cardiomyopathies) we will generate long-read RNA sequencing data from foetal brain and adult heart samples to detect full-length transcript isoforms. We will then use these alongside other emerging datasets to find new causes of disease in existing patient cohorts through a combination of computational phenomics, novel pathogenic variant identification, and isoform-informed burden testing.

Figure 1

PARADIGM is a collaborative project that includes three complementary work packages that aim to:

  1. improve the map of functional genomic elements;
  2. use this map to perform genomic variant analyses to solve unsolved monogenic diseases; and
  3. disseminate annotated resources to the community.

Over the next decade, expansion in the breadth and depth of clinical and research sequencing will lead to an enormous increase in the amount of genomic data and number of candidate pathogenic variants detected. We will leverage these data, both in existing and ongoing initiatives, to catalyse biological discovery and enable clinical interpretation through integration, annotation and dissemination of Primary Annotated Resources to Advance Discovery In Genomic Medicine (PARADIGM).