A new consensus report, “Genomic Epidemiology Data Infrastructure Needs for SARS-CoV-2: Modernizing Pandemic Response Strategies,” from the National Academy of Sciences offers recommendations to address the SARS-CoV-2 pandemic through the integration of genomic, clinical and epidemiological data.
The report focuses on how viral sequencing is being used to address COVID-19, and on the shortcomings and successes of these efforts so far. In particular, it highlights inadequate sequencing capacity and capabilities, and inadequate data gathering and coordination. The achievements it cites are advances in the field of genomic epidemiology, a host of current data gathering initiatives worldwide and the contribution of partnerships to these efforts.
In addressing the SARS-CoV-2 pandemic, sequencing data can be used to track disease outbreaks and prevalence, monitor changes in the genome and understand transmission, among other functions. Furthermore, it can be used to inform the development of diagnostic tests, vaccines, antiviral drugs and ongoing treatment.
Supporting these efforts is genomic epidemiology, which is defined in the report as “the use of pathogen genome sequencing to understand infectious disease transmission and epidemiology.” Integrating sequencing as well as clinical and epidemiological data, genomic epidemiology plays an important role in studying the origin, identity, evolution and mechanisms of a virus such as SARS-CoV-2. These data types have already been employed to address previous viral outbreaks, like SARS, Ebola, Zika and the flu.
Currently, resources are insufficient to collect, coordinate and analyze sequencing data. Among the shortcomings is capacity. As the report states, “Expanding the global scope of genomic epidemiology as a practical method for timely and effective outbreak response will require building the technical capacities for genome sequencing and analysis in public agencies and private facilities.” Other limitations include staffing, funding and data completeness.
Data collection and coordination also require improvement, both for sequencing data itself and for its integration with clinical and epidemiological data. For example, the US has no central resource for collecting SARS-CoV-2 sequencing data from different regions and lab types. In addition, data analysis is hampered by difficulties in data sharing, collection practices and processes. As the report puts it, “Current sources of SARS-CoV-2 genome sequence data, and current efforts to integrate these data with relevant epidemiological and clinical data, are patchy, typically passive, reactive, uncoordinated and underfunded in the United States.”
However, data gathering initiatives are underway, with an emphasis on both sequencing data and the integration of genomic, clinical and epidemiological data. One example is the Sequencing for Public Health Emergency Response, Epidemiology and Surveillance (SPHERES) consortium, created in May by the US Centers for Disease Control and Prevention (CDC). SPHERES is a program for sharing viral sequencing data across US-based clinical and public health labs to assist in surveillance of the coronavirus. Other US-based resources include the National Institutes of Health’s National COVID Cohort Collaborative (N3C), which contains clinical data and lab methods.
In addition, a number of regional initiatives are underway in the US, such as one spearheaded by the Broad Institute and the state of Massachusetts, and another by the Chan Zuckerberg Biohub and the state of California. The work of each collaboration has already informed public health authorities.
On a global basis, data sharing efforts include the Global Initiative on Sharing All Influenza Data (GISAID), which encompasses genomic, clinical and epidemiological information and is already widely used for COVID research. Additionally, scientists utilize the Nextstrain project, an open source database of genomic data.
Some of the world’s largest corporations are also working in partnership with the public sector and academia to collect and analyze SARS-CoV-2 sequencing data. Among the companies participating in such ventures are Microsoft, Amazon and Amgen.
Based on its conclusions, the report makes three recommendations. These recommendations will no doubt have a role in guiding US government policy on sequencing SARS-CoV-2 in the months ahead.
- Recommendation 1: The US Department of Health and Human Services should ensure the generation of representative, high-quality full genome sequences of SARS-CoV-2 across the United States, and in the future, from emerging epidemic or pandemic pathogens, in order that these data can be used to meet key needs for genomic surveillance.
- Recommendation 2: The US Department of Health and Human Services should develop and invest in a national data infrastructure system that constructively builds on existing programmatic infrastructure with the ability to accurately, efficiently, and safely link genomic data, clinical data, epidemiological data, and other relevant data across multiple sources critical to a public health response such as the current SARS-CoV-2 outbreak.
- Recommendation 3: The US Department of Health and Human Services should establish an effective and sustainable science-driven leadership and governance structure for the use of SARS-CoV-2 genome sequences in addressing critical national public health and basic science issues, develop a national strategy, and ensure the funding needed for successful execution of the strategy.
For information about the impact of COVID-19 on sales of sequencing instruments and consumables, please see the latest edition of the “SDi Global Assessment Report.”