The 1000 Genomes Project
Next-generation DNA sequencers have advanced genomic research by delivering more accurate, less costly and faster analyses, and have enabled the development of novel applications. The technology will now be applied to a coordinated international effort to better understand the extent of human genetic variation and its relation to disease. The National Institutes of Health’s 1000 Genomes Project will build upon the HapMap project (see IBO 9/30/03) to map the human genetic variants associated with common diseases with new levels of accuracy. Using next-generation sequencing technologies, the Wellcome Trust Sanger Institute, the Beijing Genomics Institute Shenzhen (BGI) and three members of the National Human Genome Research Institute’s (NHGRI) Large-Scale Sequencing Network (the Broad Institute of MIT and Harvard, Washington University School of Medicine and Baylor College of Medicine) will sequence the genomes of 1,000 individuals, at a cost of between $30 million and $50 million. The goal is to discover genetic variants present at a frequency of 1% or greater across the genome, and at frequencies as low as 0.5% within genes.
Current plans call for the use of 454 Life Sciences’ GS FLX system and Illumina’s Genome Analyzer. Asked about the possible use of Applied Biosystems’ SOLiD platform and Helicos Biosciences’ HeliScope system, Adam Felsenfeld, PhD, program director of the Large-Scale Sequencing Program at the NHGRI, told IBO, “As long as the quality, cost and throughput needs are met, the plan is agnostic about what platforms are used. The only instruments that appear to be able to meet the requirements along the given timeline, within the limits of our confidence and knowledge, are, right now, a combination of 454 and Solexa [Illumina]. SOLiD is promising as well. Because of this agnostic approach, if other technologies become available that can do better, we encourage their use.”
The Project’s first phase will consist of three pilot projects. In the first, two nuclear families will be sequenced at an average of 20 passes per genome.
This information will be used to determine how best to identify sequence variants. The second pilot project will sequence the genomes of 180 people at an average of two passes per genome to assess the usefulness of low-coverage data. The third pilot will sequence the coding regions of 1,000 genes in around 1,000 people; this information will be used to develop plans for cataloging the coding regions, known as exons.
One of the Project’s goals is to learn how to most effectively employ next-generation sequencing technologies to detect structural variants, such as insertions, deletions and duplications. “The test of this project for the new platforms is only for two specific applications, namely, identifying variants within their haplotype contexts to 1% throughout the genome and to 0.5% in exons,” said Dr. Felsenfeld. “The former requires large amounts of data that is of a consistent, predictable quality, cost and throughput, to the extent that the data produced by multiple groups can be combined for specific analysis. The latter requires integration of exon capture methods with the new technologies to become robust.”
As a result, the Project is expected to advance the use of next-generation sequencing technologies. “Both [applications] will foster better understanding of instrument performance in true production mode, including a better understanding of data quality and quality requirements. In addition, both will foster the development of new tools for data analysis and deposition,” he commented. Specific technology challenges cited in a summary of a September 2007 workshop held to plan the Project include common production and quality metrics, data quality and accuracy, the use of paired-end reads, exon capture methods, determination of the frequency distribution of rare variants, and information on the phasing and imputation of genotypes.
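The “passes” in the first two pilots refer to average sequencing coverage, that is, how many times each base is read on average. The sketch below is not the Project’s method; it applies the textbook idealized Poisson (Lander-Waterman) model, assuming reads land uniformly at random across the genome, to estimate what fraction of positions a 20-pass or a two-pass design would cover. The function name `fraction_covered` is hypothetical.

```python
import math


def fraction_covered(mean_coverage: float, min_reads: int = 1) -> float:
    """Expected fraction of genome positions read at least `min_reads` times.

    Idealized Poisson (Lander-Waterman) model: assumes reads start uniformly
    at random, so per-position depth X ~ Poisson(mean_coverage).
    """
    # P(X >= k) = 1 - sum_{i=0}^{k-1} e^(-c) * c^i / i!
    below = sum(
        math.exp(-mean_coverage) * mean_coverage**i / math.factorial(i)
        for i in range(min_reads)
    )
    return 1.0 - below


if __name__ == "__main__":
    # Average depths quoted for the first two pilots (20 passes and 2 passes).
    for depth in (20, 2):
        print(
            f"{depth}x: covered >=1 time: {fraction_covered(depth):.4f}, "
            f">=2 times: {fraction_covered(depth, 2):.4f}"
        )
```

Under this idealized model, about e^-2, or roughly 13.5%, of positions receive no reads at two passes, while at 20 passes essentially every position is read at least once. Real sequencing data fall short of the ideal, so these figures are best read as an upper bound.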
According to the summary, the Sanger Institute will use 12 of its 25 Illumina Genome Analyzers for the Project, and BGI will use 5 of its 7 Genome Analyzers. Genotyping on Affymetrix and Illumina SNP microarrays will be used to validate the sequence data. When asked about new Project funding for instrument purchases, Dr. Felsenfeld explained that the Project is being funded with previously approved monies for the sequencing centers. He said, “Funds are not allocated for specific machine purchase in advance for this project, though some centers may use available funds for additional instruments.”