NGS-based Dataset Continues NIST’s Efforts to Support the Forensic DNA Community

Last year, the National Institute of Standards and Technology (NIST) announced that it had developed a “statistical foundation” for calculating match statistics with next-generation sequencing (NGS), leading to more detailed DNA profiles to help solve crimes. To produce a conventional DNA profile, labs examine genetic markers called short tandem repeats (STRs); by counting the number of repeats within each marker, scientists are able to create an identity profile.
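The repeat counting described above can be illustrated with a minimal Python sketch. The function name, the GATA motif, and the DNA fragment are hypothetical values chosen for illustration; they do not come from NIST's study, and forensic labs determine repeat counts with validated kits and software rather than code like this.

```python
import re

def count_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [match.group(0) for match in pattern.finditer(sequence)]
    if not runs:
        return 0
    # Each run is a whole number of motif copies, so integer division is exact.
    return max(len(run) // len(motif) for run in runs)

# Hypothetical fragment: flanking bases around eight copies of GATA.
fragment = "CCTA" + "GATA" * 8 + "TTCG"
repeat_count = count_repeats(fragment, "GATA")
print(repeat_count)
```

In a length-based profile, a person with eight repeats at this marker on one chromosome would be reported as having allele 8 there, alongside the allele inherited from the other parent.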

While profiles have traditionally been developed with STR-based analysis, NGS may prove to be a cost-effective alternative that can deliver much more data than STR-based profiles. Generally, STR-based profiles provide adequate information to lead to a suspect; however, in cases where DNA evidence is limited or has deteriorated, the additional data in an NGS-based profile can potentially serve as the key factor in solving a case.

NIST’s study involved generating NGS-based profiles by sequencing 27 markers from a DNA sample library of 1,036 people, and then calculating the genetic-sequence frequencies that were found at each marker, resulting in an NGS-based dataset for calculating match statistics. Katherine Gettings, PhD, part of the Applied Genetics Group at NIST, has been working in the human identity testing arena since 1998, and was the lead NIST biologist of the study. IBO had the opportunity to discuss the statistical dataset with Dr. Gettings, as well as the value of NGS in DNA analysis and the next steps for the project.
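As a rough illustration of how per-marker frequencies feed a match statistic, the Python sketch below tallies allele frequencies from a sample set and multiplies genotype frequencies across markers. It rests on simplifying assumptions that the article does not state: Hardy-Weinberg proportions within each marker and independence between markers, with no population-substructure correction. The marker names, alleles, and counts are invented, and the function names are hypothetical.

```python
from collections import Counter

def allele_frequencies(observed_alleles):
    """Map each allele to its share of all alleles observed at one marker."""
    counts = Counter(observed_alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def genotype_frequency(freqs, allele_a, allele_b):
    """Expected genotype frequency under Hardy-Weinberg (an assumption here)."""
    p, q = freqs[allele_a], freqs[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q

def match_probability(per_marker_freqs, profile):
    """Multiply genotype frequencies across markers (assumes independence)."""
    probability = 1.0
    for marker, (allele_a, allele_b) in profile.items():
        probability *= genotype_frequency(per_marker_freqs[marker], allele_a, allele_b)
    return probability

# Invented toy data: alleles observed at two hypothetical markers.
observed = {
    "marker_1": ["10"] * 5 + ["11"] * 3 + ["12"] * 2,
    "marker_2": ["7"] * 4 + ["8"] * 6,
}
frequencies = {marker: allele_frequencies(a) for marker, a in observed.items()}

# A hypothetical profile: heterozygous at marker_1, homozygous at marker_2.
profile = {"marker_1": ("10", "11"), "marker_2": ("8", "8")}
print(match_probability(frequencies, profile))
```

The smaller the resulting probability, the less likely it is that an unrelated person would share the profile by chance, which is why frequency data from a well-characterized sample set matters.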

“Historically, STR-based DNA profiles have not been sequenced at all,” Dr. Gettings explained. “They were (and in most labs still are) generated via multiplex PCR with fluorescently labeled tags, and then the lengths of the copies are determined by capillary electrophoresis, which are then compared to a ladder of known DNA types.” While traditional sequencing methods, such as Sanger, have not frequently been used for STR marker analysis, they can sometimes provide insight into irregular test results. “Sanger sequencing has been used rarely to investigate unusual results in a research setting, such as when a DNA type did not match up to the ladder,” she continued. “Sanger is very time consuming for STR markers, because PCR must be performed for each marker individually, and additional steps must be taken to separate the two DNA types a person typically has at each marker (one from each parent). My colleagues at NIST are quite skilled at this, but it could never be widely implemented.”

Consequently, NGS can serve as a valuable tool in DNA analysis. “NGS may produce a more discriminating statistic in some cases (a higher match probability), which could be useful in complex mixtures,” said Dr. Gettings. “NGS also opens the door for additional regions to be targeted and analyzed in a single workflow, such as single nucleotide polymorphisms (SNPs), which can provide information on the likely physical traits of the sample donor, and which may outperform STRs in severely degraded samples.”

Analyzing sequenced STRs requires adequate informatics capabilities, but existing databases can still store the results. “In contrast to some of our field’s early technology transitions, when we sequence STRs, we are still able to convert the sequence back to the length-based alleles that are currently held in databases such as the FBI’s National DNA Index System (NDIS, which holds convicted offender, arrestee and forensic unknown profiles), so we can continue to use these existing databases,” said Dr. Gettings.
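The backward compatibility Dr. Gettings describes can be sketched in Python as follows. This is a simplified illustration that assumes the repeat region has already been isolated from its flanking sequence and that the allele is named by dividing the region's length by the motif length, with leftover bases written after a decimal point (as in the conventional "9.3" style of designation). The function name and sequences are hypothetical, and real allele calling is more involved.

```python
def length_based_allele(repeat_region: str, motif_length: int = 4) -> str:
    """Name an allele by how many full motifs, plus leftover bases, it spans."""
    full_repeats, extra_bases = divmod(len(repeat_region), motif_length)
    return f"{full_repeats}.{extra_bases}" if extra_bases else str(full_repeats)

# Two invented sequence alleles of equal length but different internal sequence.
sequence_allele_a = "TCTA" * 11
sequence_allele_b = "TCTA" * 4 + "TCTG" + "TCTA" * 6

print(length_based_allele(sequence_allele_a))  # same designation as allele b
print(length_based_allele(sequence_allele_b))
print(length_based_allele("AATG" * 9 + "ATG"))  # partial repeat at the end
```

Both invented sequences collapse to the same length-based designation, so either could be searched against a length-based database, while the sequence itself still distinguishes them. That extra distinction is where sequencing can add discriminating power.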

NIST has also established a collaborative project to ensure the quality of STR data. “At NIST, we are supporting the quality control of STR sequence data by developing the STRSeq BioProject through NCBI [National Center for Biotechnology Information], which is a catalog of known STR sequences based on population data generated here and at several partner laboratories,” she noted.

In forensic labs, bioinformatics tools generally provide user-friendly means of archiving and accessing the data. “As far as data management within a forensic laboratory, the sequencing is kit based and the bioinformatic methods are streamlined by the kit vendors, so it is straightforward for users to archive the data,” stated Dr. Gettings. “This is targeted sequencing of a relatively small number of genomic regions, so we’re not generating the quantity of data associated with whole genome sequencing.” NIST also works with companies to educate them about the dataset and to help them apply it to their products. “Our group at NIST works directly with the kit vendors and developers of bioinformatic methods to ensure they are aware of the availability of this data and to assist with implementing it into their software,” Dr. Gettings said.

DNA profiles can serve as an essential part of the forensic analysis used to solve crimes, and with projects such as NIST’s Human DNA Standard, the agency has a long history of helping to ensure the accuracy of DNA profiles. Dr. Gettings’s recent work on NGS extends those efforts into the future. “We’re currently working on standardizing the nomenclature for reporting STR sequences,” Dr. Gettings told IBO. “[I]t’s important that labs follow a consistent way of reporting these data so that results can be compared across laboratories, and I think standardizing the nomenclature is the key for the vendors to implement the NIST data into their bioinformatic pipelines.”