If there’s a group you can count on to do the detailed work of putting instruments through their paces to help scientists perform better science, it’s the Association of Biomolecular Resource Facilities. In a new Nature Biotechnology paper, the next-gen sequencing division of ABRF reports the largest known “cross-platform, cross-protocol and cross-site examination of RNA-seq data performed to date.”
As a company working hard to improve the reproducibility and accuracy of a small section of the library prep workflow for next-gen sequencing, we at Sage applaud ABRF for this invaluable resource that will help researchers hone their RNA-seq pipelines. If you haven’t seen the paper yet, check it out: “Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study” from lead authors Sheng Li and Scott Tighe.
The study covered five sequencing platforms (HiSeq from Illumina, PGM and Proton from Life Technologies, the PacBio RS, and the Roche 454) and was carried out in 15 laboratories. Several RNA-seq protocols were tested as well, including polyA-selected, ribo-depleted, size-selected, and degraded RNA. (We were pleased to see that our Pippin Prep was used in the study for the PGM and Proton sequencers.)
The authors, who hail from more than a dozen institutions, make it clear that this effort was not intended to declare a winner among sequencers; rather, the goal was “to establish a useful reference data set for each platform, which will assist laboratories in improving their methods and in evaluating new chemistries, protocols and instruments.” Researchers will be able to use the findings to inform decisions about, for example, when and how to compare data sets from different sequencers or different workflows.
The paper reports strong intra-platform consistency as well as inter-platform concordance. “This study found similar RNA-seq results between the various NGS platforms and similar ranges in coefficients of variance across laboratory sites for each platform,” the authors write. “These results indicate that both long- and short-read technologies measure gene expression with similar levels of statistical variation, although they show a tenfold variation for error rates in indels.” In general, however, they caution that deeper sequencing is necessary to capture low-abundance transcripts. They also note that sequencer QV scores from the manufacturers were higher than what they saw empirically, “indicating that a splicing-aware, base quality score recalibration may be needed for RNA-seq, as is already done for DNA-seq with GATK.”
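For readers less familiar with the statistic behind the cross-site comparison, the coefficient of variation is the standard deviation divided by the mean. The short Python sketch below shows how it could be computed per platform across lab sites. This is not the authors' analysis pipeline: the platform names, the `site_measurements` dictionary, and the expression values are all invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample CV (sample standard deviation / mean)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Made-up expression values for a single gene, one value per lab site.
# These numbers are illustrative only and do not come from the paper.
site_measurements = {
    "platform_A": [100.0, 110.0, 90.0],
    "platform_B": [200.0, 220.0, 180.0],
}

# One CV per platform, summarizing site-to-site spread relative to the mean.
cv_by_platform = {
    name: coefficient_of_variation(vals)
    for name, vals in site_measurements.items()
}

for name, cv in cv_by_platform.items():
    print(f"{name}: CV = {cv:.3f}")
```

Because the CV is scaled by the mean, the two hypothetical platforms here come out with the same value even though one reports expression at twice the level of the other. That scale independence is what makes it a convenient way to compare variation across platforms.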
The authors conclude with the hope that their findings will be used to establish best practices for things like isoform characterization and gene quantification. “These and other applications, especially clinical molecular diagnostics that rely on nucleic acid biomarkers, will require a level of technical stability across time and both within and between studies, which this study helps to establish,” they write.