With so many Sage customers using their Pippin instruments in an Illumina sequencer pipeline, we’re taking a look at various applications enabled by the Sage + Illumina combination. Today we check out double-digest RADseq, which could not work without precise and reproducible size selection.
The approach was first nailed down by scientists in Hopi Hoekstra’s lab at Harvard University, which focuses on population genetics, development, speciation, and behavioral genetics. Their innovation, a new version of the popular reduced-representation genome sequencing approach (commonly called RADseq), introduced a second restriction enzyme step as well as Pippin Prep size selection. The result: a validated protocol for massively parallel genotyping that allows researchers to study hundreds or thousands of genetic loci across hundreds of samples — without any prior knowledge of the organism’s genome.
Essentially, scientists use ddRADseq to study a sliver of the genome in each sample; with Pippin sizing and the double restriction enzymes, they ensure that they’re looking at the same sliver across all samples. Then they can assess genetic variation within those regions for various applications, such as evolutionary development, population studies, and QTL mapping.
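The idea behind the double digest can be sketched computationally. Below is a minimal in-silico simulation, assuming EcoRI and MspI as the enzyme pair and a 300–400 bp window — these parameters are illustrative choices, not the protocol’s fixed values, and real labs pick their own pair and window. Only fragments flanked by one cut from each enzyme and falling inside the window are retained, which is what makes the recovered “sliver” of the genome consistent across samples.

```python
import re

# Illustrative enzyme pair; actual ddRAD studies choose their own.
SITES = {"EcoRI": "GAATTC", "MspI": "CCGG"}

def ddrad_fragments(seq, lo=300, hi=400):
    """In-silico double digest of one sequence.

    Cuts at the start of each recognition site (a simplification --
    real enzymes cut at fixed offsets within the site), then keeps
    fragments flanked by one cut from EACH enzyme and sized within
    the selection window [lo, hi] -- the fragments ddRADseq sequences.
    """
    cuts = sorted(
        (m.start(), enzyme)
        for enzyme, site in SITES.items()
        for m in re.finditer(site, seq)
    )
    kept = []
    for (pos1, enz1), (pos2, enz2) in zip(cuts, cuts[1:]):
        size = pos2 - pos1
        if enz1 != enz2 and lo <= size <= hi:
            kept.append(seq[pos1:pos2])
    return kept
```

Because the same two enzymes cut every sample’s genome at the same positions, and the size window is held constant by the instrument, each library samples the same set of loci — which is why reproducible sizing is the linchpin of the method.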
We talked to Brant Peterson, PhD, a postdoctoral fellow in the Hoekstra lab and lead author on the ddRADseq paper, to learn more about the work. He told us that the team’s usual method of size selection — manual gel extraction — was simply not reproducible enough to make the ddRADseq results meaningful. After switching to Pippin Prep, Peterson told us, “There’s very little difference from one sizing reaction to the next, which is the key to this approach working.”
In the time since the original paper came out, other labs have adopted the ddRADseq approach. One is GenCore, the genomics sequencing core at New York University’s Center for Genomics and Systems Biology. GenCore Manager Paul Scheid learned the method and offers it as a service for core clients. “We use the Pippin when constructing those ddRAD libraries to control the amount of loci that we hit from a given library,” he told us. “It’s very nice for fine-tuning that parameter.”
Next we’ll have the final post in our blog series. Check back to learn about how Pippin products are being used with Illumina sequencers to generate higher-accuracy assemblies.
Antibiotic resistance is a scary concept, but at least there’s comfort in seeing so many great minds trying to solve the problem. Last week’s announcement that President Obama had issued an executive order for the development of a national plan to battle antibiotic resistance dovetailed nicely with a paper just published in Science Translational Medicine from NIH scientists.
The publication, “Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae,” reports the sequencing of 20 isolates of Enterobacteriaceae resistant to carbapenems, a powerful class of antibiotics used as a last resort in hospitals. Lead author Sean Conlan from NHGRI and his collaborators used the sequence data to understand the transmission path of a Klebsiella pneumoniae outbreak at the NIH Clinical Center in 2011, as well as isolates collected after the outbreak ended.
It’s impressive work, and we’re happy to report that our BluePippin automated DNA size selection platform was used in the project. Sequencing was performed with the PacBio RS II DNA Sequencing System; the team used BluePippin to remove fragments smaller than 5 kb from the library prior to loading on the sequencer.
Long reads were necessary for the project, the authors note, because short-read sequence data as well as strain-typing technologies were unable to clearly distinguish between the organisms or to fully assemble the genomes.
Conlan et al. report finding less horizontal gene transfer than expected, but having the full sequence — including the drug-resistance-encoding plasmids associated with each genome — enabled them to get a sense of the remarkable diversity of the network of plasmids available to these bacteria.
The team also discovered that most of the cases suspected to represent hospital-acquired infections were in fact acquired earlier and missed in routine screening. This information helped them to focus their infection-prevention efforts on better screening at admission and increasing the frequency of surveillance cultures.
The authors suggest that real-time, whole-genome sequencing is already cost-effective for monitoring drug-resistant bacteria in clinical environments. “The cost of whole-genome sequencing is dwarfed by … costs associated with outbreaks and their investigations, including the human and financial toll and the loss of patient confidence in the health care facility,” they write.
If there’s a group you can count on to do the detailed work of putting instruments through their paces to help scientists perform better science, it’s the Association of Biomolecular Resource Facilities. In a new Nature Biotechnology paper, the next-gen sequencing division of ABRF reports the largest known “cross-platform, cross-protocol and cross-site examination of RNA-seq data performed to date.”
As a company working hard to improve the reproducibility and accuracy of a small section of the library prep workflow for next-gen sequencing, we at Sage applaud ABRF for this invaluable resource that will help researchers hone their RNA-seq pipelines. If you haven’t seen the paper yet, check it out: “Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study” from lead authors Sheng Li and Scott Tighe.
The study covered five sequencing platforms — HiSeq from Illumina; PGM and Proton from Life Technologies; the PacBio RS; and the Roche 454 — and was carried out in 15 laboratories. Different RNA-seq protocols were tested as well, including polyA selection, ribosomal RNA depletion, size selection, and degraded-RNA input. (We were pleased to see that our Pippin Prep was used in the study for the PGM and Proton sequencers.)
The authors, who hail from more than a dozen institutions, make it clear that this effort was not intended to declare a winner among sequencers; rather, the goal was “to establish a useful reference data set for each platform, which will assist laboratories in improving their methods and in evaluating new chemistries, protocols and instruments.” Researchers will be able to use their findings to inform decisions about when and how to compare data sets from different sequencers or different workflows, for example.
The paper reports strong intra-platform consistency as well as inter-platform concordance. “This study found similar RNA-seq results between the various NGS platforms and similar ranges in coefficients of variance across laboratory sites for each platform,” the authors write. “These results indicate that both long- and short-read technologies measure gene expression with similar levels of statistical variation, although they show a tenfold variation for error rates in indels.” In general, however, they caution that deeper sequencing is necessary to capture low-abundance transcripts. They also note that sequencer QV scores from the manufacturers were higher than what they saw empirically, “indicating that a splicing-aware, base quality score recalibration may be needed for RNA-seq, as is already done for DNA-seq with GATK.”
The authors conclude with the hope that their findings will be used to establish best practices for things like isoform characterization and gene quantification. “These and other applications, especially clinical molecular diagnostics that rely on nucleic acid biomarkers, will require a level of technical stability across time and both within and between studies, which this study helps to establish,” they write.
As we continue our blog series on applications that are frequently used with Pippin size selection and Illumina sequencing, we move on to ChIP-seq. One of the most popular capabilities enabled by next-gen sequencing, ChIP-seq (or chromatin immunoprecipitation sequencing) is used to map protein binding sites or analyze protein-DNA interactions across entire genomes.
One of the earliest app notes for Pippin Prep came from Thomas Westerling at the Dana-Farber Cancer Institute. To learn more about the ChIP-seq library prep, check out the app note.
Pippin has also been used with large-scale ChIP projects, such as generating a complete picture of regulatory networks in Mycobacterium tuberculosis. This effort, tackled by James Galagan and his lab at Boston University, entailed methodically performing ChIP-seq on each transcription factor in the microbe to map the genomic regions each factor binds and, in turn, the genes it regulates.
Chris Mawhinney, the scientist in Galagan’s lab who performed the work, told us that enlisting Pippin for these experiments saved time and eliminated the possibility of sample cross-contamination. “If you have too broad a range of sizes, the sequencer software has issues locating the clusters,” she said. “With Pippin Prep, you can get it down to a really nice, narrow size.” Proper sizing also removes adapter dimers, which otherwise consume sequencing capacity.
Mawhinney also told us that Pippin speeds up the sample prep routine. “The great thing about Pippin is that you can go right from size selection to PCR; there’s no middle cleanup step, so it’s very convenient,” she said.
We’ll continue our blog series in the coming weeks with posts on ddRADseq, Moleculo, and more. Check back soon!
In this blog series, we’ve been looking at how Sage Science customers use their Pippin Prep and BluePippin instruments with their Illumina sequencers. Today we check out the Nextera workflow, which is designed for speedy NGS sample prep. It can yield even better results with a Pippin size selection step.
We visited Zach Herbert, associate director of the genomics core facility at Dana-Farber Cancer Institute, to learn more about how he built a Nextera+Pippin pipeline. He deploys the Nextera tagmentation protocol for small genomes and larger amplicon projects that come to the core lab. Herbert found that adding a size selection step with Pippin Prep afterward led to very tightly sized libraries. This method is a boon for reproducibility, MiSeq flow cell clustering, and data analysis.
Herbert also uses Pippin when he’s running samples together. Attempting to pool samples with a broad size range in equimolar amounts is very tricky — “but if all those libraries are the same size, then we’re much more likely to get an even distribution of that pool,” he told us.
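The reason mixed-size pooling is tricky is that molar concentration depends on fragment length: at equal mass concentrations, a longer library contains fewer molecules. Here is a small sketch of the standard conversion and the per-library volumes needed for an equimolar pool — the function names and the example target of 10 fmol per library are our own illustrative choices, though the 660 g/mol-per-bp constant is the conventional average mass of a double-stranded DNA base pair.

```python
def library_nM(conc_ng_per_ul, mean_length_bp):
    """Convert a library's mass concentration (ng/uL) to molarity (nM).

    Uses the conventional formula:
        nM = (ng/uL * 1e6) / (660 g/mol per bp * mean fragment length)
    """
    return conc_ng_per_ul * 1e6 / (660 * mean_length_bp)

def pooling_volumes(libs, target_fmol=10.0):
    """Volume (uL) of each library that contributes `target_fmol` to the pool.

    `libs` maps library name -> (ng/uL, mean fragment length in bp).
    Note 1 nM = 1 fmol/uL, so fmol / nM gives uL directly.
    """
    return {
        name: target_fmol / library_nM(conc, length)
        for name, (conc, length) in libs.items()
    }
```

Run on two libraries of different sizes, the volumes come out unequal even at identical ng/µL — which is exactly the bookkeeping that disappears when Pippin sizing makes every library the same length, so equal masses give equal molarities.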
Pippin and Nextera also work well together for mate-pair sequencing. Illumina recommends using the Pippin platform to get “more stringent” sizing than can be accomplished with AMPure alone. (You can find us under Size Selection in Chapter 3, beginning on page 40 of Illumina’s mate-pair library prep guide.)
Check back soon for our next blog in this series. We’ll be looking at using Pippin in the Illumina sequencing pipeline for double-digest RAD-seq, ChIP-seq, and more.