Cancer Genomics Crowd in Boston for Beyond the Genome Meeting
We’re gearing up for this week’s Beyond the Genome conference, to be held at Harvard Medical School here in Boston. This year’s event, hosted by Genome Medicine and Genome Biology, will focus on cancer genomics, therapies, and bioinformatics. A timely topic during Breast Cancer Awareness Month!
The Sage Science team has attended this meeting before, which is one of the reasons we’re so excited to be there this year: we know how great the science and speakers will be. The agenda is full of interesting sessions and presentations, including an opening talk from Gaddy Getz on cancer genomics and evolution; a talk from Mark Gerstein on human genome analysis; Andrea Califano on regulatory networks; Sarah Highlander on the link between cancer and the human microbiome; and Peter Park on structural variation analysis.
We look forward to hearing about the latest advances in applying genomics, particularly next-gen sequencing, to find new ways to understand and defeat cancer. We are proud that so many of our users are deploying Sage products in these projects. Whether they are finding indels in paired-end sequencing data, tracking structural rearrangements in long-read sequence data, detecting full-length gene transcripts, or conducting ChIP-seq experiments, Sage customers are truly driving advances in the cancer genomics community.
If you’re attending Beyond the Genome this week, please stop by our table. The Sage team would love to know more about your work and talk about how our products can make your life a little easier.
New Resources: App Notes for Mate-Pair and Long-Read Sequencing with BluePippin
We’ve got some new application notes to share that will be particularly handy for BluePippin customers running mate-pair libraries or sequencing with the Pacific Biosciences platform. Many thanks to our distribution partner, Nippon Genetics, for making this great information available to the community.
In one app note, data provided by Dr. Yoshitoshi Ogura and Dr. Yasuhiro Gotoh from the University of Miyazaki in Japan demonstrate the use of BluePippin in a mate-pair library workflow with Nextera tagmentation. They prepared libraries for six strains of bacteria and used BluePippin to extract 8 Kb fragments. The scientists had previously used manual gel extraction, but found it to be time-consuming and troublesome. They report that BluePippin significantly reduces the amount of time required while delivering high-quality sizing results. Illumina’s mate-pair guidelines already suggest using Pippin Prep for size selection, and we’re glad to see this work validating the use of BluePippin as well.
The other two app notes cover studies conducted to assess the value of BluePippin size selection for achieving longer subreads with the PacBio RS II sequencer. BluePippin has been quite popular with PacBio customers because it can remove short fragments from libraries, focusing sequencing efforts on the longest fragments. This process not only increases average read length, but also boosts instrument throughput.
In one project, Dr. Yasuhito Arai at Japan’s National Cancer Center Research Institute used BluePippin’s high-pass mode to remove fragments smaller than 7 Kb from a library of human genomic DNA. Results were assessed with the Pippin Pulse, our pulsed-field gel electrophoresis product that quickly checks the size of long DNA fragments. According to the study, BluePippin selection offered a real improvement: libraries built without sizing resulted in an average subread length of 2,675 bp; with BluePippin, that average increased to 4,714 bp, an improvement of 76 percent.
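For readers who want to check the arithmetic, here is a minimal sketch of how the 76 percent figure follows from the two average subread lengths quoted above. The function name `percent_increase` is our own, chosen for illustration; it is not part of any Sage or PacBio software.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# Average subread lengths (bp) quoted in the app note summary above.
unsized_bp = 2675      # library built without size selection
bluepippin_bp = 4714   # library after BluePippin high-pass selection

improvement = percent_increase(unsized_bp, bluepippin_bp)
print(f"{improvement:.0f} percent")  # 76 percent
```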
For the other project, a scientist from the Okinawa Institute of Advanced Sciences in Japan built three libraries of bacterial DNA: one with no size selection; one selected for fragments 4 Kb and larger; and one selected for fragments 7 Kb and larger. Sequencing was performed using PacBio’s P5-C3 chemistry. Results were checked on both the Pippin Pulse and a Fragment Analyzer from Advanced Analytical. Both evaluations showed that the library made without size selection included a number of short fragments, while the 4 Kb selection reduced short fragments and the 7 Kb selection removed them. Compared to the library with no size selection, the 7 Kb library yielded a roughly 3.2-fold increase in average subread length (from 2,060 bp to 6,671 bp); the amount of data per cell also increased 1.9-fold. According to the scientist, BluePippin is effective and essential for obtaining long reads.
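To make the logic of high-pass size selection concrete, here is a small sketch that applies a 4 Kb and a 7 Kb cutoff to a list of fragment lengths and compares the averages. The fragment lengths are invented for illustration and are not data from either study, and the `high_pass` helper is a hypothetical name. It models only the idea of discarding fragments below a threshold, not how the instrument works.

```python
from statistics import mean


def high_pass(fragment_lengths_bp, cutoff_bp):
    """Keep only fragments at or above the cutoff (a toy model of high-pass sizing)."""
    return [n for n in fragment_lengths_bp if n >= cutoff_bp]


# Made-up fragment lengths in bp, for illustration only (not measured data).
library = [800, 1500, 2200, 3000, 6500, 7200, 9000, 12000]

for cutoff in (0, 4000, 7000):
    kept = high_pass(library, cutoff)
    print(f"cutoff {cutoff:>5} bp: {len(kept)} fragments, mean {mean(kept):.0f} bp")
```

In this toy library, raising the cutoff leaves fewer fragments but a higher mean length. That is the same trade the app notes describe, where sequencing effort is focused on the longest molecules.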
Illumina Workflow: Pippin for Massively Parallel Genotyping
With so many Sage customers using their Pippin instruments in an Illumina sequencer pipeline, we’re taking a look at various applications enabled by the Sage + Illumina combination. Today we check out double-digest RADseq, which could not work without precise and reproducible size selection.
The approach was first nailed down by scientists in Hopi Hoekstra’s lab at Harvard University, which focuses on population genetics, development, speciation, and behavioral genetics. Their innovation, a new version of the popular reduced-representation genome sequencing method commonly called RADseq, introduced a second restriction enzyme step as well as Pippin Prep size selection. The result: a validated protocol for massively parallel genotyping that allows researchers to study hundreds to hundreds of thousands of genetic loci across hundreds of samples or more, all without any prior knowledge of the organism’s genome.
Essentially, scientists use ddRADseq to study a sliver of the genome in each sample; with Pippin sizing and the double restriction enzymes, they ensure that they’re looking at the same sliver across all samples. Then they can assess genetic variation within those regions for various applications, such as evolutionary development, population studies, and QTL mapping.
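The "same sliver" idea can be sketched as a toy in silico digest. This is our own simplified illustration, not the Hoekstra lab’s pipeline. We assume EcoRI (GAATTC) and MspI (CCGG) as the enzyme pair purely as an example, treat each cut as falling at the start of the recognition site, and use a made-up 136 bp sequence with an unrealistically small size window. The function names are hypothetical.

```python
import re


def cut_sites(seq, site, enzyme):
    """Return (position, enzyme) for every occurrence of a recognition site."""
    pattern = f"(?={re.escape(site)})"  # lookahead so overlapping sites are found
    return [(m.start(), enzyme) for m in re.finditer(pattern, seq)]


def ddrad_fragments(seq, site_a, site_b, min_len, max_len):
    """Fragments flanked by one cut from each enzyme that fall in the size window."""
    cuts = sorted(cut_sites(seq, site_a, "A") + cut_sites(seq, site_b, "B"))
    kept = []
    for (start, enz1), (end, enz2) in zip(cuts, cuts[1:]):
        if enz1 != enz2 and min_len <= end - start <= max_len:
            kept.append((start, end))
    return kept


ECORI, MSPI = "GAATTC", "CCGG"  # example enzyme pair, assumed for illustration

# Made-up reference sequence containing three EcoRI sites and two MspI sites.
sample_1 = (ECORI + "A" * 20 + MSPI + "T" * 50 + ECORI
            + "C" * 10 + ECORI + "A" * 30 + MSPI)
# A second "individual" with one SNP inside a fragment (position 50, T to G).
sample_2 = sample_1[:50] + "G" + sample_1[51:]

loci_1 = ddrad_fragments(sample_1, ECORI, MSPI, min_len=30, max_len=60)
loci_2 = ddrad_fragments(sample_2, ECORI, MSPI, min_len=30, max_len=60)
print(loci_1)            # [(26, 80), (96, 132)]
print(loci_1 == loci_2)  # True
```

Because the restriction sites and the size window are the same for both samples, the same two fragments are recovered from each, and the SNP at position 50 sits inside a shared locus where it can be compared. In this model, a size window that drifts from one run to the next would change which fragments are kept, which is why reproducible sizing matters so much for the method.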
We talked to Brant Peterson, PhD, a postdoctoral fellow in the Hoekstra lab and lead author on the ddRADseq paper, to learn more about the work. He told us that the team’s usual method of size selection — manual gel extraction — was simply not reproducible enough to make the ddRADseq results meaningful. After switching to Pippin Prep, Peterson told us, “There’s very little difference from one sizing reaction to the next, which is the key to this approach working.”
In the time since the original paper came out, other labs have adopted the ddRADseq approach. One is GenCore, the genomics sequencing core at New York University’s Center for Genomics and Systems Biology. GenCore Manager Paul Scheid learned the method and offers it as a service for core clients. “We use the Pippin when constructing those ddRAD libraries to control the amount of loci that we hit from a given library,” he told us. “It’s very nice for fine-tuning that parameter.”
Next we’ll have the final post in our blog series. Check back to learn about how Pippin products are being used with Illumina sequencers to generate higher-accuracy assemblies.
NIH Scientists Report New Findings in Battle Against Antibiotic Resistance
Antibiotic resistance is a scary concept, but at least there’s comfort in seeing so many great minds trying to solve the problem. Last week’s announcement that President Obama had issued an executive order for the development of a national plan to battle antibiotic resistance dovetailed nicely with a paper just published in Science Translational Medicine from NIH scientists.
The publication, “Single-molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae,” reports the sequencing of 20 isolates of Enterobacteriaceae resistant to carbapenems, a powerful class of antibiotics used as a last resort in hospitals. Lead author Sean Conlan from NHGRI and his collaborators used the sequence data to understand the transmission path of a Klebsiella pneumoniae outbreak at the NIH Clinical Center in 2011, as well as isolates collected after the outbreak ended.
It’s impressive work, and we’re happy to report that our BluePippin automated DNA size selection platform was used in the project. Sequencing was performed with the PacBio RS II DNA Sequencing System; the team used BluePippin to remove fragments smaller than 5 Kb from the library prior to loading on the sequencer.
Long reads were necessary for the project, the authors note, because short-read sequence data as well as strain-typing technologies were unable to clearly distinguish between the organisms or to fully assemble the genomes.
Conlan et al. report finding less horizontal gene transfer than expected, but having the full sequence — including the drug-resistance-encoding plasmids associated with each genome — enabled them to get a sense of the remarkable diversity of the network of plasmids available to these bacteria.
The team also discovered that most of the cases suspected to represent hospital-acquired infections were in fact acquired earlier and missed in routine screening. This information helped them to focus their infection-prevention efforts on better screening at admission and increasing the frequency of surveillance cultures.
The authors suggest that real-time, whole-genome sequencing is already cost-effective for monitoring drug-resistant bacteria in clinical environments. “The cost of whole-genome sequencing is dwarfed by … costs associated with outbreaks and their investigations, including the human and financial toll and the loss of patient confidence in the health care facility,” they write.
In Nature Biotech Paper, ABRF Group Reports Cross-Platform RNA-seq Findings
If there’s a group you can count on to do the detailed work of putting instruments through their paces to help scientists perform better science, it’s the Association of Biomolecular Resource Facilities. In a new Nature Biotechnology paper, the next-gen sequencing division of ABRF reports the largest known “cross-platform, cross-protocol and cross-site examination of RNA-seq data performed to date.”
As a company working hard to improve the reproducibility and accuracy of a small section of the library prep workflow for next-gen sequencing, we at Sage applaud ABRF for this invaluable resource that will help researchers hone their RNA-seq pipelines. If you haven’t seen the paper yet, check it out: “Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study” from lead authors Sheng Li and Scott Tighe.
The study covered five sequencing platforms (HiSeq from Illumina; PGM and Proton from Life Technologies; the PacBio RS; and the Roche 454) and was carried out in 15 laboratories. Different RNA-seq protocols were tested as well, including polyA-selected, ribo-depleted, size-selected, and degraded RNA. (We were pleased to see that our Pippin Prep was used in the study for the PGM and Proton sequencers.)
The authors, who hail from more than a dozen institutions, make it clear that this effort was not intended to declare a winner among sequencers; rather, the goal was “to establish a useful reference data set for each platform, which will assist laboratories in improving their methods and in evaluating new chemistries, protocols and instruments.” Researchers will be able to use their findings to inform decisions about when and how to compare data sets from different sequencers or different workflows, for example.
The paper reports strong intra-platform consistency as well as inter-platform concordance. “This study found similar RNA-seq results between the various NGS platforms and similar ranges in coefficients of variance across laboratory sites for each platform,” the authors write. “These results indicate that both long- and short-read technologies measure gene expression with similar levels of statistical variation, although they show a tenfold variation for error rates in indels.” In general, however, they caution that deeper sequencing is necessary to capture low-abundance transcripts. They also note that sequencer QV scores from the manufacturers were higher than what they saw empirically, “indicating that a splicing-aware, base quality score recalibration may be needed for RNA-seq, as is already done for DNA-seq with GATK.”
The authors conclude with the hope that their findings will be used to establish best practices for things like isoform characterization and gene quantification. “These and other applications, especially clinical molecular diagnostics that rely on nucleic acid biomarkers, will require a level of technical stability across time and both within and between studies, which this study helps to establish,” they write.