In a new BMC Genomics paper, scientists from Baylor College of Medicine describe a method for accurate, affordable interrogation of structural variants across the human genome. We’re delighted to see that automated DNA size selection tools from Sage Science contributed to this important approach.
In the paper, lead authors Min Wang and Christine Beck, along with collaborators from Baylor’s genome center, explain that a method like this is needed because structural variants are difficult to analyze with next-gen sequencing. Short-read technologies generally produce reads too short to span the variants, making it impossible to align and assemble them accurately. Long-read technology has shown great promise, but it has been too expensive for large-scale, genome-wide analyses, the authors note.
So they developed a target-capture approach to enrich for structural variants at particular chromosomal locations. Combining oligo capture with size selection, they target specific insert sizes, using the Pippin Prep for fragments up to 1 kb and the BluePippin for anything larger. After library prep is complete, the selected DNA is sequenced on a PacBio instrument. The method, called PacBio-LITS (large-insert targeted capture-sequencing), is especially noteworthy because it is the first report of targeted sequencing for libraries with insert sizes greater than 1 kb.
In this method, size selection is essential to the success of the pipeline. “Manual gel-extraction methods involving agarose gel electrophoresis can be used, but we have chosen Sage Science’s Pippin and BluePippin platforms to perform target size selection for improved accuracy and sample recovery,” the authors write, adding that they use “range mode” to preserve the DNA complexity of the sample.
The Baylor team presents data from a study of three samples from patients with Potocki–Lupski syndrome. Scientists used PacBio-LITS to analyze structural rearrangements associated with the disease, looking particularly at breakpoint junctions of low-copy repeats (LCRs). “We successfully identified previously determined breakpoint junctions … and also were able to discover novel junctions in repetitive sequences, including LCR-mediated breakpoints,” the authors write.
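Why do long inserts matter for junctions inside repeats? A toy model, not taken from the paper, makes the point: a read can only place an LCR-mediated junction unambiguously if it covers the entire repeat plus some unique flanking sequence on each side. The `Read` class, the function names, the 100-base flank threshold, and the coordinates below are all hypothetical illustrations, and the model ignores sequencing errors, mapping quality, and everything else a real aligner weighs.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical read placement on a reference (toy model, not from the paper)."""
    start: int   # 0-based leftmost reference coordinate
    length: int  # read length in bases

    @property
    def end(self) -> int:
        # Exclusive end coordinate of the read
        return self.start + self.length


def anchors_across_repeat(read: Read, repeat_start: int, repeat_end: int,
                          min_flank: int) -> bool:
    """True if the read covers the whole repeat [repeat_start, repeat_end)
    plus at least min_flank unique bases on each side (assumed threshold)."""
    return (read.start <= repeat_start - min_flank
            and read.end >= repeat_end + min_flank)


def could_span(read_length: int, repeat_length: int, min_flank: int) -> bool:
    """True if a read of this length could span the repeat in at least one placement."""
    return read_length >= repeat_length + 2 * min_flank


if __name__ == "__main__":
    # Illustrative numbers only: a 5-kb repeat at 10,000-15,000 and 100-base anchors
    repeat_start, repeat_end, flank = 10_000, 15_000, 100
    short_read = Read(start=9_900, length=150)   # short-read-scale length
    long_read = Read(start=9_000, length=8_000)  # long-read / large-insert scale
    print(anchors_across_repeat(short_read, repeat_start, repeat_end, flank))  # False
    print(anchors_across_repeat(long_read, repeat_start, repeat_end, flank))   # True
```

The inequality in `could_span` is the heart of it: no amount of short-read depth helps when the read is shorter than the repeat plus its anchors, which is the gap a large-insert approach is meant to close.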
The team posits that beyond structural variation, this new method could also be useful for validating indels and phasing haplotypes.
This week was a busy and educational one for the Sage Science team: we attended both the Association of Biomolecular Resource Facilities (ABRF) meeting in St. Louis and the Experimental Biology meeting here in Boston. We had booths in both exhibit halls, and we thank the many scientists who stopped by to learn more about our newest products, SageELF for protein fractionation and the PippinHT high-throughput automated DNA size selection instrument.
ABRF is an event for technology lovers and always gives us a chance to hang out with the savvy scientists who run core labs, vet new instruments, and develop meticulous methods to keep experiments running smoothly. Standards were in the spotlight this year. In one session, Sarah Munro from the National Institute of Standards and Technology gave a talk; her work with the External RNA Controls Consortium and the newer Genome in a Bottle Consortium has been very impressive. In another session, members of the ABRF team that performed a valuable study of next-gen and third-gen sequencing platforms presented their findings. If you missed their paper, check it out here. We also enjoyed the talk from Vanderbilt’s Daniel Liebler, who spoke about proteomics, cancer, and the need to understand protein interactions. His report that mRNA levels don’t accurately predict protein expression was intriguing, but it was sobering to hear him say that funding for proteomics, a field that will be critical for precision medicine and other clinical advances, has dwindled.
If you weren’t at ABRF, check out the poster we presented there: “The ELF preparative electrophoresis system for size-based proteome fractionation.” It shows SageELF automatically separating an E. coli protein extract into 12 contiguous size fractions and collecting them in a short period of time. SageELF can be used for automated 1D gel fractionation of proteins to increase the sensitivity of peptide detection in complex mixtures; it’s a great alternative to labor-intensive SDS-PAGE.
While some Sage staffers were living it up in St. Louis, those of us at the Experimental Biology conference were getting a crash course in the latest and greatest in biochemistry. We were especially interested in the program from the American Society for Biochemistry and Molecular Biology (ASBMB), one of the societies represented at the conference.
The award lectures were truly fantastic. Jack Dixon from the University of California, San Diego, spoke about novel kinases that phosphorylate secreted proteins, and Kathleen Matthews from Rice University talked about protein biochemistry and won over graduate students with her call for mentoring to improve research success. Some attendees told us that this year’s ASBMB program was one of the best ever. We just wish we’d had more time to absorb all of the great science in the extensive poster hall.
Now it’s back to the office, where we’ll be able to put everything we’ve learned to work!
We couldn’t help noticing that “long reads” kept popping up in presentations and posters at AGBT, and we certainly weren’t alone. Aside from longtime long-read provider Pacific Biosciences and synthetic long-read service Moleculo, acquired by Illumina in 2012, new companies such as 10X Genomics and Dovetail Genomics were touting the value of this kind of information at AGBT.
We’re already seeing sessions on long-read sequencing on the agendas of other upcoming conferences, leading to our theory that 2015 will go down in sequencing history as the Year of Long Reads. It’s no wonder demand for this kind of data is soaring: after years of using short-read sequencers to analyze genomes, scientists are just now realizing how much information about structural variants, haplotype phasing, and other long-range, clinically relevant elements is inaccessible with short reads alone.
There are two broad approaches to obtaining long-read data. Single-molecule sequencing platforms, like those from PacBio and Oxford Nanopore Technologies, generate truly long reads on their own. Users of both platforms have presented individual reads tens of kilobases long, a far cry from the few hundred bases we’re used to from Illumina and Ion Torrent sequencers. Assembling those long reads can yield contigs of a megabase or more.
But since the vast majority of sequencing data currently available has been produced with short-read technologies, there’s also a huge appetite for bolt-on products that can pull long-range information out of short-read data. Like their older sibling Moleculo, upstarts 10X Genomics and Dovetail Genomics focus on altering library prep in a short-read workflow to allow analytical tools to connect the sequence data into much longer blocks. These synthetic long reads have been shown to elucidate larger elements like structural variants without switching sequencing platforms.
Both approaches suggest an exciting trend that will let us get more out of each genome we sequence. Here at Sage Science, we’re pleased to report that our BluePippin automated DNA size selection platform can be used with either of these approaches to maximize the length of reads generated or synthesized. For an example of how BluePippin works with synthetic reads, check out this blog post; learn more about BluePippin with long-read sequencing in these app notes. And check back soon for new info on how the PippinHT can be used with long-read workflows too!
The Sage team has attended AGBT for years, and the 2015 meeting reminded us just how lucky we are to be part of this amazing community. For those of us who remember the first Marco Island conference in 2000, it is truly awe-inspiring to see that, just 15 years later, genomics is being used to treat, and even cure, patients around the world. We were humbled by the rapid and remarkable advances this community has enabled.
Some of our favorite talks this year focused on the human microbiome. Michael Fischbach from the University of California, San Francisco, spoke about naturally occurring molecules produced by the microbes that live in and on us. So many of these natural products are antibiotics that Fischbach joked the organisms had made an end-run around the FDA, finding a way to get these molecules into our systems without regulatory approval or a physician’s prescription. He noted that there’s still a lot to learn about the molecules our microbes are synthesizing; uncovering them could well have a major impact on how we view human health.
Rob Knight from the University of California, San Diego, presented work showing how the microbiome changes from infancy onward: the profile evolves until age 2.5, by which point it has matured into the same profile seen in adults. He told attendees that although genome-wide association studies have failed to turn up reliably predictive genetic markers of obesity, analyzing the microbiome can reveal whether a person is lean or obese with 90 percent accuracy. Clearly, there’s a lot of uncharted territory in how our microbes contribute to (or in some cases completely define) various phenotypes.
There was also strong clinical content at AGBT, with impressive presentations describing how sequencing was used to diagnose patients or to suggest treatment options that are not the standard of care for a given condition. Steve McCarroll from Harvard Medical School described how blood samples collected for a schizophrenia study led to the unexpected discovery of markers indicating early stages of blood cancer, long before the cancer could be diagnosed with traditional methods.
We can’t review all of the amazing talks and posters here, but suffice it to say, it was really great to witness the innovation, intelligence, and ingenuity driving the genomics community. Many thanks to the scientists who stopped by our suite to learn more about Sage Science, and we’re already looking forward to next year’s AGBT.
Next week is the biggest party of the year for the genomics community: the Advances in Genome Biology and Technology meeting. The Sage Science team can’t wait to emerge from our Boston igloos to soak up some much-needed warmth in Marco Island (while we’re soaking in the great science, of course). Like so many attendees, we’re already dusting off last year’s backpack as we prep for the journey south.
Now in its 16th year, AGBT always manages to deliver a mix of stellar technology talks, brand-new scientific results, and topical information. This year we’re especially looking forward to talks on precision medicine and promising new methods using NGS. We’re also quite intrigued by presentations about genomics in space and city-scale metagenomics.
During the meeting, we’ll be showing off the newest member of our Pippin family, the PippinHT, which is a great fit for the large-scale projects conducted by scientists at this event. PippinHT offers everything you love about our platform (fully automated DNA size selection with best-in-class results and reproducibility, without any risk of cross-contamination), now at scale, running up to 24 samples at a time. PippinHT increases throughput while reducing run times and cost per sample.
You can find us in lanai #179, where we’ll be serving up popcorn as well as technical tips on how more accurate DNA sizing can help you generate better results from your NGS pipeline. We hope to see you there!