Sage Blog

Scientists Use Multiple Technologies to Produce High-Quality Chinese Genome Assembly

A newly reported genome assembly of a Chinese individual, generated by scientists in China and the US, used long-read PacBio sequencing, short-read Illumina data, and BioNano Genomics physical maps to achieve remarkably high accuracy and contiguity. Along the way, the team deployed our BluePippin automated DNA size selection platform for both the genome and transcriptome analysis.

From lead author Lingling Shi and many collaborators, the Nature Communications publication reports that long-read data contributed to a more complete picture of the DNA and RNA, allowing the team to find a significant amount of sequence and gene content that had never been observed before. Scientists produced 12.8 Mb of sequence data that did not map to the current human reference genome, and identified many likely functional structural variants that may be specific to the Asian population. The genome assembly also addresses 274 gaps — nearly 30% of existing gaps — in the reference genome, many of which were characterized by simple repeats.

In the transcriptome analysis, the scientists built four libraries with different insert sizes: 1–2 Kb, 2–3 Kb, 3–5 Kb, and greater than 5 Kb. The sequence results were used to predict more than 58,000 isoforms at 30,000 loci, including nearly 60 isoforms “that do not overlap with any GENCODE transcript,” they report. The team used BluePippin for this sizing step (our support department would point out that SageELF would have accomplished this with less hands-on time); check out the supplemental info for details.

This paper continues a promising trend that we’ve noticed in human genome sequencing: the use of multiple orthogonal technologies to produce many dimensions of data for a more comprehensive view of the underlying biology. While it’s more technically challenging upfront, the combo approach really delivers in the analysis. We hope to see many more sequencing projects using this concept to reveal novel information about what makes us tick.


New GIAB Publication Characterizes Seven Genomes with 12 Technologies

The Genome in a Bottle Consortium is on a roll — and if you haven’t checked out the latest paper in Scientific Data, you’re missing out. “Extensive sequencing of seven human genomes to characterize benchmark reference materials” comes from lead author Justin Zook and senior author Marc Salit, both at the National Institute of Standards and Technology, along with a boatload of collaborators.

In this publication, the GIAB team reports a massive sequencing effort for seven human genomes, five of which are, or are expected to become, NIST Reference Materials that will allow sequencing labs around the world to measure the accuracy of their data. Among the genomes included in the publication are “two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry,” the authors write. They note that genomic data was generated with 12 different methods: “BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads.”

The NIST-led team reports that this unprecedented level of detail about each genome has led to diverse data sets that will help inform the reference materials they ultimately make public. “These reference materials are the first of their kind, and will play key roles in the translation of genome sequencing to widespread adoption and as validation tools in clinical practice,” the scientists write. “We previously characterized high-confidence SNP, indel, and homozygous reference genotypes, as well as large deletions and insertions. We plan to use similar methods as well as new methods to characterize these genomes using the data described in this work.”

It was an honor to see that our BluePippin automated size selection platform was used for a number of genomes and with different analysis technologies, including PacBio and SOLiD. We’re glad that our tools contributed to such important work!


NCSU Scientist Incorporates Genomics for Better Blueberry Resources

Hamid Ashrafi is working to breed higher-quality blueberries that are amenable to mechanical harvest, larger, tastier, and longer-lasting on the shelf. As an assistant professor at North Carolina State University, Ashrafi is bringing genomic tools to a long-running blueberry breeding program at the school, integrating classical breeding with modern breeding techniques.

Blueberries present a real challenge for genome sequencing and assembly: they occur naturally as diploids, tetraploids, and even hexaploids. A draft genome assembly exists, though it isn’t publicly available, and Ashrafi and his colleagues at the Kannapolis campus are trying to improve it with new sequencing tools like PacBio, 10x Genomics, Dovetail Genomics, and BioNano Genomics. He is also studying the plant’s transcriptome, which has not been covered extensively before.

Ashrafi relies on core facilities to perform the sequencing, but prefers to handle sample prep in his own lab to reduce the sample preparation turnaround time as well as to train students and postdocs. For size selection, he chose the BluePippin and SageELF automated platforms from Sage Science because they could handle the large fragments needed for long-read sequencing libraries. Recently, he has been using the new 30 Kb protocol for PacBio libraries and has been pooling fractions for Iso-Seq analysis with the SageELF.

The SageELF, which separates an entire sample by size into 12 contiguous fractions, is a good fit for genome and transcriptome sequencing with PacBio. “It reduces the amount of work that you do,” Ashrafi says. “When you make one library, you can fractionate all of it. You can define which fractions you want and combine them, and you only have to run it one time.”

For example, he might split fractions into groups of 10-20 Kb, 20-30 Kb, and 30+ Kb for genome sequencing so the downstream data represents the whole blueberry tissue sample. For Iso-Seq analysis of gene expression, Ashrafi likes to combine fractions into a few bins, which helps boost library yield for deeper sequencing coverage. “Instead of running Iso-Seq for each of the fractions,” Ashrafi says, “you can combine fractions and have enough DNA to run more SMRT Cells.”
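The pooling idea described above can be sketched in a few lines of Python. This is only an illustration, not Ashrafi's actual workflow: the per-fraction sizes below are made-up placeholder values (real sizes depend on the cassette and run settings), and the function name `pool_fractions` is hypothetical. The three size groups are the ones mentioned in the text.

```python
# Hypothetical mean fragment sizes (in Kb) for the 12 contiguous fractions.
# These numbers are placeholders for illustration, not measured values.
fraction_sizes_kb = [42.0, 35.0, 31.0, 27.0, 23.0, 19.0,
                     16.0, 13.0, 11.0, 8.0, 6.0, 4.0]

# Size groups from the text: 10-20 Kb, 20-30 Kb, and 30+ Kb.
# Each bin is a half-open range [low, high).
bins = {
    "10-20 Kb": (10.0, 20.0),
    "20-30 Kb": (20.0, 30.0),
    "30+ Kb": (30.0, float("inf")),
}


def pool_fractions(sizes_kb, size_bins):
    """Assign each fraction (numbered from 1) to the bin covering its size.

    Fractions that fall outside every bin are left unpooled.
    """
    pools = {name: [] for name in size_bins}
    for fraction_number, size in enumerate(sizes_kb, start=1):
        for name, (low, high) in size_bins.items():
            if low <= size < high:
                pools[name].append(fraction_number)
                break
    return pools


pools = pool_fractions(fraction_sizes_kb, bins)
for name, fractions in pools.items():
    print(name, "-> fractions", fractions)
```

With these placeholder sizes, the three smallest fractions (under 10 Kb) are simply not pooled, which mirrors the idea of defining which fractions you want and combining only those.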

Now that he’s become an expert in size selection for long-read sequencing, Ashrafi says his next step is to begin deploying BluePippin for short-read libraries as well.


SageELF Offers Unique Advantages for Long-Range Sequencing and CNV Analysis

We released our SageELF instrument two years ago, and seeing how scientists have adopted it for various NGS pipelines has been a wonderful journey. If you haven’t noticed these great uses, we’ll get you up to speed with this quick recap.

SageELF is a unique size-selection platform for scientists who need something more sophisticated than the traditional options. It takes a DNA sample and separates it by size into 12 contiguous fractions; the high yield makes the instrument an especially nice fit for precious samples. Users can then advance the optimally sized fraction for analysis, or pool multiple fractions for a more customized approach. SageELF can also resolve large DNA thanks to its built-in pulsed-field electrophoresis technology.

One of the first applications we saw was in mate-pair sequencing, driven by experts like Darren Heavens at The Genome Analysis Centre. He led a team that developed a new protocol for generating long mate-pair (LMP) libraries using the SageELF (check it out in Biotechniques or read our blog post). The method saves time and money and decreases the amount of input DNA needed.

“Using the SageELF streamlines the library construction process, allowing LMP libraries >10 kb to be constructed in under 2 days with <10 µg input material,” the TGAC scientists reported. “For many genome projects, multiple insert size LMP libraries are required, and the ability to construct up to 12 discretely sized libraries for a combined reagent cost of $1270 compared with the reagent cost of $715 for a single insert size LMP library highlights the potential cost savings.”
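To put the quoted figures in perspective, here is a quick back-of-the-envelope calculation. The two dollar amounts come from the quote above; the comparison against building 12 libraries one at a time is our own assumption for illustration (the quote does not state that figure), and the variable names are arbitrary.

```python
# Reagent costs quoted by the TGAC team (US dollars).
cost_twelve_libraries = 1270   # up to 12 discretely sized LMP libraries
cost_single_library = 715      # one single-insert-size LMP library
n_libraries = 12

# Per-library reagent cost when all 12 libraries come from one run.
per_library = cost_twelve_libraries / n_libraries

# ASSUMPTION: building the same 12 insert sizes one library at a time
# would cost 12 times the single-library reagent price.
cost_if_separate = cost_single_library * n_libraries
savings = cost_if_separate - cost_twelve_libraries

print(f"Per-library cost in one run: ${per_library:.2f}")
print(f"Assumed cost for 12 separate libraries: ${cost_if_separate}")
print(f"Assumed reagent savings: ${savings}")
```

Under that assumption, the per-library reagent cost drops from $715 to roughly $106.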

Heavens also came up with a method to analyze copy number variation more reliably with SageELF. His team separates PCR products with the instrument, and then sequences the largest fraction to determine the highest copy numbers present in the sample. “That gives us the true copy number,” he says. “The duplicated genes themselves are so similar that if you don’t have the full-length fragment, they just collapse down in the assembly.”

More recently, we’ve seen adoption of the SageELF among PacBio users working with the Iso-Seq method. The contiguous fractions allow for pooling of samples prior to sequencing, which helps scientists build the ideal library for their full-length isoform studies.

Other labs are just getting started with their SageELF instruments, and we can’t wait to see the creative uses they discover for it!


The Conference Month: Prepping for PacBio, ASM, FoG

It’s June, and you know what that means: genomics conference season is back! The Sage team will be attending several events this month and we hope to see you at least once.

We kick off next week with PacBio’s annual East Coast User Group Meeting, held in Baltimore June 8th on the University of Maryland campus. We look forward to this event each year because it’s a great glimpse of the cutting-edge science happening around long-read sequencing. This year, there will be a half-day sample prep workshop before the general meeting, and we couldn’t be more excited if we tried. (Hey, we’re sample prep people. Don’t judge.) PacBio users are doing all sorts of cool things in this area, from lowering input requirements to incorporating our SageELF and pooling size fractions for customized pipelines — it’ll be great to see what they’ve accomplished now. If you’re attending the meeting, be sure to track us down and ask about the Iso-Seq method promo we’re launching at the event.

Next up is the annual meeting of the American Society for Microbiology, taking place June 16-20 in our hometown of Boston and featuring an opening keynote from Bill Gates. When we’re not glued to the stage learning about the growing Zika epidemic, we’ll be camped out in booth #306 in the exhibit hall. Stop by and we’ll be happy to discuss your microbial research and help you consider whether automated DNA size selection would make a difference in your work.

Finally, the month wraps up with the second Boston-based Festival of Genomics, June 27-29. This new series of festivals has been such a great surprise: cool science and a different approach, perhaps most obvious this year from the fact that registration is free for everyone. (Last year it was most obvious from the treadmill placed at the entrance; if you missed it, check out this blog post with our favorite Sage photo ever.) We’ll be in booth #214 in the lab zone, happy to field questions or make suggestions about your DNA sequencing workflow.

Assuming we survive it all, we’re awfully glad that July starts with a holiday!
