Sage Blog

Double Pippin for Optimized RAD-seq

Scientists in China and the UK recently published an open-access optimized protocol for RAD-seq in the Theoretical and Applied Genetics journal. The method is targeted at large studies of plants and enables users to specify sequence coverage parameters.

From lead author Ning Jiang and collaborators, “A highly robust and optimized sequence-based approach for genetic polymorphism discovery and genotyping in large plant populations” offers a step-by-step protocol. “This optimized approach provides both a computational tool and a library construction protocol, which can maximize the number of genomic sequence reads that uniformly cover a plant genome and minimize the number of sequence reads representing chloroplast DNA and rRNA genes,” the scientists write.

The challenge with using existing RAD-seq protocols for plants, according to the authors, is that chloroplast and rRNA genes can account for the majority of sequence reads in an experiment if scientists don’t adjust for them, making this process inefficient for plant population genotyping.

In the new protocol, the team employed two size-selection steps using the Pippin Prep. The workflow runs as follows: digestion; ligation of barcoded adapters; Pippin Prep sizing; further digestion; PCR amplification; and a second size-selection step. (For details, check out this workflow graphic.)

The team validated the method through analysis of six sequencing libraries “for parental lines and their segregating offspring of both diploid and tetraploid Arabidopsis and potato,” they report. They saw balanced sequence representation across the samples. “Sequence data from the optimized RAD-seq experiments shows that the undesirable chloroplast and rRNA contributed sequence reads can be controlled at 3–10 %,” they note.

For pooling, the scientists recommend a maximum of 12 samples per sequencing library to reduce the variation in number of sequence reads per plant.
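To make the pooling recommendation concrete, here is a minimal Python sketch of how a lab might split a sample list into libraries of no more than 12 samples each. The function name `plan_pools` and the choice to balance pool sizes evenly are our own assumptions for illustration; the paper only states the 12-sample ceiling.

```python
import math

# Ceiling recommended by the authors; everything else here is illustrative.
MAX_SAMPLES_PER_LIBRARY = 12


def plan_pools(sample_ids, max_per_pool=MAX_SAMPLES_PER_LIBRARY):
    """Split samples into the fewest pools, keeping pool sizes as even as possible.

    Balancing the pools is an assumption on our part, not a step from the paper.
    """
    n = len(sample_ids)
    if n == 0:
        return []
    n_pools = math.ceil(n / max_per_pool)
    base, extra = divmod(n, n_pools)
    pools, start = [], 0
    for i in range(n_pools):
        # The first `extra` pools take one additional sample each.
        size = base + (1 if i < extra else 0)
        pools.append(sample_ids[start:start + size])
        start += size
    return pools


if __name__ == "__main__":
    samples = [f"plant_{i:02d}" for i in range(1, 26)]  # 25 hypothetical plants
    for number, pool in enumerate(plan_pools(samples), start=1):
        print(f"library {number}: {len(pool)} samples")
```

With 25 plants, this gives three libraries rather than two full ones and a single leftover sample, which keeps the per-library sample count similar across the run.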

Posted in Blog

Podcast: Kari Stefansson on 20 Years of Human Genomics

Mendelspod has turned out another terrific podcast, this one with Kari Stefansson, and we’re proud to have sponsored the thought-provoking discussion.

As most people in the field know, Stefansson earned his fame as founder of DeCode Genetics, which has spent 20 years analyzing the genetics of the Icelandic population. Now part of Amgen, the team continues to churn out publications characterizing genomic variation.

Stefansson spoke with Mendelspod host Theral Timpson about the value of studying an island population, which has a pronounced founder effect that has left many Icelanders with genetic variants that are quite rare in other populations. These variants have been associated with increased risk for Alzheimer’s disease, various forms of cancer, heart attacks, and more. There are also some protective variants, such as a heart-protective gene that has become the focus of a drug discovery program at Amgen.

We were especially interested in Stefansson’s stance on the tug-of-war between the societal value of genetic information and each person’s right to privacy. He points out that advances in medicine have come from the generosity of previous patients who shared their medical data, suggesting that keeping this information private may be “antisocial.” Later in the discussion, Stefansson says that he’s been lobbying Icelandic leaders to let him identify everyone with a particularly high-risk BRCA mutation from the national genetic database so these people can be contacted and given treatment options, but opponents argue that this violates a person’s right not to know such information.

Looking forward, Stefansson said that the human brain is “the last frontier in biology” and that we have a long way to go to understand how our brains make us who we are, how they define our species, how they trigger emotions, and more. His team is combining genetic studies with cognitive testing to better understand this organ. Early findings have demonstrated that a variant linked to schizophrenia risk is also associated with creative thinking.

For all that and much more, check out the full podcast!


Scientists Use Multiple Technologies to Produce High-Quality Chinese Genome Assembly

A newly reported genome assembly of a Chinese individual, generated by scientists in China and the US, used long-read PacBio sequencing, short-read Illumina data, and BioNano Genomics physical maps to achieve remarkably high accuracy and contiguity. Along the way, the team deployed our BluePippin automated DNA size selection platform for both the genome and transcriptome analysis.

From lead author Lingling Shi and many collaborators, the Nature Communications publication reports that long-read data contributed to a more complete picture of the DNA and RNA, allowing the team to find a significant amount of sequence and gene content that had never been observed before. Scientists produced 12.8 Mb of sequence data that did not map to the current human reference genome, and identified many likely functional structural variants that may be specific to the Asian population. The genome assembly also addresses 274 gaps — nearly 30% of existing gaps — in the reference genome, many of which were characterized by simple repeats.

In the transcriptome analysis, the scientists built four libraries with different insert sizes: 1–2 Kb, 2–3 Kb, 3–5 Kb, and greater than 5 Kb. The sequence results were used to predict more than 58,000 isoforms at 30,000 loci, including nearly 60 isoforms “that do not overlap with any GENCODE transcript,” they report. The team used BluePippin for this sizing step (our support department would point out that SageELF would have accomplished this with less hands-on time); check out the supplemental info for details.

This paper continues a promising trend that we’ve noticed in human genome sequencing: the use of multiple orthogonal technologies to produce many dimensions of data for a more comprehensive view of the underlying biology. While it’s more technically challenging upfront, the combo approach really delivers in the analysis. We hope to see many more sequencing projects using this concept to reveal novel information about what makes us tick.


New GIAB Publication Characterizes Seven Genomes with 12 Technologies

The Genome in a Bottle Consortium is on a roll — and if you haven’t checked out the latest paper in Scientific Data, you’re missing out. “Extensive sequencing of seven human genomes to characterize benchmark reference materials” comes from lead author Justin Zook and senior author Marc Salit, both at the National Institute of Standards and Technology, along with a boatload of collaborators.

In this publication, the GIAB team reports a massive sequencing effort for seven human genomes, five of which are, or are expected to become, NIST Reference Materials that will allow sequencing labs around the world to measure the accuracy of their data. Among the genomes included in the publication are “two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry,” the authors write. They note that genomic data was generated with 12 different methods: “BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads.”

The NIST-led team reports that this unprecedented level of detail about each genome has led to diverse data sets that will help inform the reference materials they ultimately make public. “These reference materials are the first of their kind, and will play key roles in the translation of genome sequencing to widespread adoption and as validation tools in clinical practice,” the scientists write. “We previously characterized high-confidence SNP, indel, and homozygous reference genotypes, as well as large deletions and insertions. We plan to use similar methods as well as new methods to characterize these genomes using the data described in this work.”

It was an honor to see that our BluePippin automated size selection platform was used for a number of genomes and with different analysis technologies, including PacBio and SOLiD. We’re glad that our tools contributed to such important work!


NCSU Scientist Incorporates Genomics for Better Blueberry Resources

Hamid Ashrafi is working to breed higher-quality blueberries that are larger, tastier, amenable to mechanical harvest, and longer-lasting on the shelf. As an assistant professor at North Carolina State University, Ashrafi is bringing genomic tools to a long-running blueberry breeding program at the school, integrating classical breeding with modern methods.

Blueberries present a real challenge for genome sequencing and assembly: they occur naturally as diploids, tetraploids, and even hexaploids. A draft genome assembly exists, though it isn’t publicly available, and Ashrafi and his colleagues at the Kannapolis campus are trying to improve it with new sequencing tools like PacBio, 10x Genomics, Dovetail Genomics, and BioNano Genomics. He is also studying the plant’s transcriptome, which has not been covered extensively before.

Ashrafi relies on core facilities to perform the sequencing, but prefers to handle sample prep in his own lab, both to shorten turnaround time and to train students and postdocs. For size selection, he chose the BluePippin and SageELF automated platforms from Sage Science because they could handle the large fragments needed for long-read sequencing libraries. Recently, he has been using the new 30 Kb protocol for PacBio libraries and has been pooling fractions for Iso-Seq analysis with the SageELF.

The SageELF, which separates an entire sample by size into 12 contiguous fractions, is a good fit for genome and transcriptome sequencing with PacBio. “It reduces the amount of work that you do,” Ashrafi says. “When you make one library, you can fractionate all of it. You can define which fractions you want and combine them, and you only have to run it one time.”

For example, he might split fractions into groups of 10-20 Kb, 20-30 Kb, and 30+ Kb for genome sequencing so the downstream data represents the whole blueberry tissue sample. For Iso-Seq analysis of gene expression, Ashrafi likes to combine fractions into a few bins, which helps boost library yield for deeper sequencing coverage. “Instead of running Iso-Seq for each of the fractions,” Ashrafi says, “you can combine fractions and have enough DNA to run more SMRT Cells.”
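The fraction-pooling idea can be sketched in a few lines of Python. This is only an illustration: the fraction sizes below are made-up placeholders rather than instrument output, and the function name `pool_fractions` and the half-open bin boundaries are our assumptions. Only the three size groups (10-20 Kb, 20-30 Kb, 30+ Kb) come from the example above.

```python
# Size groups from the genome-sequencing example; the upper bound is exclusive,
# and None marks an open-ended bin. Boundary handling is an assumption.
GENOME_BINS = [
    ("10-20 Kb", 10, 20),
    ("20-30 Kb", 20, 30),
    ("30+ Kb", 30, None),
]


def pool_fractions(fraction_sizes_kb, bins=GENOME_BINS):
    """Group fraction numbers by the size bin their estimated size falls into.

    fraction_sizes_kb maps a fraction number to its estimated size in Kb.
    Fractions that match no bin (here, anything under 10 Kb) are left out.
    """
    pooled = {label: [] for label, _, _ in bins}
    for fraction, size_kb in sorted(fraction_sizes_kb.items()):
        for label, low, high in bins:
            if size_kb >= low and (high is None or size_kb < high):
                pooled[label].append(fraction)
                break
    return pooled


if __name__ == "__main__":
    # Hypothetical estimated sizes (Kb) for seven of the 12 fractions.
    sizes = {1: 40, 2: 33, 3: 27, 4: 22, 5: 17, 6: 12, 7: 8}
    for label, fractions in pool_fractions(sizes).items():
        print(label, "->", fractions)
```

The same function would work for Iso-Seq pooling by passing a different `bins` list with fewer, wider groups.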

Now that he’s become an expert in size selection for long-read sequencing, Ashrafi says his next step is to begin deploying BluePippin for short-read libraries as well.
