Bisulfite conversion inherently damages DNA, producing patchy genome coverage that can be frustrating to deal with. Your methylation data may not be what you want, because some regions of the genome end up underrepresented or even incompletely converted. In short, bisulfite conversion degrades DNA samples, biases coverage and raises sequencing costs.
Let's examine why so many core sequencing labs refuse to perform bisulfite-seq, even though computationally intensive strategies exist to compensate for the shortcomings of bisulfite-converted libraries. We will then see how Enzymatic Methyl-seq (EM-seq) avoids the pitfalls of bisulfite conversion to produce more accurate, sensitive and biologically relevant results across a variety of DNA methylation analysis techniques, as supported by recent scientific work.
Bisulfite conversion stinks and creates methylation blind spots
In theory, sodium bisulfite is an excellent tool for detecting DNA methylation. But how well do your results reflect live methylomes? The short answer: biased coverage is a big problem.
In principle, bisulfite converts unmethylated cytosines to uracil but leaves methylated (5mC) and hydroxymethylated (5hmC) cytosines unconverted. At low pH, sodium bisulfite sulfonates unmethylated cytosines, which then undergo spontaneous hydrolytic deamination: the amino group is removed, releasing ammonia and ultimately converting the cytosine to uracil. (That's clever chemistry, but the ammonia part stinks!) PCR amplification then copies uracil as thymine, so unmethylated cytosines are sequenced as thymines while 5mCs and 5hmCs are read as cytosines. From this C-to-T conversion of unmethylated cytosines, modified cytosines can be identified at single-base resolution.
In practice, using bisulfite treatment for methylome sequencing carries known risks. Sample loss is inherent in the protocol, and the way this loss occurs creates further obstacles. Bisulfite conversion requires extreme temperatures and pH, which cause depyrimidination of DNA and, in turn, degradation. When sequencing adapters are ligated before bisulfite conversion, this damage first shows up as lower-than-expected library yields. Unmethylated cytosines are destroyed disproportionately compared with 5mC or 5hmC, creating sequencing blind spots and an unbalanced nucleotide composition. This is particularly evident in GC-rich regions of the genome, resulting in biased genomic coverage. Adding sequencing adapters after bisulfite conversion restores library yields to some extent, but biases remain. The real pitfalls of bisulfite chemistry are low yields, uneven genome coverage and a biased GC distribution.
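To make the C-to-T logic concrete, here is a minimal Python sketch of an in-silico bisulfite conversion of one strand and the matching methylation call. It assumes complete conversion, a read already aligned base-for-base to the reference, and no C/T SNPs; `bisulfite_convert` and `call_modified` are hypothetical helpers, not part of any published pipeline.

```python
def bisulfite_convert(ref: str, modified: set[int]) -> str:
    """Simulate the sequenced top strand after bisulfite treatment and PCR.

    Unmodified C becomes T (via uracil); C positions listed in `modified`
    (5mC or 5hmC) stay C. All other bases are unchanged.
    """
    out = []
    for i, base in enumerate(ref.upper()):
        if base == "C" and i not in modified:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_modified(ref: str, read: str) -> dict[int, bool]:
    """For each reference C, True if the read still shows C, False if it shows T."""
    calls = {}
    for i, (r, q) in enumerate(zip(ref.upper(), read.upper())):
        if r == "C":
            if q == "C":
                calls[i] = True
            elif q == "T":
                calls[i] = False
            # any other base (e.g. a sequencing error) is left uncalled
    return calls


ref = "ACGTCGCA"
read = bisulfite_convert(ref, modified={1})
print(read)                      # ACGTTGTA
print(call_modified(ref, read))  # {1: True, 4: False, 6: False}
```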
EM-seq avoids bisulfite DNA damage without altering downstream analysis
Researchers can generate more biologically relevant methylome data using enzymatic methyl conversion, which was designed by NEB scientists to bypass the hazards of bisulfite conversion and reduce the number of sequencing reads required, without disrupting downstream sequencing pipelines.
In EM-seq™, two sequential enzymatic reactions differentiate cytosine from its methylated and hydroxymethylated forms. The first reaction (TET2 oxidation together with T4-BGT glucosylation) protects 5mC and 5hmC, but not unmodified cytosines, from deamination by APOBEC in the second reaction. Because only unmodified cytosines are deaminated, the modified forms can be identified by sequencing: unmodified C is read as T, while 5mC or 5hmC is read as C. Critically, your DNA samples are never exposed to the extreme temperatures and pH required for bisulfite conversion. Minimizing DNA damage greatly improves the accuracy of methylome data, especially with challenging low-input samples. The EM-seq method combines this enzymatic conversion with the NEBNext® Ultra II™ library preparation workflow reagents, with inputs from 10 to 200 ng. The resulting high-quality EM-seq libraries allow superior detection of 5mC and 5hmC at single-base resolution.
Conveniently, EM-seq sequencing data can be processed with the established analysis pipelines for bisulfite libraries, because both methods sequence unmodified cytosines as thymines and 5mC or 5hmC as cytosines. Reducing sample damage and the number of required sequencing reads wouldn't be as useful if EM-seq messed with downstream analysis, would it?
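The two enzymatic steps can be modeled as a chain of lookups from cytosine state to sequenced base. This is a simplified sketch that assumes each reaction goes to completion (in reality TET2 oxidation passes through intermediates such as 5fC); the table and function names are illustrative, and the point is only that the final readout matches that of bisulfite.

```python
# Simplified state model of EM-seq conversion, for illustration only.
# Step 1 (assumed complete): TET2 oxidizes 5mC to 5caC; T4-BGT glucosylates 5hmC to 5ghmC.
STEP1 = {"C": "C", "5mC": "5caC", "5hmC": "5ghmC"}
# Step 2: APOBEC deaminates unprotected C to U; the protected forms are not deaminated.
STEP2 = {"C": "U", "5caC": "5caC", "5ghmC": "5ghmC"}
# Amplification and sequencing: U is copied as T, protected forms are read as C.
READOUT = {"U": "T", "5caC": "C", "5ghmC": "C"}
# Bisulfite readout for the same cytosine states, for comparison.
BISULFITE = {"C": "T", "5mC": "C", "5hmC": "C"}


def em_seq_readout(states: list[str]) -> str:
    """Sequenced base for each input cytosine state after EM-seq."""
    return "".join(READOUT[STEP2[STEP1[s]]] for s in states)


def bisulfite_readout(states: list[str]) -> str:
    """Sequenced base for each input cytosine state after bisulfite conversion."""
    return "".join(BISULFITE[s] for s in states)


states = ["C", "5mC", "5hmC", "C"]
print(em_seq_readout(states))     # TCCT
print(bisulfite_readout(states))  # TCCT
```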
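Because both chemistries produce the same readout, the per-site methylation level is computed the same way: the fraction of reads that still show C at a reference cytosine. Established bisulfite tools such as Bismark perform this step (along with alignment, strand handling and context filtering); the `methylation_levels` function below is a hypothetical, stripped-down sketch that takes (position, base) observations already extracted from aligned reads.

```python
from collections import defaultdict


def methylation_levels(observations, min_depth=1):
    """Per-site methylation level from (ref_pos, read_base) pairs at reference Cs.

    C counts as modified, T as unmodified; other bases are ignored.
    Sites with fewer than `min_depth` informative reads are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # ref_pos -> [n_C, n_T]
    for pos, base in observations:
        if base == "C":
            counts[pos][0] += 1
        elif base == "T":
            counts[pos][1] += 1
    levels = {}
    for pos, (n_c, n_t) in counts.items():
        depth = n_c + n_t
        if depth >= min_depth:
            levels[pos] = n_c / depth
    return levels


obs = [(10, "C"), (10, "C"), (10, "T"), (10, "T"), (25, "T"), (25, "G")]
print(methylation_levels(obs))               # {10: 0.5, 25: 0.0}
print(methylation_levels(obs, min_depth=2))  # {10: 0.5}
```

The same function applies unchanged to bisulfite or EM-seq reads, which is the practical meaning of pipeline compatibility.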
EM-seq preserves sample complexity and reduces sequencing costs
Next-generation sequencing (NGS) library development focuses on accurately preserving the complexity of the DNA sample. Longer reads are desirable because they deliver greater sequencing depth per read and can reduce costs. EM-seq outperforms bisulfite-seq on the metrics that matter most for these goals.
First, library yields for EM-seq are greater than for whole-genome bisulfite sequencing (WGBS), and this consistently translates into fewer duplicate reads across all inputs tested. EM-seq libraries also have larger inserts than WGBS libraries: typical EM-seq inserts are approximately 370–420 bp, and inserts up to ~550 bp can be achieved. These insert sizes support longer sequencing reads, which improves accuracy and lowers sequencing costs.
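Why insert size matters for read length can be shown with simple arithmetic. When the two mates of a paired-end read are together longer than the insert, they overlap (or run into adapter), and those extra cycles add no new bases. The sketch below assumes ideal paired-end reads and ignores trimming; the function names are hypothetical and the 200 bp insert is an illustrative value, not a measured WGBS size.

```python
def unique_insert_bases(insert_size: int, read_length: int) -> int:
    """Distinct insert bases covered by one read pair (overlap counted once)."""
    # A pair can cover at most the whole insert and at most 2 * read_length bases.
    return min(insert_size, 2 * read_length)


def redundant_fraction(insert_size: int, read_length: int) -> float:
    """Fraction of sequencing cycles spent on overlap or adapter read-through."""
    total_cycles = 2 * read_length
    return 1 - unique_insert_bases(insert_size, read_length) / total_cycles


for insert in (200, 300, 400):
    print(insert, round(redundant_fraction(insert, read_length=150), 3))
```

With 2 x 150 bp reads, a 200 bp insert wastes about a third of the cycles, while inserts of 300 bp or more waste none; only inserts longer than 500 bp would fully use 2 x 250 bp reads.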
When both strands of DNA are considered, there are approximately 56 million CpGs in the human genome, and detecting as many of them as possible is important. As you can see in the figure above, combined with NEB's highly efficient Ultra II library preparation, EM-seq offers superior CpG detection. The difference in per-CpG coverage between EM-seq and WGBS across multiple inputs is striking: EM-seq detects more CpGs at greater depth than WGBS from the same number of raw reads, and the gap is most evident at lower DNA inputs. For example, using reads from libraries made with 10 ng input DNA, WGBS detects 36 million CpGs at 1x coverage compared with 54 million for EM-seq. At a more stringent minimum depth of 8x, EM-seq detects 11 million CpGs, while WGBS detects only 1.6 million. That is a big difference in unique CpG coverage.
One of the main reasons for the increased CpG detection is the intact nature of enzymatically converted DNA, which gives better genome coverage; this is reflected in more uniform coverage across GC content than in WGBS libraries. You get more sequencing reads for a given section of the genome, which gives you more confidence in the consensus sequence generated from all the reads. In addition, NEB scientists have optimized EM-seq to detect DNA methylation at single-base resolution from as little as 100 pg of DNA. A full technical note for EM-seq is available here.
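The 56 million figure counts each CpG once per strand. Because CG is its own reverse complement, every CpG dinucleotide on the top strand has a partner on the bottom strand, so this corresponds to roughly 28 million CpG dinucleotides. The hypothetical helpers below count CpG sites in a sequence and, given a per-site coverage map, count how many sites reach a minimum depth, which is how figures such as "detected at 1x" or "at 8x" are tallied.

```python
def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in each CpG dinucleotide on the top strand."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]


def cpg_sites_both_strands(seq: str) -> int:
    """Each top-strand CpG pairs with a bottom-strand CpG, so sites = 2 x dinucleotides."""
    return 2 * len(cpg_positions(seq))


def detected_at_depth(coverage: dict[int, int], min_depth: int) -> int:
    """Number of CpG sites whose read depth is at least `min_depth`."""
    return sum(1 for depth in coverage.values() if depth >= min_depth)


seq = "ACGCGTTCG"
print(cpg_positions(seq))           # [1, 3, 7]
print(cpg_sites_both_strands(seq))  # 6
coverage = {1: 0, 3: 1, 7: 9}       # hypothetical per-site depths
print(detected_at_depth(coverage, 1), detected_at_depth(coverage, 8))  # 2 1
```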
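GC uniformity is usually summarized by grouping genomic windows by GC content and comparing each group's mean coverage with the genome-wide mean; a perfectly uniform library gives 1.0 in every bin. The sketch below is a simplified, hypothetical version of that idea (Picard's CollectGcBiasMetrics reports a similar normalized-coverage metric); the window input format and bin count are assumptions.

```python
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    """G+C fraction among A/C/G/T bases (other symbols such as N are ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def gc_bias(windows, n_bins=10):
    """windows: list of (sequence, mean_coverage). Returns {gc_bin: normalized coverage}."""
    overall = sum(cov for _, cov in windows) / len(windows)
    if overall == 0:
        raise ValueError("no coverage")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, cov in windows:
        # GC fraction 1.0 would give bin n_bins, so clamp it into the last bin
        b = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        sums[b] += cov
        counts[b] += 1
    return {b: (sums[b] / counts[b]) / overall for b in sorted(sums)}


# Hypothetical windows where GC-rich sequence is under-covered.
windows = [("AAAA", 10), ("ATAT", 10), ("GCGC", 5), ("GGCC", 5)]
print(gc_bias(windows))
```

In this toy input the AT-rich bin sits at about 1.33 and the GC-rich bin at about 0.67, the kind of skew a more uniform library would flatten toward 1.0.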
These features of the EM-seq library contribute to the accurate and reproducible analysis of DNA methylation targets. The use of newer technologies such as EM-seq is becoming more essential to get the true biological picture as NGS technology moves from research to clinical use.
EM-seq improves results in many methylation analysis applications
Whole genome sequencing
Whole-genome bisulfite sequencing (WGBS) is widely used to study DNA methylation at single-base resolution, but its accuracy is severely limited by DNA damage. It was fantastic to see recent scientific papers showing how useful EM-seq is for generating this type of data.
Looking at the big picture with whole genomes, WGBS libraries with adapters ligated before bisulfite treatment were found to have reduced mapping rates and skewed GC content, as well as under-representation of G- and C-containing dinucleotides and over-representation of AA, AT and TA dinucleotides compared with an unconverted genome. This source of bias in DNA methylation data was reported by Olova et al. (2018) Genome Biology. Many weaknesses of the original method were overcome with post-bisulfite adapter tagging (PBAT) libraries, in which adapters are introduced after bisulfite conversion to improve library yield and genome coverage. However, the fundamental problem of bisulfite-induced DNA damage remains in these post-conversion libraries. Another useful method, TET-assisted pyridine borane sequencing (TAPS), combines an enzyme activity (TET1) with a chemical reaction (pyridine borane) to identify DNA methylation. Like EM-seq, TAPS-based methods do not introduce the DNA damage caused by bisulfite treatment, and TAPS can be modified to examine various other cytosine modifications. Just keep in mind that TAPS methods require the TET1 enzyme to be produced in-house. You also need to switch to new analysis pipelines, because TAPS converts the modified cytosines directly, so 5mC and 5hmC are read as T, the inverse of the bisulfite readout.
Interestingly, several research groups have recently reported evidence that EM-seq can outperform WGBS on several levels. Library insert sizes are larger, GC bias plots are closer to the ideal than those of standard libraries, and more even genome coverage means more CpGs are identified than with WGBS. These metrics are part of the reason Morrison J, et al. (2021), "Evaluation of genome-wide DNA methylation sequencing library preparation protocols," Epigenetics & Chromatin, recommends EM-seq for genome-wide DNA methylation sequencing based on its data from fresh-frozen human fallopian tube tissue samples. The EM-seq protocol also compared favorably with bisulfite sequencing-based approaches using high-quality DNA inputs from human cell lines in Foox et al. (2021), "The SEQC2 epigenomics quality control (EpiQC) study," Genome Biology, which evaluated and cross-validated platforms for epigenetic research for the FDA Epigenomics Quality Assurance Group. In almost all comparisons, the EM-seq libraries captured more CpG sites with equal or better coverage. Also, Feng S, et al. (2020), "Efficient and accurate genome-wide determination of DNA methylation patterns in Arabidopsis thaliana by enzymatic methyl sequencing," Epigenetics & Chromatin, suggested that EM-seq is a more accurate and reliable approach than WGBS for DNA methylation detection. Finally, data from Han Y, et al. (2021), "Comparison of EM-seq and PBAT methylome library methods for low-input DNA," Epigenetics, suggest that EM-seq performed better overall than post-bisulfite adapter tagging in quantifying genome-wide methylation from low-input samples.
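Dinucleotide over- and under-representation of the kind Olova et al. describe can be expressed as an observed/expected ratio: the frequency of each dinucleotide in the reads divided by its frequency in a reference. This is a hypothetical sketch with illustrative names; which reference to compare against (unconverted, as in the comparison above, or converted in silico) is left to the caller.

```python
from collections import Counter
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seqs):
    """Fraction of each of the 16 ACGT dinucleotides across all sequences."""
    counts = Counter()
    for seq in seqs:
        s = seq.upper()
        for i in range(len(s) - 1):
            pair = s[i:i + 2]
            if pair in DINUCLEOTIDES:  # skips pairs containing N or other symbols
                counts[pair] += 1
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in DINUCLEOTIDES}


def representation(read_seqs, reference_seqs):
    """Observed/expected ratio per dinucleotide; 1.0 means no bias.

    Dinucleotides absent from the reference are skipped to avoid dividing by zero.
    """
    obs = dinucleotide_frequencies(read_seqs)
    exp = dinucleotide_frequencies(reference_seqs)
    return {d: obs[d] / exp[d] for d in DINUCLEOTIDES if exp[d] > 0}


ratios = representation(["ACAC"], ["ACGT"])
print(ratios)  # AC is over-represented, CG and GT are missing from the reads
```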
Each of these studies supports EM-seq for whole-genome sequencing. It is particularly interesting to see the variety of sample types used, from high quality cell lines and tissues to low input levels of DNA derived from cerebrospinal fluid. There are so many incredible avenues of epigenomic research.
Longer reads for phased genome sequencing
EM-seq-converted DNA has been used for long-amplicon sequencing with Pacific Biosciences (PACBIO®) Single Molecule, Real-Time (SMRT®) sequencing technology. Longer reads are needed for phased whole-genome sequencing, which resolves which alleles lie on the maternal and paternal chromosomes in order to study complex genetic traits. Bisulfite-converted DNA is too damaged to be used with long-read technologies. Long-read TAPS (lrTAPS) has also been described, in which the converted DNA was sequenced with Oxford Nanopore Technologies® and PACBIO® platforms, again with the current lack of a commercial source of TET1 as a practical caveat.
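With long reads, a heterozygous SNP and many CpGs can fall on the same molecule, so methylation can be split by haplotype. The sketch assumes each read has already been assigned its allele at the SNP and reduced to a list of per-CpG calls; SNPs involving C and T are ambiguous in converted reads, because conversion itself turns unmethylated C into T. The function name and input format are hypothetical.

```python
from collections import defaultdict


def methylation_by_haplotype(reads):
    """reads: iterable of (snp_allele, calls), where calls is a list of bools
    (True = methylated CpG). Returns {allele: fraction of CpG calls methylated}."""
    methylated = defaultdict(int)
    total = defaultdict(int)
    for allele, calls in reads:
        methylated[allele] += sum(calls)
        total[allele] += len(calls)
    return {a: methylated[a] / total[a] for a in total if total[a]}


reads = [
    ("A", [True, True, False]),
    ("A", [True, True, True]),
    ("G", [False, False, False]),
    ("G", [False, True, False]),
]
print(methylation_by_haplotype(reads))
```

Here the A haplotype is 5 of 6 methylated and the G haplotype 1 of 6, an allele-specific pattern that short, fragmented reads could not link to the SNP.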
Reduced Representation Bisulfite Sequencing (RRBS) reduces sequencing costs by concentrating coverage on CpG-rich regions such as promoters or repetitive regions. Genomic DNA is digested with MspI, which recognizes CCGG sequences. The digested DNA is end-repaired, A-tailed and ligated to adapters, then treated with bisulfite and amplified by PCR before sequencing. The disadvantage is that reduced representation bisulfite libraries do not give genome-wide coverage of CpG or non-CpG sites: regions lacking an MspI restriction site are missed.
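The MspI step is easy to simulate: cut at every CCGG between the two Cs (C^CGG), then keep only fragments inside a size window. Because each internal fragment starts with CGG, reads from size-selected fragments begin at a CpG, which is why RRBS concentrates on CpG-rich sequence and misses regions with no nearby MspI site. This top-strand sketch uses hypothetical names, and the size window is a parameter you choose rather than a recommended value.

```python
import re


def mspi_digest(seq: str) -> list[str]:
    """Cut the top strand at every MspI site, C^CGG."""
    s = seq.upper()
    # Cut after the first C of each CCGG (CCGG cannot overlap itself, so finditer suffices).
    cuts = [m.start() + 1 for m in re.finditer("CCGG", s)]
    bounds = [0] + cuts + [len(s)]
    return [s[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep fragments within a chosen length window, mimicking gel or bead selection."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


frags = mspi_digest("AACCGGTTTTCCGGAA")
print(frags)                     # ['AAC', 'CGGTTTTC', 'CGGAA']
print(size_select(frags, 5, 8))  # ['CGGTTTTC', 'CGGAA']
```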
Applying EM-seq conversion to a reduced representation method provides wider coverage of MspI-digested regions compared to bisulfite reduced representation because the DNA is more intact after enzymatic conversion.
Cell-free DNA and single-cell methylation sequencing
The problem of sample loss from bisulfite conversion gets worse at lower inputs and at the single-cell scale. EM-seq has the potential for more complete single-cell methylome profiling than is possible with DNA damaged by bisulfite conversion. In Erger et al. (2020), "cfNOMe - A single assay for comprehensive epigenetic analyses of cell-free DNA," Genome Medicine, the researchers envisioned that targeted enrichment approaches combined with enzymatic cytosine conversion could enable inexpensive production of datasets with high coverage depth for even more robust cfDNA methylation studies.
State-of-the-art epigenetics aims for enrichment
Targeted enrichment panels cover methylated genomic regions known to affect gene regulation, as well as differentially methylated regions, enabling greater sequencing depth at reduced cost. Enrichment panels designed for bisulfite-converted DNA are commercially available from Agilent (SureSelect®XT Methyl-Seq) and Illumina® (TruSeq® EPIC Methyl Capture).
To make methyl-seq more efficient, the synthetic biologists at Twist Bioscience launched the Twist NGS Methylation Detection System to identify methylated regions in the human genome. This end-to-end workflow combines EM-seq conversion technology with Twist's custom methylation capture probes for the most efficient methylation detection available.
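The depth gain from targeted enrichment is back-of-the-envelope arithmetic: usable bases divided by the size of the region they land on. The `mean_depth` helper is hypothetical, and every number in the example (read count, read length, an 80 Mb panel, a 70% on-target rate, a 3 Gb genome) is an assumption chosen only to show the scaling.

```python
def mean_depth(n_reads: int, read_length: int, target_bp: int,
               on_target: float = 1.0, duplicate_rate: float = 0.0) -> float:
    """Mean depth over a target: usable reads x read length / target size."""
    usable_reads = n_reads * (1 - duplicate_rate) * on_target
    return usable_reads * read_length / target_bp


# Same hypothetical 10 million reads, spread over a panel or the whole genome.
panel = mean_depth(10_000_000, 150, 80_000_000, on_target=0.7)
genome = mean_depth(10_000_000, 150, 3_000_000_000)
print(round(panel, 2), round(genome, 2))
```

Under these assumptions the panel reaches about 13x mean depth versus about 0.5x genome-wide; a higher duplicate rate lowers both figures proportionally.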
Exciting application possibilities with EM-seq
Theoretical DNA methylation microarrays
EM-seq-converted DNA could replace bisulfite-converted DNA in methylation microarrays without changing single-stranded DNA probe collections or downstream analysis pipelines. Bisulfite-converted DNA is often used on microarrays to identify methylated cytosines at CpG islands, differentially methylated sites, enhancers or transcription factor binding sites. TAPS-converted DNA cannot be substituted directly without designing new single-stranded DNA probes, because TAPS converts 5mC itself. Here again, switching to enzymatic conversion is a convenient way to improve how well your data represent live methylation.
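The probe-compatibility argument comes down to readout. Bisulfite and EM-seq turn unmodified C into T and leave 5mC/5hmC as C, whereas TAPS does the opposite, so a probe designed against bisulfite-converted sequence matches EM-seq output but not TAPS output at the same site. A minimal sketch with a hypothetical `converted_readout` helper:

```python
def converted_readout(states: list[str], method: str) -> str:
    """Sequenced base for each cytosine state ("C", "5mC", "5hmC") after conversion."""
    modified = {"5mC", "5hmC"}
    if method in ("bisulfite", "em-seq"):
        # Unmodified C is converted and read as T; 5mC/5hmC are read as C.
        return "".join("C" if s in modified else "T" for s in states)
    if method == "taps":
        # TAPS converts 5mC/5hmC so they are read as T; unmodified C stays C.
        return "".join("T" if s in modified else "C" for s in states)
    raise ValueError(f"unknown method: {method}")


site = ["5mC", "C", "5hmC"]
for method in ("bisulfite", "em-seq", "taps"):
    print(method, converted_readout(site, method))
```

Bisulfite and EM-seq both yield CTC for this site, while TAPS yields TCT, which is why a TAPS workflow would need its own probe set.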
Opportunities for 5hmC detection
Variations on the EM-seq and TAPS methods, as well as ACE-seq, can be used to detect 5hmC. These methods are non-destructive and offer an alternative route to insight into the role of 5hmC in gene regulation. Established bisulfite-based methods that interrogate 5mC and 5hmC separately still cause the same DNA damage as traditional bisulfite sequencing.
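One common way to separate the two marks is subtraction: an assay that reads both 5mC and 5hmC as C (bisulfite or EM-seq) gives the total modified fraction, and an assay that reads only 5hmC as C (such as ACE-seq) gives the 5hmC fraction, so 5mC is estimated as the difference. The sketch below is a hypothetical per-site calculation from read counts; clipping negative values to zero is a simple assumption for handling sampling noise, not a published method.

```python
def split_5mc_5hmc(total_c: int, total_t: int, hmc_c: int, hmc_t: int) -> tuple[float, float]:
    """Estimate (5mC, 5hmC) fractions at one site.

    total_c/total_t: C and T counts from an assay reading 5mC + 5hmC as C.
    hmc_c/hmc_t: C and T counts from an assay reading only 5hmC as C.
    """
    modified = total_c / (total_c + total_t)
    hmc = hmc_c / (hmc_c + hmc_t)
    # Sampling noise can make the difference slightly negative; clip at zero.
    mc = max(modified - hmc, 0.0)
    return mc, hmc


mc, hmc = split_5mc_5hmc(total_c=8, total_t=2, hmc_c=3, hmc_t=7)
print(round(mc, 3), round(hmc, 3))
```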
Bisulfite-seq is no bargain
In my opinion, it makes sense that so many core sequencing labs refuse to perform bisulfite-seq. Your project might investigate the methylation of a single gene, patterns within a genome, or even patterns across multiple genomes. Your research could link methylation in individual cells, tissues or cell-free DNA samples to developmental, cancer or other disease phenotypes. Or maybe you're in a bind, with only a few challenging samples available to study. For a long time, researchers turned up their noses at the opportunity to analyze methylomes because bisulfite conversion was apparently the only option. I like to think of enzymatic conversion as a breath of fresh air.