SARS-CoV-2 version 4 scheme release

Motivation

ARTIC V3 primers have been the workhorse of SARS-CoV-2 sequencing for 15 months and have under various guises been used in the production of a large proportion of the ~2M genomes that have been deposited in GISAID.

Over time, and with the appearance of variants characterised by a large number of mutations, especially in the Spike protein, some of these primers have stopped working. This is more likely when the mutation is a deletion, but SNPs can also cause it, resulting in weaker or no amplification of the affected amplicon. Examples of primers affected by mutations include:

72_RIGHT G142D (Delta)

74_LEFT 241/243del (Beta)

76_LEFT K417N (Beta) and K417T (Gamma)

Development of V4

A number of primer interactions in V3 had already been identified by Itokawa et al. (Disentangling primer interactions improves SARS-CoV-2 genome sequencing by multiplex tiling PCR), so we wanted to address these at the same time†. However, it was hard to reconcile the positions and interactions of the new primers with the existing set. Any replacement primers needed to maintain a continuous tiling path through the genome (i.e. without opening gaps) without making amplicons longer than those already in the scheme (~400 bp). This proved problematic, and after some disappointing results we decided it would make more sense to develop a new scheme around VoC/VuI, as monitoring these is critically important and collectively they contain a large number of important mutations.
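The two constraints on replacement primers (no gaps in the tiling path, no amplicon longer than ~400 bp) can be checked mechanically. The following is a minimal sketch, not part of primalscheme: the function name `check_tiling` and the toy coordinates are hypothetical, and amplicons are assumed to be given as 0-based, half-open (start, end) intervals on the reference. A real scheme would additionally need the regions between primers to overlap, which this sketch does not check.

```python
from typing import List, Tuple

# (start, end) on the reference, 0-based and half-open; hypothetical toy values
Amplicon = Tuple[int, int]


def check_tiling(amplicons: List[Amplicon], max_len: int = 400) -> List[str]:
    """Return a list of problems: over-long amplicons or gaps between neighbours."""
    problems = []
    ordered = sorted(amplicons)
    for i, (start, end) in enumerate(ordered):
        length = end - start
        if length > max_len:
            problems.append(f"amplicon {i + 1} is {length} bp (> {max_len})")
        # A gap exists if this amplicon starts after the previous one ends.
        if i > 0 and start > ordered[i - 1][1]:
            gap = start - ordered[i - 1][1]
            problems.append(f"gap of {gap} bp before amplicon {i + 1}")
    return problems


if __name__ == "__main__":
    print(check_tiling([(0, 400), (320, 710), (640, 1030)]))  # overlapping tiles
    print(check_tiling([(0, 400), (450, 900)]))  # one gap, one over-long amplicon
```

An empty list means the candidate set still tiles the region without gaps and within the length limit.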

The standard way to run primalscheme is to include representative genomes in an input file in FASTA format‡. In this case, the input file was made by generating pseudo genomes (containing lineage-defining mutations only) from complete, representative genomes of the following lineages: B.1.1.7, B.1.351, B.1.429, B.1.525, B.1.617.1, B.1.617.2 and P.1. The scheme was generated using primalscheme v1.3.2 with standard parameters.
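As an illustration of what a pseudo genome might look like, here is a minimal sketch that applies a list of lineage-defining substitutions and deletions to a reference sequence and writes the result as FASTA. It is an assumption about the general idea, not the procedure actually used for V4: the function names, the toy reference and the toy mutations are all hypothetical, and positions are assumed to be 1-based reference coordinates.

```python
from typing import Dict, List, Tuple


def make_pseudo_genome(reference: str,
                       substitutions: Dict[int, str],
                       deletions: List[Tuple[int, int]]) -> str:
    """Apply substitutions (1-based position -> base) and deletions
    (1-based inclusive ranges) to a reference sequence."""
    seq = list(reference)
    for pos, base in substitutions.items():
        seq[pos - 1] = base
    for start, end in deletions:
        for pos in range(start, end + 1):
            seq[pos - 1] = ""  # empty string marks a deleted base
    return "".join(seq)


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format name -> sequence records as FASTA text."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    toy_reference = "ACGTACGTAC"  # stand-in for the real reference genome
    pseudo = make_pseudo_genome(toy_reference, {2: "T"}, [(5, 7)])
    print(to_fasta({"toy_lineage": pseudo}))
```

One record per lineage would then go into a single FASTA file used as the primalscheme input.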

Results

Testing this scheme on the England/2/2020 strain and clinical samples of Beta and Delta showed very similar patterns of coverage in all samples (Figure 1), indicating that primers were correctly placed away from variant positions. We identified a number of amplicons that amplified less efficiently than others, but increasing the concentration of these primer pairs did not improve coverage in all cases, and three primer pairs had to be completely replaced. Testing showed that the final primer set could generate even coverage across the genome with low coverage variation (Figure 2).

Figure 1. RAMPART coverage plot for the England/2/2020 strain and clinical samples of Beta and Delta.

Figure 2. RAMPART coverage plot for the England/2/2020 strain with the final V4 scheme.

The full primer set is available here:

Supply

If you order these primers as individual stocks, you can generate your own pools by resuspending them at 100 µM and pooling primers for odd-numbered regions into pool 1 and primers for even-numbered regions into pool 2. Weighted primer concentrations can improve coverage, and we will provide updated pooling volumes once we have them. We will of course be working with oligo manufacturers to make pre-pooled V4 primers available for convenience and to avoid many duplicate sets causing a backlog.
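The odd/even pooling rule can be sketched in a few lines. This example assumes primer names contain the region number followed by `_LEFT` or `_RIGHT` (as in `72_RIGHT` above); the function name `assign_pools` is hypothetical, and the sketch assigns pools only, without calculating volumes or weighted concentrations.

```python
import re
from typing import Dict, List


def assign_pools(primer_names: List[str]) -> Dict[int, List[str]]:
    """Put primers for odd-numbered regions in pool 1 and even-numbered in pool 2."""
    pools: Dict[int, List[str]] = {1: [], 2: []}
    for name in primer_names:
        # Assumed naming convention: "<region number>_LEFT" or "<region number>_RIGHT"
        match = re.search(r"(\d+)_(LEFT|RIGHT)", name)
        if match is None:
            raise ValueError(f"cannot find region number in {name!r}")
        region = int(match.group(1))
        pools[1 if region % 2 == 1 else 2].append(name)
    return pools


if __name__ == "__main__":
    names = ["72_LEFT", "72_RIGHT", "73_LEFT", "73_RIGHT"]
    for pool, members in assign_pools(names).items():
        print(pool, members)
```

Splitting neighbouring regions across two pools keeps overlapping amplicons out of the same reaction.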

Josh Quick
Institute of Microbiology and Infection, University of Birmingham
On behalf of the ARTIC network

†To detect this type of heterodimer interaction, we needed to change the primalscheme software so that it flags heterodimer interactions with a 3’ match, as opposed to all heterodimer interactions determined by the thermodynamic simulator built into primer3. This was done using the primer3-py bindings to return the ASCII structure of each interaction, then filtering these and discarding anything with a positive value for Tm and an overlap of 6-9 bp, depending on GC content.
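To illustrate why a 3’ match matters, here is a simplified, purely string-based sketch. It is not the primalscheme implementation and does not use primer3-py or any thermodynamic model: it only asks whether the last few bases of one primer are perfectly complementary to some stretch of another primer, which would leave an anchored 3’ end that a polymerase could extend. The function names are hypothetical, and the fixed default `overlap` of 6 is an assumption taken from the lower end of the 6-9 bp range mentioned above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_three_prime_match(primer_a: str, primer_b: str, overlap: int = 6) -> bool:
    """True if the last `overlap` bases of primer_a can pair perfectly
    with any window of primer_b (both written 5' to 3')."""
    tail = primer_a.upper()[-overlap:]
    return reverse_complement(tail) in primer_b.upper()


if __name__ == "__main__":
    a = "TTTTTTACGTTG"  # toy primer, 3' end is ACGTTG
    b = "GGCAACGTGG"    # toy primer containing CAACGT, the reverse complement
    print(has_three_prime_match(a, b))  # 3' end of a pairs with b
    print(has_three_prime_match(b, a))  # 3' end of b does not pair with a
```

The check is directional, so each primer pair in a pool would need testing both ways round.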

‡Another way, suggested by a user, is to use an N-masked reference genome as the input file (e.g. generated with bedtools maskfasta), as primers containing N bases are automatically filtered out.