Deciding when to release a new primer scheme is always difficult: we want continuity, but we don’t want genomes with dropouts caused by primer problems. ARTIC v4/4.1 has been the recommended scheme for about 8 months and, as with v3 before it, a number of primers have stopped working for many of the currently circulating Omicron sublineages. We therefore think now is the right time to recommend changing to the v5 scheme.
Version 5 has actually been in development for a while. It started life as a 3-month MSc project for Chris Kent, who subsequently joined the group for his PhD working on methods and algorithms for primer design. It has since been developed in the background, and we have already iterated through a number of minor versions as new variants emerged (see the quick-lab/SARS-CoV-2 repository of SARS-CoV-2 primer schemes on GitHub).
Rebalancing method and figure
Another bit of feedback we commonly get is that the prepooled primers available from IDT and Sigma are not well balanced. The reason for this is that v3 and v4 were developed under pressure after the arrival of new variants. There is also a lead time associated with scaling up manufacturing to thousands of tubes of prepooled primers, which has left us little time to optimise these formulations beyond some crude changes (e.g. 2x, 5x or 10x concentration for weakly amplifying products). For v5.3.2 we wanted this to be different and, thanks to the hard work of John Tyson, Anthea Lam and the team at BCCDC-PHL in Vancouver, this version will be available in a balanced amplicon formulation. As you will see further down this post, it is a step up in refinement compared to previous prepooled ARTIC primers.
Brief note on versioning
Going forward we have decided to name primer schemes using w.x.y version numbers (e.g. v5.3.2 has w = 5, x = 3, y = 2):
w: Major Version. Used to define the grass roots scheme.
x: Minor Version. Used to denote changes e.g. when adding or replacing primers to improve performance.
y: Misc version. Used to include changes to file format or change to primer re-balancing.
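As a rough illustration of how this naming could be handled in a pipeline, the sketch below parses a version string into its three parts. The names `SchemeVersion`, `parse_scheme_version` and `same_primer_set` are hypothetical, and the assumption that two versions differing only in y share the same primer sequences is our reading of the definitions above rather than a guarantee.

```python
import re
from typing import NamedTuple


class SchemeVersion(NamedTuple):
    major: int  # w: the underlying scheme design
    minor: int  # x: primers added or replaced
    misc: int   # y: file format or rebalancing changes


def parse_scheme_version(text: str) -> SchemeVersion:
    """Parse a string such as 'v5.3.2' (the leading 'v' is optional)."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a w.x.y scheme version: {text!r}")
    return SchemeVersion(*(int(part) for part in match.groups()))


def same_primer_set(a: str, b: str) -> bool:
    """Assumption: versions that differ only in y use the same primers."""
    va, vb = parse_scheme_version(a), parse_scheme_version(b)
    return (va.major, va.minor) == (vb.major, vb.minor)


if __name__ == "__main__":
    print(parse_scheme_version("v5.3.2"))
    print(same_primer_set("v5.3.0", "v5.3.2"))
```

A check like `same_primer_set` could be useful when deciding whether data generated with two scheme versions can share the same primer trimming coordinates, although the file format itself may still differ between them.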
The performance of v4.1 primers has been undermined by the Omicron sublineages. In addition, the number of circulating variants is now larger (the so-called ‘variant soup’), which makes the job of designing universal primers more difficult. This is apparent when looking at the dropout rate of different amplicons from UK surveillance sequencing below. The dropout rate of certain amplicons, including 1, 21, 45 and 51, has increased over time, which results in incomplete genomes, impairs the detection of mutations and affects the robustness of the lineage assignment used for monitoring circulating variants. Our aim will always be for the recommended ARTIC scheme to generate a complete genome sequence the majority of the time, which is why we recommend moving to v5.3.2 as soon as practical.
Figure 1. Coverage proportion against genome submission date for different amplicons and primer schemes uploaded to CLIMB-COVID April to October 2022. Amplicons showing no change have been excluded for clarity.
Development of v5 scheme
The performance of primer schemes is highly dependent on both the input data and the primer design algorithm. In a previous post, I mentioned that our preference is now to use a genome mask to generate primer schemes. Masking SNPs in the genome with an N forces primalscheme to ignore primers spanning these positions and improves the granularity with which primers can be designed. Likewise, masking either the deletions or the -1 position of insertions with N prevents primer placement at those sites. In an ideal world you would mask every known mutation ever detected in SARS-CoV-2, but in practice that would cover almost every base in the genome, so a compromise is needed. For v5 Chris developed the N-masked input processing and expanded it to encompass: VOCs/VUIs as defined by UKHSA; lineages with 5% or more global prevalence in the GISAID database on 01-05-2021 (v5.0.0) and 15/09/22 (v5.1.0); or lineages with a logistic growth value L > 4 on Nextstrain, for a total of 16 lineages. For each lineage, 5-9 complete genomes were selected from a range of geographic regions. These were processed into a genome mask using the NMaskGen pipeline (the ChrisgKent/NMaskGen_Snakemake repository on GitHub, for the generation of N-masked genomes from clustered sequences).
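The masking idea can be sketched in a few lines of Python. This is a minimal illustration and not the NMaskGen or primalscheme code: the function names `mask_reference` and `primer_window_allowed`, the 1-based coordinate convention, and the reading of the "-1 position" of an insertion as the reference base immediately before the inserted sequence are all assumptions made for the example.

```python
def mask_reference(reference, snps=(), deletions=(), insertion_anchors=()):
    """Return a copy of `reference` with the given positions replaced by 'N'.

    All coordinates are 1-based (an assumption for this sketch).
    snps:              positions of single-base substitutions
    deletions:         (start, end) pairs, both ends inclusive
    insertion_anchors: the reference base just before each insertion
    """
    to_mask = set(snps) | set(insertion_anchors)
    for start, end in deletions:
        to_mask.update(range(start, end + 1))

    masked = list(reference)
    for pos in to_mask:
        if not 1 <= pos <= len(masked):
            raise ValueError(f"position {pos} is outside the reference")
        masked[pos - 1] = "N"
    return "".join(masked)


def primer_window_allowed(masked, start, length):
    """True if a candidate primer of `length` bases starting at 1-based
    `start` lies fully inside the sequence and spans no masked base."""
    window = masked[start - 1:start - 1 + length]
    return len(window) == length and "N" not in window


if __name__ == "__main__":
    # Toy 10-base sequence, not real SARS-CoV-2 data.
    masked = mask_reference("ACGTACGTAC", snps=[2], deletions=[(5, 6)],
                            insertion_anchors=[9])
    print(masked)
    print(primer_window_allowed(masked, 3, 2))
```

The more positions are masked, the fewer windows pass a check like `primer_window_allowed`, which is the compromise described above.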
Although this is an improved method, it still suffers from an observational bias: the majority of sequences are submitted by a small number of countries with active surveillance, while some countries submit few or none. A possible solution is to apply the same filters at a country or regional level to account for the differences in absolute numbers. The other issue with masked references is that all positions have the same weighting, even though some mutations, e.g. D614G, are present in nearly every sequence since 2020. In the ideal situation, your algorithm would always find the most conserved primers with the appropriate thermodynamic characteristics, but in practice including mutations with a frequency >80% in each clade approximates the lineage-defining mutations. All of these mutations are then applied to the mask, leaving enough conserved sequence to design a primer scheme without gaps.
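The >80% per-clade filter could look something like the sketch below. It is an illustration under stated assumptions, not the NMaskGen implementation: the function name `clade_defining_mutations`, the input layout (a dictionary mapping each clade to one set of mutation labels per genome) and the toy labels `m1` to `m3` are all hypothetical.

```python
from collections import Counter


def clade_defining_mutations(genomes_by_clade, min_frequency=0.8):
    """Collect mutations seen in more than `min_frequency` of the genomes
    of at least one clade.

    genomes_by_clade maps a clade name to a list of genomes, where each
    genome is a collection of mutation labels.
    """
    selected = set()
    for genomes in genomes_by_clade.values():
        if not genomes:
            continue  # nothing to count for an empty clade
        # Count each mutation at most once per genome.
        counts = Counter(m for genome in genomes for m in set(genome))
        for mutation, n in counts.items():
            if n / len(genomes) > min_frequency:
                selected.add(mutation)
    return selected


if __name__ == "__main__":
    # Toy data: A23403G is the nucleotide change behind D614G;
    # m1, m2 and m3 are made-up labels.
    toy = {
        "clade_X": [{"A23403G", "m1"}, {"A23403G", "m1"}, {"A23403G"},
                    {"A23403G", "m2"}, {"A23403G", "m1"}],
        "clade_Y": [{"A23403G", "m3"}, {"A23403G", "m3"}],
    }
    print(sorted(clade_defining_mutations(toy)))
```

Computing the frequency within each clade, rather than across all genomes pooled together, keeps a mutation that is fixed in a small clade from being diluted by larger ones.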
Variants considered in the design
Variants included in v5.0.0: AY.103, AY.122, AY.20, AY.25, AY.39.1, AY.39.1.1, AY.4, AY.4.2, AY.4.3, AY.9, B.1.1.318, B.1.1.529, B.1.1.7, B.1.617.1, B.1.351, P.1.
Variants added for v5.1.0: BA.2.3, BA.4.1, BA.5, BA.5.1, BA.5.2, BA.5.2.1, BA.5.6.
Variants added for v5.2.0: BQ.1 (because early results showed BQ.1’s growth advantage over existing BA.5 sublineages, we decided to include it preemptively).
Tidying up the scheme
For v5.3.2 we went through the _alt primers, which had previously been added either manually or by primalscheme repair mode, renaming them _2 to simplify the scheme and removing non-functional primers. In addition, an out-by-one error in repair mode had produced a 1 bp gap after primer trimming, which we fixed by replacing 7_LEFT.
Table 1. Primers altered or renamed in v5.3.0 as part of the tidy up.
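A gap of the kind described above can be caught with a simple check on the primer-trimmed inserts of neighbouring amplicons. The sketch below is a hypothetical helper, not part of primalscheme: the names `Amplicon` and `find_insert_gaps`, the BED-style 0-based half-open coordinates and the toy positions are all assumptions for illustration.

```python
from typing import NamedTuple


class Amplicon(NamedTuple):
    name: str
    left_end: int     # end of the LEFT primer (0-based, exclusive)
    right_start: int  # start of the RIGHT primer (0-based)
    # After primer trimming the insert is the interval [left_end, right_start).


def find_insert_gaps(amplicons):
    """Return (previous, next, gap_start, gap_end) for every region that is
    left uncovered between the inserts of neighbouring amplicons."""
    ordered = sorted(amplicons, key=lambda a: a.left_end)
    if not ordered:
        return []
    gaps = []
    prev = ordered[0]
    covered_to = prev.right_start
    for amp in ordered[1:]:
        if amp.left_end > covered_to:
            gaps.append((prev.name, amp.name, covered_to, amp.left_end))
        if amp.right_start > covered_to:
            covered_to = amp.right_start
            prev = amp
    return gaps


if __name__ == "__main__":
    # Made-up coordinates, not the real v5 primer positions.
    toy = [
        Amplicon("amp_A", 1500, 1880),
        Amplicon("amp_B", 1881, 2250),  # insert starts 1 bp too late
        Amplicon("amp_C", 2200, 2580),
    ]
    print(find_insert_gaps(toy))
```

Running a check like this after any automated repair step would flag a 1 bp gap before the scheme is released, because it only appears once the primer sequences are trimmed from the amplicons.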
We have found v5.3.2 to perform extremely well, with no systematic dropouts for three current lineages (XBB.1.5, BQ.1.1 and BF.7) and with a lower read requirement due to the improved balancing. The scheme design included BQ.1, but it is also able to produce a complete genome for sublineage BQ.1.1, most likely because the additional mutation(s) do not fall within primer binding sites. We have always believed that 400 bp schemes are better suited to sequencing high Ct or degraded samples, but they are inherently more difficult to maintain because they contain more primers. A direct comparison between ARTIC v5.3.2 (400 bp) and Midnight v2 (1200 bp) is not possible because different cDNA inputs were used, but an informal comparison shows that both produce equivalent results below Ct 30 and diverge at Ct 30-35. The point at which ARTIC v5.3.2 performance rolls over has not been reached in this dataset, but it is at a higher Ct than that of Midnight v2, which is consistent with what has previously been observed with v4 primers https://www.medrxiv.org/content/10.1101/2021.12.28.21268461v1. These results indicate that, ignoring run mode considerations, Midnight v2 is more suited to routine sequencing and ARTIC v5.3.2 to more challenging applications such as asymptomatic screening or wastewater sampling.
Figure 2. Log amplicon coverage vs genome position for example lineages sequenced using both Midnight v2 and ARTIC v5.3.2 for the same clinical sample extracts.
Figure 3. % Genome coverage (completeness) vs Ct as assessed by qRT-PCR using both Midnight v2 and ARTIC v5.3.2 for the same clinical sample extracts.
IDT have again agreed to make preformulated primer pools available to buy. The primers are currently undergoing large-scale oligo synthesis and pooling, after which a pack from each production batch will be shipped to Birmingham and BCCDC-PHL for validation. Once we have confirmed that the performance matches what we see internally, they will be released for sale.
Thank you to Chris Kent (ARTIC), John Tyson (BCCDC) and Tom Brier (CLIMB-COVID) for generating the data and analysis in this post.