SARS-CoV-2 version 5.3.2 scheme release

scalene · 26 January 2023 15:41

Introduction

Deciding when to release a new primer scheme is always difficult because we want continuity but don’t want genomes with dropouts caused by primer problems. ARTIC v4/4.1 has been the recommended scheme for about 8 months now and, as with v3 before it, a number of primers have stopped working for many of the currently circulating Omicron sublineages, so we think now is the right time to recommend changing to the v5 scheme.

Version 5 has actually been in development for a while and started life as a 3-month MSc project for Chris Kent, who subsequently joined the group for his PhD working on methods and algorithms for primer design. It has since been developed in the background, and we have already iterated through a number of minor versions as new variants emerged (see GitHub - quick-lab/SARS-CoV-2: SARs-CoV-2 Primer schemes).

Rebalancing method and figure

Another bit of feedback we commonly get is that the pre-pooled primers available from IDT and Sigma are not well balanced. The reason is that v3 and v4 were developed under pressure after the arrival of new variants. Additionally, there is a lead time associated with scaling up manufacturing to thousands of tubes of pre-pooled primers, which has left us little time to optimise these formulations beyond some crude changes (e.g. 2x, 5x, 10x concentration for weakly amplifying products). For v5.3.2 we wanted this to be different and, thanks to the hard work of John Tyson, Anthea Lam and the team at BCCDC-PHL in Vancouver, this version will be available in a balanced amplicon formulation. As you will see further down this post, it’s a level up in refinement compared to previous pre-pooled ARTIC primers.

Brief note on versioning

Going forward we have decided to use the following scheme for naming primer schemes:

v(w).(x).(y)
w: Major Version. Used to define the grass roots scheme.
x: Minor Version. Used to denote changes e.g. when adding or replacing primers to improve performance.
y: Misc version. Used to include changes to file format or change to primer re-balancing.
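
As a rough illustration of the naming scheme above, here is a minimal sketch of how a scheme name could be split into its three parts. The class and function names are hypothetical and not part of any ARTIC tooling, and ordering versions by simple tuple comparison is an assumption.

```python
import re
from typing import NamedTuple


class SchemeVersion(NamedTuple):
    """Hypothetical container for a v(w).(x).(y) scheme name."""
    major: int  # w: the grass roots scheme
    minor: int  # x: primers added or replaced
    misc: int   # y: file format or rebalancing changes


def parse_scheme_version(text: str) -> SchemeVersion:
    """Parse a name such as 'v5.3.2' into (major, minor, misc)."""
    match = re.fullmatch(r"[vV]?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a v(w).(x).(y) scheme name: {text!r}")
    return SchemeVersion(*(int(part) for part in match.groups()))


if __name__ == "__main__":
    current = parse_scheme_version("v5.3.2")
    print(current)
    # Named tuples compare element by element, so newer versions sort later.
    print(current > parse_scheme_version("v5.1.0"))
```

Under this scheme a change in rebalancing alone would only bump the last number, so two schemes with the same major and minor parts share the same primer sequences.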

Motivation

The performance of v4.1 primers has been undermined by the Omicron sublineages. In addition, the number of circulating variants is now larger (the so-called ‘variant soup’), which makes the job of designing universal primers more difficult. This is apparent in the dropout rate of different amplicons from UK surveillance sequencing shown in Figure 1. The dropout rate of certain amplicons, including 1, 21, 45 and 51, has increased over time, which results in incomplete genomes, impairs the detection of mutations and affects the robustness of the lineage assignment used for monitoring circulating variants. Our aim will always be for the recommended ARTIC scheme to generate a complete genome sequence the majority of the time, which is why we recommend moving to v5.3.2 as soon as practical.

Figure 1. Coverage proportion against genome submission date for different amplicons and primer schemes uploaded to CLIMB-COVID April to October 2022. Amplicons showing no change have been excluded for clarity.

Development of v5 scheme

The performance of primer schemes is highly dependent on both the input data and the primer design algorithm. In a previous post, I mentioned that our preference is now to use a genome mask to generate primer schemes. Masking SNPs in the genome with an N forces primalscheme to ignore primers spanning these positions and improves the granularity with which primers can be designed. Likewise, masking either the deletions or the -1 position of insertions with N will prevent primer placement there. In an ideal world you would mask every known mutation ever detected in SARS-CoV-2, but in practice that would cover almost every base in the genome, so a compromise is needed. For v5 Chris developed the N-masked input processing and expanded it to encompass: VOCs/VUIs as defined by UKHSA; lineages with 5% or more global prevalence in the GISAID database on 01-05-2021 (v5.0.0) and 15/09/22 (v5.1.0); or lineages with a logistic growth value L > 4 on Nextstrain, for a total of 16 lineages. For each lineage, 5-9 complete genomes were selected from a range of geographic regions. These were processed into a genome mask using the pipeline NMaskGen (GitHub - ChrisgKent/NMaskGen_Snakemake: For the Generation of N masked genomes from clustered sequences).
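
To make the masking idea concrete, here is a minimal sketch of applying SNPs, deletions and insertions to a reference as N. It is not the NMaskGen implementation. The function name, the 0-based coordinates, the half-open deletion ranges and the convention that an insertion is given as the index of the reference base following it are all assumptions made for illustration.

```python
def mask_reference(reference, snps=(), deletions=(), insertions=()):
    """Return a copy of `reference` with variant positions replaced by N.

    Assumed conventions (hypothetical, for illustration only):
      snps       -- 0-based positions of single-base changes
      deletions  -- (start, end) pairs, 0-based, end exclusive
      insertions -- 0-based index of the reference base that follows the
                    inserted sequence; the base before it (-1) is masked
    """
    masked = list(reference)
    for pos in snps:
        masked[pos] = "N"
    for start, end in deletions:
        for pos in range(start, end):
            masked[pos] = "N"
    for pos in insertions:
        if pos > 0:
            masked[pos - 1] = "N"
    return "".join(masked)


if __name__ == "__main__":
    toy_reference = "ACGTACGTAC"  # toy sequence, not real SARS-CoV-2 data
    print(mask_reference(toy_reference, snps=[1], deletions=[(3, 5)], insertions=[8]))
    # prints ANGNNCGNAC
```

A primer design tool that rejects candidate primers containing N would then skip any site overlapping a masked position.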

Outstanding issues

Although this is an improved method, it still suffers from problems. One is an observational bias caused by the majority of sequences being submitted by a small number of countries with active surveillance, while some countries submit few or none. A possible solution is to apply the same filters at a country or regional level to account for the differences in absolute numbers. The other issue with masked references is that all positions have the same weighting, even though some mutations, e.g. D614G, are present in nearly every sequence since 2020. In the ideal situation, your algorithm would always find the most conserved primers with the appropriate thermodynamic characteristics, but in practice including mutations with a frequency >80% in each clade approximates the lineage-defining mutations. All of these mutations are then applied to the mask, leaving enough conserved sequence to design a primer scheme without gaps.
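
The >80% frequency filter can be sketched as below. This is a simplified illustration rather than the actual pipeline: the function name is hypothetical, each genome is assumed to be represented as a set of mutation labels, and apart from D614G the labels in the example are made-up placeholders.

```python
from collections import Counter


def lineage_defining_mutations(genomes, threshold=0.8):
    """Return mutations present in more than `threshold` of the genomes.

    `genomes` is an iterable of collections of mutation labels, one per
    selected genome of a clade (an assumed input format).
    """
    genomes = [set(genome) for genome in genomes]
    if not genomes:
        raise ValueError("need at least one genome")
    counts = Counter(mutation for genome in genomes for mutation in genome)
    total = len(genomes)
    return {m for m, n in counts.items() if n / total > threshold}


if __name__ == "__main__":
    # Six toy genomes for one clade; mutA and mutB are placeholder names.
    clade = [
        {"D614G", "mutA", "mutB"},
        {"D614G", "mutA"},
        {"D614G", "mutA", "mutB"},
        {"D614G", "mutA"},
        {"D614G", "mutA"},
        {"D614G"},
    ]
    print(sorted(lineage_defining_mutations(clade)))
    # prints ['D614G', 'mutA']
```

With only 5-9 genomes per lineage the threshold is coarse: a mutation missing from two genomes out of six already falls below it.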

Variants considered in the design

Variants included in v5.0.0: AY.103, AY.122, AY.20, AY.25, AY.39.1, AY.39.1.1, AY.4, AY.4.2, AY.4.3, AY.9, B.1.1.318, B.1.1.529, B.1.1.7, B.1.617.1, B.1.351, P.1.
Variants added for v5.1.0: BA.2.3, BA.4.1, BA.5, BA.5.1, BA.5.2, BA.5.2.1, BA.5.6.
Variants added for v5.2.0: BQ.1 (due to early results showing BQ.1’s growth advantage over existing BA.5 sub-lineages, we decided to preemptively include BQ.1).

Tidying up the scheme

For v5.3.2 we went through the _alt primers, which had previously been added either manually or by primalscheme repair mode, and renamed them _2, simplifying the scheme and removing non-functional primers. In addition, an off-by-one error in repair mode produced a 1 bp gap after primer trimming, which needed fixing by replacing 7_LEFT.

Primer name	Sequence
SARS-CoV-2_400_7_LEFT_2	GCTGCTCGTGTTGTACGATCAAT
SARS-CoV-2_400_7_RIGHT_2	CTCCTTAATTTCCTTTGCACAGGTG
SARS-CoV-2_400_52_LEFT_2	TGGTACACTTATGATTGAACGGTTCG
SARS-CoV-2_400_52_RIGHT_2	GTGACATCACAACCTGGAGCATT
SARS-CoV-2_400_62_LEFT_2	TCTATGATGCACAGCCTTGTAGTGA
SARS-CoV-2_400_84_LEFT_2	GTAACAGTTTACTCACACCTTTTGCTC
SARS-CoV-2_400_84_RIGHT_2	TGTTCAACACCARTGTCTGTACTC
SARS-CoV-2_400_88_LEFT_2	GTGGACATCTTCGTATTGCTGGA
SARS-CoV-2_400_88_RIGHT_2	CCATTGGTTGCTCTTCATCTAATTGAG
SARS-CoV-2_400_89_LEFT_2	GATGTTTCATCTCGTTGACTTTCAGG

Table 1. Primers altered or renamed in v5.3.0 as part of the tidy up.

Performance

We have found v5.3.2 to perform extremely well, with no systematic dropouts for three current lineages (XBB.1.5, BQ.1.1 and BF.7) and a lower read requirement due to the improved balancing. The scheme design included BQ.1 but it is able to produce a complete genome for sublineage BQ.1.1, most likely because the additional mutation(s) do not fall within primer binding sites. We have always believed that 400 bp schemes are better suited to sequencing high Ct or degraded samples, but they are inherently more difficult to maintain because there are more primers in the scheme. A direct comparison between ARTIC v5.3.2 (400 bp) and Midnight v2 (1200 bp) is not possible because different cDNA inputs were used, but an informal comparison shows that both produce equivalent results below Ct 30 and diverge at Ct 30-35. The point at which ARTIC v5.3.2 performance rolls over has not been reached in this dataset but is higher than that of Midnight v2, as has previously been observed with v4 primers https://www.medrxiv.org/content/10.1101/2021.12.28.21268461v1. These results indicate that, ignoring run mode considerations, Midnight v2 is more suited to routine sequencing and ARTIC v5.3.2 to more challenging applications such as asymptomatic screening or wastewater sampling.

Figure 2. Log amplicon coverage vs genome position for example lineages sequenced using both Midnight v2 and ARTIC v5.3.2 for the same clinical sample extracts.

Figure 3. % Genome coverage (completeness) vs Ct as assessed by qRT-PCR using both Midnight v2 and ARTIC v5.3.2 for the same clinical sample extracts.
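
For readers who want to reproduce a completeness figure on their own data, here is a minimal sketch. It assumes completeness means the percentage of consensus positions called as A, C, G or T. The post does not state exactly how the values in Figure 3 were calculated, so treat this definition and the function name as assumptions.

```python
def genome_completeness(consensus: str) -> float:
    """Percentage of positions in a consensus sequence called as A/C/G/T.

    N and any other ambiguity codes count as missing (assumed definition).
    """
    if not consensus:
        raise ValueError("empty consensus sequence")
    called = sum(1 for base in consensus.upper() if base in "ACGT")
    return 100.0 * called / len(consensus)


if __name__ == "__main__":
    toy_consensus = "ACGTNNNNAC"  # toy sequence with a 4-base dropout
    print(genome_completeness(toy_consensus))
    # prints 60.0
```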

Availability

IDT have again agreed to make preformulated primer pools available to buy. They are currently undergoing large-scale oligo synthesis and pooling before a pack from each production batch is shipped to Birmingham and BCCDC-PHL for validation. Once we have ensured the performance matches what we see internally, they will be released for sale.

Acknowledgements

Thank you to Chris Kent (ARTIC), John Tyson (BCCDC) and Tom Brier (CLIMB-COVID) for generating the data and analysis in this post.

cnyaigoti · 13 April 2023 19:17

Thank you for sharing this updated set. I am wondering if the updated set is now available through any of the commercial suppliers, e.g. IDT. Alternatively, I would appreciate it if you have aliquots to share.

scalene · 14 April 2023 14:57

Yes it should be available from IDT

gquirk96 · 15 May 2023 18:25

Will these primers work to amplify ancestral SARS-CoV-2 sequences as well, or just more recent Delta/Omicron variants?

scalene · 16 May 2023 09:54

They are designed to work with all strains

gquirk96 · 30 May 2023 19:50

Hello,
Are you able to resolve this issue? Currently, the most recent scheme appears to be v4.1

thanks so much,
Grace

ultraviolet · 23 June 2023 15:01

This is fabulous - we just completed a test run of a new (to us) sequencing setup and used the COVIDSeq Assay with the v4.1 primers - I definitely saw that these primers were great for later Delta variants and Omicron BA.1 and BA.2, but lots of amplicon dropout once you hit BA.2.9 and later.

SkyeGuy8108 · 1 December 2023 18:41

Do we know if there will be an update to account for these dropouts? We have been seeing a lot of dropout and lower quality reads, especially for F456L, for the later omicron variants when run on Illumina.

scalene · 6 February 2024 15:07

It is likely we will revise v5.3.2 by adding some spike-in primers to address this but we haven’t finalised anything yet.

SkyeGuy8108 · 1 March 2024 18:02

Cool. We’d definitely like to get back to some deeper sequencing on Illumina without dropout of the newer variants!

Lauren · 9 May 2024 13:54

Pre-pooled primers available here:

SkyeGuy8108 · 17 June 2024 18:02

I’m not sure what I’m seeing here. It looks like it’s still V5.3.2 for a BA.1 lineage? Did they update it to include JN.1, but keep it named as V5.3.2?

scalene · 18 June 2024 14:51

No, this scheme is about 18 months old now and hasn’t been updated for JN.1. We will monitor the performance and decide whether we need to update it again.
