A Beginner's Guide to ARTIC

What is ARTIC?

ARTIC comprises a collection of open-access resources, both for laboratory work and data processing and analysis, to enable real-time molecular epidemiology for pathogen surveillance and outbreak response. Crucially, the protocols have been designed to be performed on minimal, portable equipment, facilitating their use at or near the frontline of outbreaks, where sophisticated laboratory resources may not be available.

What is molecular epidemiology?

While traditional epidemiology relies on studying outbreaks from case data, often examined after the fact, molecular epidemiology can monitor outbreaks in near real-time. Unbiased sequencing can help identify novel pathogens/strains at the start of an outbreak, when more standard methods - which usually target a specific, known agent - may fail. Sequencing allows for much more accurate transmission chain tracking than traditional contact tracing methods, by monitoring the spread of specific genotypes between individuals. The origins of different localised outbreaks can also be identified, determining whether they have common or independent sources. Thus, molecular epidemiology, utilising modern sequencing technologies, has the potential to provide detailed, actionable data to assist in infectious disease outbreak monitoring and interventions. For in-depth reviews, see: Gardy & Loman (2018), Grubaugh et al. (2019). We also recommend “An applied genomic epidemiological handbook” (Black & Dudas, 2023) for an excellent overview of genomic epidemiology for public health.

Who is ARTIC for?

The ARTIC protocols are for researchers, front-line scientists, and anyone interested in infectious disease surveillance - detection and characterisation, outbreak tracking, as well as virus evolution and phylogenetics.

The ARTIC workflow

The ARTIC workflow provides a variety of resources, from primer design through to data analysis, for both targeted and metagenomic sequencing. Sequencing can be performed on your desired platform. Short amplicons (≤400bp) can be sequenced directly on both Illumina and Oxford Nanopore (ONT) devices. Longer amplicons must be fragmented prior to sequencing if you wish to use Illumina. If you wish to preserve longer amplicons/strands, ONT should be used.

The LoCost workflow (discussed in more depth separately) provides laboratory and bioinformatics protocols for whole genome amplicon sequencing using the ONT MinION. It has been optimised for time and cost efficiency, and to be suitable for field deployment. The protocol is for sequencing SARS-CoV-2, but may serve as a template for other viruses if you wish to perform targeted sequencing using ONT.

The resources described below are for targeted (amplicon) sequencing. Metagenomic sequencing will be discussed separately.


Fig 1: Overview of ARTIC resources

Primer design

PrimalScheme is an online tool which will design a set of primers to amplify (nearly) the whole genome of your target organism. All that is required is at least one full reference genome, although multiple genomes will increase the likelihood that the primers will amplify across strains. The desired amplicon length can be specified; smaller amplicons are recommended where it is expected that the quality of DNA/RNA will be low. Several primer schemes have already been designed, for viruses such as SARS-CoV-2, Ebola, and measles, and are free to download. The full catalogue is available here. The list also indicates the current status of each primer scheme - validated, tested, or draft.

The scheme divides the primer pairs into multiple pools, so that overlapping amplicons are never amplified in the same PCR reaction. Within each amplicon, the sequence at the primer binding sites reflects the primer rather than the sample, potentially masking genomic variation. Therefore, primer sites are trimmed from the read data, with the overlap between amplicons providing the “true” genomic sequence. For example, in the schematic below you can see that the true sequence for the primer binding sites of amplicon 2 will be obtained from amplicons 1 and 3.


Fig 2: Primer scheme example
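
The pooling and overlap logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not PrimalScheme's own code: the `Amplicon` class, the coordinates, and the primer length are all hypothetical, and it assumes a simple two-pool scheme in which amplicons alternate between pools along the genome. It checks that no two amplicons in the same pool overlap, and that both primer binding sites of amplicon 2 fall inside the primer-free regions of its neighbours.

```python
from dataclasses import dataclass


@dataclass
class Amplicon:
    """One tiled amplicon; coordinates are 0-based, end-exclusive (hypothetical)."""
    name: str
    start: int        # first base of the left primer
    end: int          # one past the last base of the right primer
    primer_len: int   # assumed equal for both primers, for simplicity

    @property
    def insert(self):
        """Region between the primers: the only bases that reflect the sample."""
        return (self.start + self.primer_len, self.end - self.primer_len)

    @property
    def primer_sites(self):
        """Left and right primer binding sites as (start, end) tuples."""
        return [(self.start, self.start + self.primer_len),
                (self.end - self.primer_len, self.end)]


def assign_pools(amplicons):
    """Alternate amplicons between pool 1 and pool 2 along the genome."""
    ordered = sorted(amplicons, key=lambda a: a.start)
    return {a.name: (i % 2) + 1 for i, a in enumerate(ordered)}


def overlaps_within_pools(amplicons, pools):
    """Return pairs of amplicons that overlap AND share a pool (should be empty)."""
    clashes = []
    for i, a in enumerate(amplicons):
        for b in amplicons[i + 1:]:
            same_pool = pools[a.name] == pools[b.name]
            if same_pool and a.start < b.end and b.start < a.end:
                clashes.append((a.name, b.name))
    return clashes


def covered_by_neighbour(site, amplicon, amplicons):
    """True if a primer site lies wholly inside another amplicon's insert."""
    s, e = site
    return any(o.insert[0] <= s and e <= o.insert[1]
               for o in amplicons if o.name != amplicon.name)


# Hypothetical 400 bp amplicons tiled every 300 bp, with 25 bp primers.
scheme = [Amplicon(f"amp{i + 1}", i * 300, i * 300 + 400, 25) for i in range(3)]
pools = assign_pools(scheme)
clashes = overlaps_within_pools(scheme, pools)
amp2 = scheme[1]
amp2_sites_covered = [covered_by_neighbour(s, amp2, scheme) for s in amp2.primer_sites]

print("Pools:", pools)
print("Overlaps within a pool:", clashes)
print("Amplicon 2 primer sites covered by neighbours:", amp2_sites_covered)
```

With these made-up coordinates, amplicons 1 and 3 share a pool without overlapping, and amplicon 2 sits alone in the other pool. Real schemes are designed with many more constraints than this sketch considers.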

Primer preparation

We recommend resuspending lyophilised oligos and combining into pools prior to beginning processing of samples, as schemes can contain several hundred primers, and as such this step can take a significant amount of time. Each primer scheme consists of multiple pools of non-overlapping amplicons, to avoid unwanted short-product amplicons being generated, and to ensure (near) complete genome coverage. The primer pools can be stored frozen, and thawed when needed. We recommend freezing multiple aliquots in case of degradation or contamination, and to avoid repeated freeze-thawing of the stocks. See the LoCost protocol for full details on primer dilution and pooling. Once this has been completed you will have sufficient stocks for processing several hundred samples, and this step need not be repeated until you run out of primers.

NB: the ARTIC SARS-CoV-2 primer scheme (v5.3.2) can be purchased pre-pooled here.


Fig 3: Overview of the ARTIC laboratory workflow

cDNA synthesis

This is a very simple step which only requires adding the reverse-transcription mix (LunaScript) to your samples (one tube for each sample). If your sample is DNA (rather than RNA) this step should be skipped. To avoid contamination of reagents, it is recommended that the LunaScript mix be added to the reaction tubes in a ‘mastermix’ cabinet, and then the samples added to the tubes in a separate ‘sample addition’ cabinet.

PCR Genome Amplification

As described above, the PCR primers are divided into two pools. As such, each sample will undergo two PCR reactions, so you will have two tubes per sample. These will be combined again later. To avoid contamination of reagents, it is recommended that the PCR mix be added to the reaction tubes in a ‘mastermix’ cabinet, and then the DNA samples added to the tubes in a separate ‘sample addition’ cabinet. After PCR it is vital that these samples are not returned to either of these cabinets, as PCR amplicons will easily contaminate pre-PCR areas. Post-PCR samples should be handled in a completely separate area.

One-step vs two-step cDNA synthesis

The method described above is two-step synthesis: cDNA is generated from the RNA sample using random hexamers, and then targeted PCR is performed on the cDNA. However, it is also possible to combine these steps into one, using your targeted primers together with the RT mix, rather than random hexamers. It is important to note that the RT mix you purchase will be different in each case, so make sure you buy the correct mix for your workflow.

Workflow directionality

It is crucial that the sample processing be performed in separate cabinets as described above, and that once samples/reagents have moved from one step to the next, they do not move back. Doing so poses a significant risk of contamination, and once contamination has occurred it is extremely difficult to remove.

Fig 4: Workflow Directionality

Data processing

Performing quality checks on your data is crucial. These steps not only examine read quality, but also perform other checks such as identifying possible contamination and sample swaps. These will be discussed in more detail in a separate article.

RAMPART

RAMPART is a piece of command-line software which provides in-depth analysis of your sequencing run, in real-time. It is for use with ONT sequencing, and operates concurrently with MinKNOW, which must be set to real-time basecalling. As well as data on read length, depth, and coverage, it will also provide real-time phylogenetic analysis of your sample(s). Use of RAMPART requires a reasonable level of experience with the command line, and so is an optional element of the ARTIC workflow. The required documentation is available here.

Field bioinformatics pipeline

The field bioinformatics pipeline is the primary data processing pipeline for ARTIC. It currently processes ONT sequencing data only; however, multi-platform pipelines are in development. Specific protocols are provided for SARS-CoV-2 and Ebola, but these can be adapted for other viruses. Reads need to have been demultiplexed and ‘polished’ prior to running the pipeline. The pipeline will process one barcode at a time (though this can be automated to work through all barcodes). The reads are aligned to the reference genome, producing a consensus sequence of your sample (FASTA file), a list of detected variants (VCF file), and a BAM file for visualisation.
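
As a rough illustration of how working through all barcodes might be automated, the sketch below builds one command per barcode directory. It makes several assumptions that are not part of the ARTIC documentation: that the demultiplexed reads sit in sub-directories whose names start with `barcode`, and that you supply the pipeline command yourself as a template containing the hypothetical placeholders `{barcode}` and `{reads_dir}`. The real command and its options depend on the pipeline version you have installed, so `echo` is used here as a harmless stand-in.

```python
import tempfile
from pathlib import Path


def find_barcode_dirs(run_dir):
    """Return the barcode sub-directories of a demultiplexed run, sorted by name."""
    return sorted(p for p in Path(run_dir).iterdir()
                  if p.is_dir() and p.name.startswith("barcode"))


def build_commands(run_dir, command_template):
    """Fill the user-supplied template once per barcode.

    command_template is a list of arguments in which the placeholders
    {barcode} and {reads_dir} are substituted. The actual pipeline command
    is NOT defined here: copy it from the documentation of the pipeline
    version you are using.
    """
    commands = []
    for bc_dir in find_barcode_dirs(run_dir):
        commands.append([arg.format(barcode=bc_dir.name, reads_dir=str(bc_dir))
                         for arg in command_template])
    return commands


# Demonstration on a throwaway directory that mimics a demultiplexed run.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("barcode01", "barcode02", "unclassified"):
        (Path(tmp) / name).mkdir()
    # Placeholder command: replace "echo" and its arguments with the real call.
    demo_commands = build_commands(tmp, ["echo", "{barcode}", "{reads_dir}"])

for cmd in demo_commands:
    print(" ".join(cmd))
```

Each resulting argument list could then be passed to `subprocess.run(cmd, check=True)` so that the loop stops at the first barcode that fails, rather than silently continuing.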

Further data analyses

NB: The programmes given below are not intended to be an exhaustive list, but rather some useful tools to get you started.

BEAST (Bayesian Evolutionary Analysis Sampling Trees) creates rooted, time-measured phylogenies using molecular clock models. More information and software downloads can be found here.

CIVET (Cluster Investigation and Virus Epidemiology Tool) is another piece of command-line software. As the name suggests, it helps investigate virus clusters, providing real-time epidemiological data for outbreak response and management. It analyses genomic data and associated metadata, putting your sequences in the context of others, providing information on clusters, as well as phylogenetic information, to monitor spread and evolution. Documentation and examples of reports can be found here.

Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages) assigns the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. It is available as a command line tool and a web application. Documentation and tutorials can be found here.

UShER (Ultrafast Sample placement on Existing tRees) is another command line phylogenetic placement tool, which can be downloaded here. While it has been used extensively for SARS-CoV-2 phylogenetic analyses, it is not restricted to SARS-CoV-2.

Nextclade is an online tool (no downloads or command line). It performs genetic sequence alignment, clade assignment, mutation calling, phylogenetic placement, and quality checks for SARS-CoV-2, Influenza (Flu), Mpox (Monkeypox), and Respiratory Syncytial Virus (RSV), using pre-assembled reference sequence sets. It can be used here.

Chapter 7 of “An applied genomic epidemiological handbook” provides a more in-depth overview of UShER and Nextclade, along with other phylogenetic tools, and also a useful discussion on the pros and cons of using phylogenetic trees vs phylogenetic placement tools.

Using ARTIC in the field

While these protocols and resources can be used in standard laboratories, the real goal of ARTIC is to be field-deployable, to allow real-time sequencing at the front lines of outbreaks, in regions where laboratory resources may be limited. For use in the field we recommend the ONT MinION, a sequencer the size of a large USB stick which can be run from a laptop. All of the above bioinformatics can be performed on a laptop with sufficient specifications (see the ONT website for details on the current minimum requirements to run MinKNOW and Guppy). It is important to note that time-efficient basecalling requires an NVIDIA CUDA-compatible GPU (check the ONT website for the specific requirements), and these are not available in Macs, so you will require either a Linux or Windows machine. You will also need a sufficiently large SSD to handle the sequencing data. If you anticipate a large number of sequencing runs, processed data can be moved to an external SSD for long-term storage, freeing up space on the internal SSD for the next experiment. We strongly recommend installing and testing all required software before deployment in the field, so that analyses can be performed offline. Field sites often have limited (or no) internet access, and so troubleshooting may be difficult or not even possible.
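
A simple pre-deployment check might look like the sketch below, which uses only the Python standard library to confirm that the expected command-line tools are on the PATH and to report free disk space. The executable names in `REQUIRED_TOOLS` and the `MIN_FREE_GB` threshold are assumptions for illustration, not ARTIC or ONT recommendations; adjust both to match your own installation and expected data volume.

```python
import shutil


def missing_tools(tool_names):
    """Return the tools that cannot be found on the PATH."""
    return [name for name in tool_names if shutil.which(name) is None]


def free_space_gb(path="."):
    """Free space, in gigabytes (10^9 bytes), on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


# Assumed executable names and threshold: adjust to your own installation.
REQUIRED_TOOLS = ["artic", "rampart", "pangolin"]
MIN_FREE_GB = 500  # hypothetical figure, not an ARTIC or ONT recommendation

missing = missing_tools(REQUIRED_TOOLS)
free_gb = free_space_gb(".")

print("Missing tools:", ", ".join(missing) if missing else "none")
status = "OK" if free_gb >= MIN_FREE_GB else "below threshold"
print(f"Free disk space: {free_gb:.0f} GB ({status})")
```

Finding a tool on the PATH does not prove that it runs correctly, so a check like this complements, rather than replaces, a full test run on example data before leaving for the field.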
