Minimum coverage required?

The pipeline seems to normalise coverage down to a mean of roughly 400x in *trimmed.sorted.bam, presumably via artic minion --normalise 200?
Is this a minimum for submission to https://gisaid.org?
Have you set a minimum coverage threshold for accurate variant calling?


From empirical testing, 100x is an adequate coverage level to resolve most variants.

In the pipeline we normalise to 200 (per direction) because we’ve found that more coverage than this slows down the pipeline considerably (particularly the nanopolish version).
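
As an illustration of what per-direction normalisation does, here is a simplified sketch in Python. This is not the pipeline's actual implementation; the normalise function, the (start, end, strand) read tuples, and the keep/skip heuristic are all illustrative assumptions, with only the cap of 200 per direction taken from the description above.

```python
# Illustrative sketch of per-direction coverage normalisation: keep a read
# only while the per-strand depth at its start position is below the cap.
# NOT the artic pipeline's actual algorithm, just the general idea.
from collections import defaultdict

def normalise(reads, cap=200):
    """reads: iterable of (start, end, strand) tuples, sorted by start."""
    depth = {"+": defaultdict(int), "-": defaultdict(int)}
    kept = []
    for start, end, strand in reads:
        if depth[strand][start] >= cap:
            continue  # this strand already has enough coverage here
        kept.append((start, end, strand))
        for pos in range(start, end):
            depth[strand][pos] += 1
    return kept
```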

However, the vast majority of “simple mutations” can be resolved easily at 20x, which is the default setting in the pipeline. In nanopore sequencing the hardest contexts are SNPs neighbouring or within homopolymers, but there are also other k-mer pairs that can be tricky to resolve. In general, the harder the context, the more evidence you need to call the variant confidently. This is reflected in the QUAL value (a sum of log-likelihoods) reported in the VCF. QUAL will typically rise or fall with more coverage, and this trajectory is useful in determining whether the variant is real. We typically divide QUAL by read depth to give a useful filter: values below 3 are rejected.
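
For example, a minimal version of that filter could look like the sketch below. The field positions follow the standard VCF layout; the depth key ("TotalReads"), the passes_filter helper, and the variants.vcf filename are assumptions, so substitute whatever depth annotation your caller actually writes.

```python
# Sketch of the QUAL/depth filter described above: reject variants where
# QUAL divided by read depth falls below 3. The "TotalReads" INFO key is
# an assumption -- use whichever depth annotation your VCF contains.
def passes_filter(vcf_line, min_qual_per_read=3.0):
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5])  # QUAL is the 6th standard VCF column
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    depth = float(info["TotalReads"])  # assumed depth key
    return qual / depth >= min_qual_per_read

with open("variants.vcf") as vcf:  # hypothetical input file
    for line in vcf:
        if line.startswith("#"):
            continue  # skip header lines
        print("PASS" if passes_filter(line) else "FAIL", line.split("\t")[1])
```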

To account for any very hard-to-call mutations coupled with lower coverage, we use a masking model to apply Ns to regions that fail the filter, in order to represent the uncertainty. This also works well with phylogenetic approaches, where missing regions are effectively imputed.
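
As a sketch of that masking step: the mask_consensus helper, the interval format, and the example coordinates below are all illustrative assumptions; in practice the failed regions would come from the filter described above.

```python
# Sketch of masking: replace consensus bases with N wherever a position
# falls inside a failed region, making the uncertainty explicit in the FASTA.
def mask_consensus(seq, failed_regions):
    """failed_regions: list of (start, end) half-open, 0-based intervals."""
    seq = list(seq)
    for start, end in failed_regions:
        for pos in range(start, min(end, len(seq))):
            seq[pos] = "N"
    return "".join(seq)

# Hypothetical example: mask two low-confidence windows.
consensus = mask_consensus("ACGT" * 10, [(3, 6), (20, 24)])
```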

We first introduced this idea in our Ebola paper (https://www.nature.com/articles/nature16996); the Supplementary Methods there give more detail on our approach.

Finally, we have drafted a more extensive guide to interpreting nanopolish output, which we will post soon.
