Visualising gaps, overlaps and anomalies in taxonomic reference data for metabarcoding

Dr Annette McGrath (1), Dr Xin-Yi Chua (1,2), Prof David Lovell (2)

(1) CSIRO, Brisbane, Australia; (2) QUT, Brisbane, Australia

 

Taxonomic reference databases relate genomic sequences to specific taxa and play a fundamental role in sequence-based biodiversity studies. Metabarcoding is a strategy for analysing DNA from a mixture of organisms to ascertain which taxa are present. The success of a metabarcoding study depends on many factors, including whether the species of interest are represented in reference sequence databases and whether the chosen primer set can distinguish between species accurately.

The concept of a “DNA barcoding gap” has been used to characterise intra- and interspecific barcode sequence variation. We extend that concept to DNA metabarcoding and propose a practical strategy to visualise and identify gaps, overlaps and anomalies in taxonomic reference data. Different primer pairs applied to the same taxa can exhibit different metabarcoding gap characteristics. By computing pairwise global alignments at scale, we demonstrate the usefulness of the DNA metabarcoding gap through examples that highlight (i) species that are easily distinguished from others by the use of particular primer pairs, (ii) reference sequences that are mis-annotated, and (iii) species predisposed to being inadequately distinguished during the taxonomic assignment processes used in many bioinformatics workflows. We propose a simple visualisation that readily identifies these scenarios and that can be used when designing a metabarcoding experiment to achieve the desired outcome.
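As an illustration only, the idea of a metabarcoding gap can be sketched in a few lines of Python. The abstract does not specify an alignment scoring scheme, an identity measure or a gap statistic, so everything below is an assumption made for the sketch: a plain Needleman-Wunsch global alignment with unit scores, identity defined as matching columns divided by alignment length, and a per-species summary (the hypothetical `gap_summary`) that compares the lowest within-species identity with the highest between-species identity. The species names and sequences are invented toy data.

```python
from itertools import combinations


def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment of a and b.

    Returns the fraction of alignment columns that are identical.
    The scoring values are illustrative assumptions, not the authors' settings.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting matching columns and all columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return matches / columns if columns else 1.0


def gap_summary(records):
    """records: list of (species, sequence) pairs.

    For each species, returns the minimum intraspecific identity and the
    maximum interspecific identity over all pairwise alignments. A species
    with a single sequence keeps the default min_intra of 1.0.
    """
    summary = {}
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        ident = global_identity(seq_a, seq_b)
        for sp in {sp_a, sp_b}:
            entry = summary.setdefault(sp, {"min_intra": 1.0, "max_inter": 0.0})
            if sp_a == sp_b:
                entry["min_intra"] = min(entry["min_intra"], ident)
            else:
                entry["max_inter"] = max(entry["max_inter"], ident)
    return summary


def has_gap(entry):
    """True when every within-species pair is more similar than any between-species pair."""
    return entry["min_intra"] > entry["max_inter"]


# Invented toy amplicons for two hypothetical species.
records = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "GGGGGGGGGG"),
    ("sp_B", "GGGGGGGGGA"),
]

if __name__ == "__main__":
    for species, entry in sorted(gap_summary(records).items()):
        print(species, entry, "gap" if has_gap(entry) else "overlap")
```

Under these assumptions, a species whose minimum intraspecific identity exceeds its maximum interspecific identity shows a gap (scenario i), whereas a species that shares a near-identical sequence with another species shows an overlap (scenario iii). A reference sequence that is far from its own species but close to another may point to a mis-annotation (scenario ii). The quadratic number of pairwise alignments is why a real analysis would need an optimised aligner rather than this pure-Python loop.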

