Ms Celine Mercier¹, Joane Elleouet¹, Loretta Garrett¹, Steve A Wakelin¹
¹Scion, Rotorua, Aotearoa New Zealand
Biography:
Celine Mercier is a bioinformatician specialized in the development of new software for biological problems. She has been a bioinformatician at Scion since 2021. Before joining Scion, during her PhD at LECA (the Alpine Ecology Laboratory) in Grenoble, she worked on methodological developments for environmental DNA data analysis, developing several software tools for sequence and metadata handling.
Abstract:
The size of microbial genomes can provide important insights into the evolutionary and ecological processes that shape both microbial species and the environments they inhabit. The shedding of unnecessary genetic elements and their associated biosynthetic pathways, for example, is commonly observed in species with a high degree of host symbiosis. Genome size can be used as a key trait in microbial communities to provide insights into niche size, co-evolution, adaptation, and metabolic flexibility of the microbiomes present, and to characterise the abiotic and biotic environments they inhabit. Taking advantage of the increasing availability of whole-genome information across organisms, we have developed an R package, genomesizeR, that infers genome size from taxonomic information and the genome data available from the National Center for Biotechnology Information (NCBI). The package accepts as input the formats commonly used in environmental DNA analyses. It offers three methods for genome size prediction: a Bayesian linear hierarchical model, a frequentist linear mixed-effects model, and a weighted mean method. The applicability of each method varies: the Bayesian method outputs predictions for any taxon that is recognised in the NCBI taxonomy, whereas the other two methods are more constrained. For each query, genomesizeR estimates the genome size and reports a confidence interval on the estimate. Several plotting functions provide different ways to visualise genome size patterns, taking sample information into account.