Mrs Letizia Lamperti¹
¹PSL – CEFE – ETH, CEFE Montpellier – ETH Zurich, France
In biodiversity modeling, a key task is extracting dominant patterns from data, but the data are often complex and multidimensional, which makes analysis and interpretation non-trivial. Environmental DNA (eDNA) is a new method for studying biodiversity patterns that allows fast collection of genetic composition data in marine and terrestrial ecosystems. Dimensionality reduction and clustering offer a starting point for extracting the main structure from compositional data. Here we develop two methods based on variational autoencoders (VAEs) for visualizing and analyzing eDNA data. The originality of our approach lies in the ability of a VAE to combine different inputs: in our case, the number of sequences found for each molecular operational taxonomic unit (MOTU) and the genetic sequence information of each MOTU detected. In addition, the second method uses pairwise beta diversity as an optimization tool. We demonstrate that the distribution of samples in the reduced spaces produced by our methods represents the multidimensional beta diversity between samples. We show that non-linear machine learning models extract features from eDNA datasets better, bypassing the main biases associated with eDNA and outperforming popular dimension reduction methods such as Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP). Our results suggest that neural networks can provide more efficient methods for extracting structure from eDNA data.
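As a rough illustration of the pairwise beta diversity mentioned above, the sketch below computes a dissimilarity matrix between samples from a table of read counts per MOTU. The abstract does not say which beta diversity index is used; Bray-Curtis is assumed here purely for illustration, and the function names and toy counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative count vectors.

    Assumption: this index stands in for "beta diversity"; the abstract
    does not name the index actually used.
    """
    total = sum(u) + sum(v)
    if total == 0:
        # Two empty samples: treated as identical by convention here.
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def pairwise_beta_diversity(counts):
    """Return a symmetric matrix of dissimilarities between all samples.

    `counts` is a list of samples, each a list of read counts per MOTU.
    """
    n = len(counts)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = bray_curtis(counts[i], counts[j])
    return matrix


# Hypothetical toy data: 3 samples x 4 MOTUs, values are read counts.
toy_counts = [
    [10, 0, 5, 0],
    [10, 0, 5, 0],
    [0, 8, 0, 2],
]
beta = pairwise_beta_diversity(toy_counts)
```

In this toy table the first two samples have identical composition and the third shares no MOTU with them, so the matrix takes the two extreme values of the index. A matrix of this kind could then be compared with the distances between samples in a reduced space, which is the sort of correspondence the abstract describes.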
Biography:
I am a mathematical engineer specialized in applied statistics. I am currently a PhD student working between CEFE, Montpellier, and ETH Zurich, and a member of the Artificial Intelligence for the Sciences (AI4theSciences) group, supported and jointly funded by the Horizon 2020 Marie Skłodowska-Curie Actions COFUND European program.
The project harnesses a combination of machine learning approaches to support the development of a fast data pipeline that transforms eDNA metabarcoding data into ecological indicators for ecosystem monitoring.
I will develop machine learning methods to improve the identification of the taxonomic composition of eDNA samples and to link eDNA composition to ecological indicators.
The project involves a collaboration between three labs (CEFE, EPHE – PSL; ETH Zurich – WSL Birmensdorf; TIMC-IMAG, Université Grenoble-Alpes) with complementary skills (ecology, genomics, modelling, artificial intelligence).