Paper ID: 2403.12072
Floralens: a Deep Learning Model for the Portuguese Native Flora
António Filgueiras, Eduardo R. B. Marques, Luís M. B. Lopes, Miguel Marques, Hugo Silva
Machine-learning techniques, namely deep convolutional neural networks, are pivotal for image-based identification of biological species in many Citizen Science platforms. However, the construction of critically sized and sampled datasets to train the networks and the choice of the network architectures itself remains little documented and, therefore, does not lend itself to be easily replicated. In this paper, we develop a streamlined methodology for building datasets for biological taxa from publicly available research-grade datasets and for deriving models from these datasets using off-the-shelf deep convolutional neural networks such as those provided by Google's AutoML Vision cloud service. Our case study is the Portuguese native flora, anchored in a high-quality dataset, provided by the Sociedade Portuguesa de Bot\^anica, scaled up by adding sampled data from iNaturalist, Pl@ntNet, and Observation.org. We find that with a careful dataset design, off-the-shelf machine-learning cloud services produce accurate models with relatively little effort that rival those provided by state-of-the-art citizen science platforms. The best model we derived, dubbed Floralens, has been integrated into the public website of Project Biolens, where we gather models for other taxa as well. The dataset used to train the model and its namesake is publicly available on Zenodo.
Submitted: Feb 13, 2024