Paper ID: 2211.10307

SeaTurtleID2022: A long-span dataset for reliable sea turtle re-identification

Lukáš Adam, Vojtěch Čermák, Kostas Papafitsoros, Lukáš Picek

This paper introduces the first public large-scale, long-span dataset with sea turtle photographs captured in the wild -- \href{https://www.kaggle.com/datasets/wildlifedatasets/seaturtleid2022}{SeaTurtleID2022}. The dataset contains 8729 photographs of 438 unique individuals collected within 13 years, making it the longest-spanned dataset for animal re-identification. All photographs include various annotations, e.g., identity, encounter timestamp, and body parts segmentation masks. Instead of standard "random" splits, the dataset allows for two realistic and ecologically motivated splits: (i) a \textit{time-aware closed-set} with training, validation, and test data from different days/years, and (ii) a \textit{time-aware open-set} with new unknown individuals in test and validation sets. We show that time-aware splits are essential for benchmarking re-identification methods, as random splits lead to performance overestimation. Furthermore, a baseline instance segmentation and re-identification performance over various body parts is provided. Finally, an end-to-end system for sea turtle re-identification is proposed and evaluated. The proposed system based on Hybrid Task Cascade for head instance segmentation and ArcFace-trained feature-extractor achieved an accuracy of 86.8\%.

Submitted: Nov 18, 2022