Paper ID: 2402.15232

Classification of compact radio sources in the Galactic plane with supervised machine learning

S. Riggi, G. Umana, C. Trigilio, C. Bordiu, F. Bufano, A. Ingallinera, F. Cavallaro, Y. Gordon, R. P. Norris, G. Gürkan, P. Leto, C. Buemi, S. Loru, A. M. Hopkins, M. D. Filipović, T. Cecconello

Generation of science-ready data from processed data products is one of the major challenges in next-generation radio continuum surveys with the Square Kilometre Array (SKA) and its precursors, due to the expected data volume and the need to achieve a high degree of automated processing. Source extraction, characterization, and classification are the major stages involved in this process. In this work we focus on the classification of compact radio sources in the Galactic plane using both radio and infrared images as inputs. To this aim, we produced a curated dataset of ~20,000 images of compact sources of different astronomical classes, obtained from past radio and infrared surveys, and novel radio data from pilot surveys carried out with the Australian SKA Pathfinder (ASKAP). Radio spectral index information was also obtained for a subset of the data. We then trained two different classifiers on the produced dataset. The first model uses gradient-boosted decision trees and is trained on a set of pre-computed features derived from the data, which include radio-infrared colour indices and the radio spectral index. The second model is trained directly on multi-channel images, employing convolutional neural networks. Using a completely supervised procedure, we obtained a high classification accuracy (F1-score>90%) for separating Galactic objects from the extragalactic background. Individual class discrimination performances, ranging from 60% to 75%, increased by 10% when adding far-infrared and spectral index information, with extragalactic objects, PNe and HII regions identified with higher accuracies. The implemented tools and trained models were publicly released, and made available to the radioastronomical community for future application on new radio data.

Submitted: Feb 23, 2024