PhD subject for the PR[AI]RIE-PSAI fellow call 2025
Call details : This PhD subject will be submitted to the PR[AI]RIE-PSAI fellows’ call 2025, which will fund up to 16 projects. Details on the recruitment process, required documents, and deadlines for application are provided after the scientific description.
Supervision
- David Cornu, Fellow AI PSAI & EFELIA (70%)
- Benoit Semelin, professor Sorbonne Université, HDR (10%)
- Gregory Sainton, AI research engineer (20%)
Preferred contact point : david.cornu@obspm.fr
Scientific context and description of the subject
Large astronomical instruments generate an ever-increasing data volume, rapidly approaching the exascale. As a result, the use of machine learning methods for astronomical applications is growing rapidly, as these methods are known for their robustness in handling large data volumes. Modern radio astronomy is strongly affected, especially giant radio interferometers. The forthcoming SKA (Square Kilometre Array, Acero et al. 2017) radio interferometer will revolutionize the field of radio astronomy and its associated processing methods (first science data expected around 2029). The SKA’s projected raw data rate is about 1 TB/s, which should generate 700 PB/year of archived data. Therefore, the SKA’s construction must be accompanied by innovative developments in hardware infrastructure and astronomical data processing methods. Several current radio telescopes are progressively approaching the performance of the SKA and are thus considered “pathfinders” (LOFAR, ASKAP, MeerKAT, etc.) on which new methods can be developed and tested.
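To put the two quoted figures side by side, the short calculation below converts the raw rate of about 1 TB/s into a yearly volume and compares it with the 700 PB/year of archived data. It is only an order-of-magnitude sketch: it assumes continuous operation at the full raw rate and decimal units (1 PB = 1000 TB), neither of which is stated above, and the function name is illustrative.

```python
# Order-of-magnitude sketch; continuous operation at full rate is an assumption.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def yearly_volume_pb(rate_tb_per_s):
    """Convert a sustained rate in TB/s into a volume in PB/year (1 PB = 1000 TB)."""
    return rate_tb_per_s * SECONDS_PER_YEAR / 1000.0


raw_pb_per_year = yearly_volume_pb(1.0)   # raw data rate quoted in the text
archived_pb_per_year = 700.0              # archived volume quoted in the text
reduction_factor = raw_pb_per_year / archived_pb_per_year

print(f"raw: {raw_pb_per_year:.0f} PB/year, reduction factor: {reduction_factor:.0f}")
```

Under these assumptions the raw stream amounts to roughly 30 EB/year, so the archived products would correspond to a reduction by a factor of a few tens.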
Since 2019, the SKA Observatory has organized multiple Science Data Challenges (SDC) to stimulate the development of methods applicable to future SKA data. The first two challenges focused on galaxy detection and characterization in simulated 2D radio continuum images (SDC1, Bonaldi et al. 2021) and in simulated 3D (hyperspectral) cubes of HI emission (SDC2, Hartley et al. 2023) that should resemble the typical SKA science data products. The work of the MINERVA project on these SDCs (Cornu et al. 2024) demonstrated that regression-based deep learning detection methods (inspired by YOLO; Redmon et al. 2015; Redmon & Farhadi 2016, 2018), redesigned explicitly for astronomical data, can achieve state-of-the-art detection and characterization performance on the SDCs (Fig. 1 and 2) while maintaining best-in-class computing speed (∼130 Mpix/s).
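To give a sense of scale for the quoted speed, the sketch below estimates the time needed to scan a single image at about 130 Mpix/s. The image size used (32768 × 32768 pixels) and the function name are illustrative assumptions, and the estimate ignores I/O and any overlap between detection patches.

```python
def processing_time_s(width_pix, height_pix, speed_mpix_per_s=130.0):
    """Seconds needed to process an image at a given throughput in Mpix/s."""
    return width_pix * height_pix / (speed_mpix_per_s * 1e6)


# Hypothetical image size, chosen only for illustration.
t = processing_time_s(32768, 32768)
print(f"{t:.1f} s for a 32768 x 32768 image")
```

At that rate, an image of about one gigapixel would be processed in less than ten seconds.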
These results are very promising, and we are currently exploring how these methods could be used to create real-time astronomical source detection services to help explore and visualize large astronomical data volumes. In parallel, we are exploring how to apply these methods to real observed data from SKA precursor instruments to create catalogs with higher purity and completeness than existing ones (mainly by confidently detecting objects at a lower signal-to-noise ratio). The main difficulty is building a training sample for observational surveys. Simulations (like the SDCs) could be used to train the model, but they are usually not realistic enough for the resulting model to perform well on observed data, due to the presence of non-simulated effects (mostly observational and processing artifacts). Simulations can still be used for pretraining, but complementary training on firmly labeled real images remains necessary.
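Purity and completeness can be made concrete with a minimal sketch, assuming the usual definitions: purity is the fraction of detections that correspond to real sources, and completeness is the fraction of real sources that are detected. The function name and the counts are hypothetical; actual challenge scoring is more elaborate than this.

```python
def purity_completeness(n_true_detections, n_detections, n_true_sources):
    """Return (purity, completeness) from simple counts.

    n_true_detections: detections matched to a real source
    n_detections:      all detections in the produced catalog
    n_true_sources:    all real sources in the reference catalog
    """
    if n_detections <= 0 or n_true_sources <= 0:
        raise ValueError("counts must be strictly positive")
    purity = n_true_detections / n_detections
    completeness = n_true_detections / n_true_sources
    return purity, completeness


# Hypothetical counts: 100 detections, 80 of them real, out of 160 real sources.
purity, completeness = purity_completeness(80, 100, 160)
print(purity, completeness)  # 0.8 0.5
```

Detecting fainter objects raises completeness, but only improves the catalog if the added detections are mostly real, which is why both quantities are quoted together.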
In this context, the main objective of this PhD subject is to generalize the deep learning regression-based detection method YOLO-CIANNA (Cornu et al. 2024), developed in the context of team MINERVA’s participation in the SKAO SDCs, to data from SKA precursor instruments, including the LoTSS (Shimwell et al. 2022), RACS (McConnell et al. 2020), and MIGHTEE (Hale et al. 2025) surveys. The first task (I) will be to build training catalogs for each of them. For this, the candidate will have to identify the specific properties of each survey in terms of source properties and instrumental effects. Then, they will conduct a comparative study of possible approaches for labeling the data, which include classical methods, matching with optical and infrared catalogs, citizen science, observational confirmation of source candidates, and combinations of these approaches. The second task (II) will be to optimize the detector (architecture and hyperparameters) for the different surveys and analyze the results. The produced catalogs are expected to contain several million galaxies, which the candidate will use to conduct statistical studies such as constraining the properties of specific populations (e.g., active galaxies) or evaluating the impact of the environment (e.g., galaxy clusters) on star formation. The candidate will then conduct an exploratory study on generalizing the approach to preliminary observational SKA data that will likely be available inside the collaboration before the end of the PhD.
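One of the labeling approaches listed above, matching with optical and infrared catalogs, can be illustrated with a minimal nearest-neighbour positional cross-match. This is a sketch under stated assumptions, not the method that will be used in the project: the coordinates, the 2 arcsec matching radius, and the function names are hypothetical, and the brute-force loop would not scale to millions of sources.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs and output in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(radio, reference, radius_arcsec):
    """For each radio source, return the index of the nearest reference
    source within the matching radius, or None if there is none."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for ra, dec in radio:
        best_idx, best_sep = None, radius_deg
        for j, (ra_ref, dec_ref) in enumerate(reference):
            sep = angular_sep_deg(ra, dec, ra_ref, dec_ref)
            if sep <= best_sep:
                best_idx, best_sep = j, sep
        matches.append(best_idx)
    return matches


# Hypothetical (RA, Dec) positions in degrees.
radio = [(150.0, 2.0), (150.1, 2.1)]
reference = [(150.0003, 2.0002), (151.0, 3.0)]
matches = crossmatch(radio, reference, radius_arcsec=2.0)
print(matches)  # [0, None]
```

In practice, a tree-based matcher such as astropy's `SkyCoord.match_to_catalog_sky` would be a more realistic choice for survey-sized catalogs, and the matching radius would have to reflect each survey's angular resolution and astrometric accuracy.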
The third task (III) will be to identify limits specific to the detector or to the nature of astronomical data, in order to motivate deeper modifications of the detection method itself. In contrast with tasks I and II, which are mostly sequential, this third objective will be explored in parallel. While we want to keep expressing the detection as a pure regression task, some limitations of the YOLO-CIANNA method, like the fixed number of objects in the detection grid, could be lifted by adding a few transformer layers as in Carion et al. (2020). Regarding the higher-level organization of the detector, we could use self-supervised pretraining directly on the observational data (e.g., using a DDPM) or try to create more generic (instrument- or survey-agnostic) source detectors by either providing metadata as an additional modality or by using a diffusion auto-encoder formalism (Preechakul et al. 2022). The candidate will explore these possibilities and/or other structural changes based on specific limitations observed in the context of astronomical source detection. In this way, the specificities of astronomical data will serve as a driver of technical and methodological innovation for general-purpose machine learning and AI.
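The limitation linked to the fixed number of objects in the detection grid can be illustrated with a toy calculation. The sketch assumes a generic YOLO-style layout in which the image is divided into square cells and each cell can hold at most a fixed number of detections; the cell size, the capacity, the positions, and the function name are hypothetical and do not describe the actual YOLO-CIANNA configuration.

```python
from collections import Counter


def sources_over_capacity(positions, cell_size, boxes_per_cell):
    """Count sources that cannot be assigned to a detection slot because
    their grid cell already holds `boxes_per_cell` sources."""
    per_cell = Counter((int(x // cell_size), int(y // cell_size))
                       for x, y in positions)
    return sum(max(0, n - boxes_per_cell) for n in per_cell.values())


# Hypothetical pixel positions: three sources crowd the first 16x16 cell.
positions = [(3, 4), (5, 6), (7, 1), (40, 40)]
n_lost = sources_over_capacity(positions, cell_size=16, boxes_per_cell=2)
print(n_lost)  # 1
```

In crowded fields, any source beyond the per-cell capacity is lost by construction, which is the kind of structural limit that set-based prediction with transformer layers is meant to relax.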


References
- Acero, F., Acquaviva, J. T., Adam, R., et al. 2017, arXiv e-prints, arXiv:1712.06950
- Bonaldi, A., An, T., Brüggen, M., et al. 2021, MNRAS, 500, 3821
- Carion, N., Massa, F., Synnaeve, G., et al. 2020, arXiv e-prints, arXiv:2005.12872
- Cornu, D., Salomé, P., Semelin, B., et al. 2024, A&A, 690, A211
- Hale, C. L., Heywood, I., Jarvis, M. J., et al. 2025, MNRAS, 536, 2187
- Hartley, P., Bonaldi, A., Braun, R., et al. 2023, MNRAS, 523, 1967
- McConnell, D., Hale, C. L., Lenc, E., et al. 2020, PASA, 37, e048
- Preechakul, K., Chatthee, N., Wizadwongsa, S., & Suwajanakorn, S. 2022, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10619–10629
- Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. 2015, arXiv e-prints, arXiv:1506.02640
- Redmon, J. & Farhadi, A. 2016, arXiv e-prints, arXiv:1612.08242
- Redmon, J. & Farhadi, A. 2018, arXiv e-prints, arXiv:1804.02767
- Shimwell, T. W., Hardcastle, M. J., Tasse, C., et al. 2022, A&A, 659, A1
Recruitment process
Non-discrimination, openness, and transparency : The supervisors of the subject and all PR[AI]RIE-PSAI partners are committed to supporting and promoting equality, diversity, and inclusion within our communities. We encourage applications from a variety of profiles. Candidates will be selected through the open and transparent recruitment process detailed below.
Academic background and skill requirements : Candidates are expected to hold, or be in the process of completing, a Master’s degree in either astrophysics or applied computer science with practical experience in astronomy. They must be proficient in Python and C programming. Experience in HPC / parallel computing will be appreciated. Theoretical and practical knowledge of Artificial Neural Networks is strongly recommended. Candidates fulfilling these requirements will then be evaluated based on their academic excellence.
Documents required for application : We ask candidates to provide a letter of application/motivation, a detailed CV, and academic transcripts covering at least their M1 and the first semester of their M2. Candidates should also provide at least two recommendation letters from previous internship supervisors. For the final application to PSAI, the candidate will need to provide a copy of their latest diploma. All documents must be sent by email to david.cornu@obspm.fr with the title “PSAI PhD application [Name Surname]”.
Application timeline : The deadline for applying is May 15. The supervisors will evaluate applications as they arrive and offer interviews to the candidates who present an appropriate profile. No interview will be offered after May 21, and candidates who have not been interviewed will be informed that they have not been selected. The final candidate will be selected before May 30. Interviewed candidates who are not selected will be informed, and the subject-candidate pair will be submitted to PR[AI]RIE-PSAI. The final decision on funding the PhD will be made by the PR[AI]RIE-PSAI selection committee by June 15. If funded, the PhD project will start between September 1 and November 30, 2025, on a date agreed between the supervisors and the selected candidate.
Location : The successful candidate will be affiliated with the Paris Observatory and will join the COSGAL team of LUX. They will register at the Doctoral School “Astronomy and Astrophysics for Paris Area” (ED 127-AAIF).