M2 MIND - MEDS - Research Project

Warning: this is a very bad bibliography (that is part of the exercise):

Do not do it like this.

Adversarial attacks

Chen, Meng, Jiawei Tu, Chao Qi, et al. 2025. “Towards Physically Realizable Adversarial Attacks in Embodied Vision Navigation.” arXiv:2409.10071. Version 5. Preprint, arXiv, August 15. https://doi.org/10.48550/arXiv.2409.10071.

Cools, Kasper, Clara Maathuis, Alexander M. van Oers, et al. 2025. “Vision Transformers: The Threat of Realistic Adversarial Patches.” arXiv:2509.21084. Preprint, arXiv, September 25. https://doi.org/10.48550/arXiv.2509.21084.

Goldblum, Micah, Dimitris Tsipras, Chulin Xie, et al. 2021. “Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses.” arXiv:2012.10544. Preprint, arXiv, March 31. https://doi.org/10.48550/arXiv.2012.10544.

Gu, Jindong, Xiaojun Jia, Pau de Jorge, et al. 2024. “A Survey on Transferability of Adversarial Examples across Deep Neural Networks.” arXiv:2310.17626. Preprint, arXiv, May 2. https://doi.org/10.48550/arXiv.2310.17626.

Laugros, Alfred, Alice Caplier, and Matthieu Ospici. 2021. “Using Synthetic Corruptions to Measure Robustness to Natural Distribution Shifts.” arXiv:2107.12052. Preprint, arXiv, November 18. https://doi.org/10.48550/arXiv.2107.12052.

Li, Yiquan, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Bo Li, and Chaowei Xiao. 2024. “Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness.” arXiv:2407.00623. Version 1. Preprint, arXiv, June 30. https://doi.org/10.48550/arXiv.2407.00623.

Lu, Liming, Shuchao Pang, Siyuan Liang, et al. 2025. “Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks.” arXiv:2503.04833. Preprint, arXiv, March 18. https://doi.org/10.48550/arXiv.2503.04833.

Lyu, Saiyue, Shadab Shaikh, Frederick Shpilevskiy, Evan Shelhamer, and Mathias Lécuyer. 2025. “Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences.” arXiv:2406.10427. Version 3. Preprint, arXiv, July 10. https://doi.org/10.48550/arXiv.2406.10427.

Mahmood, Kaleel, Rigel Mahmood, and Marten van Dijk. 2021. “On the Robustness of Vision Transformers to Adversarial Examples.” arXiv:2104.02610. Preprint, arXiv, June 5. https://doi.org/10.48550/arXiv.2104.02610.

Wang, Jiakai, Xianglong Liu, Jin Hu, et al. 2024. “Adversarial Examples in the Physical World: A Survey.” arXiv:2311.01473. Version 2. Preprint, arXiv, July 19. https://doi.org/10.48550/arXiv.2311.01473.

Robotic control

Akki, Shivayogi, and Tan Chen. 2025. “Benchmarking Model Predictive Control and Reinforcement Learning Based Control for Legged Robot Locomotion in MuJoCo Simulation.” arXiv:2501.16590. Preprint, arXiv, January 28. https://doi.org/10.48550/arXiv.2501.16590.

Brohan, Anthony, Noah Brown, Justice Carbajal, et al. 2023a. “RT-1: Robotics Transformer for Real-World Control at Scale.” arXiv:2212.06817. Preprint, arXiv, August 11. https://doi.org/10.48550/arXiv.2212.06817.

Brohan, Anthony, Noah Brown, Justice Carbajal, et al. 2023b. “RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control.” arXiv:2307.15818. Preprint, arXiv, July 28. https://doi.org/10.48550/arXiv.2307.15818.

Burchi, Maxime, and Radu Timofte. 2024. “MuDreamer: Learning Predictive World Models without Reconstruction.” arXiv:2405.15083. Preprint, arXiv, May 23. https://doi.org/10.48550/arXiv.2405.15083.

Chittepu, Yaswanth, Blossom Metevier, Will Schwarzer, Austin Hoag, Scott Niekum, and Philip S. Thomas. 2025. “Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints.” arXiv:2506.08266. Version 1. Preprint, arXiv, June 9. https://doi.org/10.48550/arXiv.2506.08266.

Gajewski, Paul, Dominik Żurek, Marcin Pietroń, and Kamil Faber. 2024. “Solving Multi-Goal Robotic Tasks with Decision Transformer.” arXiv:2410.06347. Preprint, arXiv, October 8. https://doi.org/10.48550/arXiv.2410.06347.

Li, Zezeng, Alexandre Chapin, Enda Xiang, et al. 2025. “Robotic Manipulation via Imitation Learning: Taxonomy, Evolution, Benchmark, and Challenges.” arXiv:2508.17449. Version 1. Preprint, arXiv, August 24. https://doi.org/10.48550/arXiv.2508.17449.

Morad, Steven, Ajay Shankar, Jan Blumenkamp, and Amanda Prorok. 2024. “Language-Conditioned Offline RL for Multi-Robot Navigation.” arXiv:2407.20164. Preprint, arXiv, July 29. https://doi.org/10.48550/arXiv.2407.20164.

Nair, Suraj, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, and Abhinav Gupta. 2022. “R3M: A Universal Visual Representation for Robot Manipulation.” arXiv:2203.12601. Preprint, arXiv, November 18. https://doi.org/10.48550/arXiv.2203.12601.

Zakka, Kevin, Baruch Tabanpour, Qiayuan Liao, et al. 2025. “MuJoCo Playground.” arXiv:2502.08844. Version 1. Preprint, arXiv, February 12. https://doi.org/10.48550/arXiv.2502.08844.

Efficiency

Chen, Lei, Yuan Meng, Chen Tang, et al. 2024. “Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers.” arXiv:2406.17343. Version 1. Preprint, arXiv, June 25. https://doi.org/10.48550/arXiv.2406.17343.

Fladmark, Eirik, Muhammad Hamza Sajjad, and Laura Brinkholm Justesen. 2023. “Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis.” arXiv:2303.15479. Preprint, arXiv, March 26. https://doi.org/10.48550/arXiv.2303.15479.

Gu, Yuxian, Li Dong, Furu Wei, and Minlie Huang. 2025. “MiniLLM: Knowledge Distillation of Large Language Models.” arXiv:2306.08543. Preprint, arXiv, November 21. https://doi.org/10.48550/arXiv.2306.08543.

Huang, Xijie, Zhiqiang Shen, Pingcheng Dong, and Kwang-Ting Cheng. 2024. “Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision.” arXiv:2307.00331. Preprint, arXiv, October 12. https://doi.org/10.48550/arXiv.2307.00331.

Jayanth, Rakshith, Neelesh Gupta, and Viktor Prasanna. 2024. “Benchmarking Edge AI Platforms for High-Performance ML Inference.” arXiv:2409.14803. Version 1. Preprint, arXiv, September 23. https://doi.org/10.48550/arXiv.2409.14803.

Liang, Jessica, and Anirudh Bharadwaj. 2025. “QR-LoRA: QR-Based Low-Rank Adaptation for Efficient Fine-Tuning of Large Language Models.” arXiv:2508.21810. Preprint, arXiv, August 29. https://doi.org/10.48550/arXiv.2508.21810.

Mansourian, Amir M., Rozhan Ahmadi, Masoud Ghafouri, et al. 2025. “A Comprehensive Survey on Knowledge Distillation.” arXiv:2503.12067. Preprint, arXiv, October 11. https://doi.org/10.48550/arXiv.2503.12067.

Rakka, Mariam, Marios Fournarakis, Olga Krestinskaya, et al. 2025. “Mixed-Precision Quantization for Language Models: Techniques and Prospects.” arXiv:2510.16805. Preprint, arXiv, October 19. https://doi.org/10.48550/arXiv.2510.16805.

Tian, Chunlin, Xuyang Wei, Huanrong Liu, Zhijiang Guo, and Li Li. 2025. “Less Is More: Resource-Efficient Low-Rank Adaptation.” arXiv:2512.00878. Preprint, arXiv, November 30. https://doi.org/10.48550/arXiv.2512.00878.

Wu, Junyi, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, and Yan Yan. 2024. “PTQ4DiT: Post-Training Quantization for Diffusion Transformers.” arXiv:2405.16005. Version 3. Preprint, arXiv, October 17. https://doi.org/10.48550/arXiv.2405.16005.

Information retrieval

Chen, Xiaoyang, Ben He, Hongyu Lin, et al. 2024. “Spiral of Silence: How Is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering.” arXiv:2404.10496. Preprint, arXiv, June 23. https://doi.org/10.48550/arXiv.2404.10496.

Csizmadia, Daniel, Andrei Codreanu, Victor Sim, et al. 2025. “Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation.” arXiv:2505.21549. Version 2. Preprint, arXiv, May 29. https://doi.org/10.48550/arXiv.2505.21549.

Guu, Kelvin, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. “REALM: Retrieval-Augmented Language Model Pre-Training.” arXiv:2002.08909. Preprint, arXiv, February 10. https://doi.org/10.48550/arXiv.2002.08909.

Izacard, Gautier, Mathilde Caron, Lucas Hosseini, et al. 2022. “Unsupervised Dense Information Retrieval with Contrastive Learning.” arXiv:2112.09118. Preprint, arXiv, August 29. https://doi.org/10.48550/arXiv.2112.09118.

Karpukhin, Vladimir, Barlas Oğuz, Sewon Min, et al. 2020. “Dense Passage Retrieval for Open-Domain Question Answering.” arXiv:2004.04906. Preprint, arXiv, September 30. https://doi.org/10.48550/arXiv.2004.04906.

Khattab, Omar, and Matei Zaharia. 2020. “ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT.” arXiv:2004.12832. Preprint, arXiv, June 4. https://doi.org/10.48550/arXiv.2004.12832.

Lewis, Patrick, Ethan Perez, Aleksandra Piktus, et al. 2021. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” arXiv:2005.11401. Preprint, arXiv, April 12. https://doi.org/10.48550/arXiv.2005.11401.

Pan, Zhenyu, Haozheng Luo, Manling Li, and Han Liu. 2024. “Conv-CoA: Improving Open-Domain Question Answering in Large Language Models via Conversational Chain-of-Action.” arXiv:2405.17822. Preprint, arXiv, May 28. https://doi.org/10.48550/arXiv.2405.17822.

Thakur, Nandan, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. “BEIR: A Heterogenous Benchmark for Zero-Shot Evaluation of Information Retrieval Models.” arXiv:2104.08663. Preprint, arXiv, October 21. https://doi.org/10.48550/arXiv.2104.08663.

Zhong, Ming, Zhizhi Wu, and Nanako Honda. 2024. “Deep Learning Based Dense Retrieval: A Comparative Study.” arXiv:2410.20315. Version 1. Preprint, arXiv, October 27. https://doi.org/10.48550/arXiv.2410.20315.

Machine translation

Bao, Guangsheng, Zhiyang Teng, and Yue Zhang. 2023. “Target-Side Augmentation for Document-Level Machine Translation.” arXiv:2305.04505. Preprint, arXiv, June 4. https://doi.org/10.48550/arXiv.2305.04505.

Bogoychev, Nikolay, and Pinzhen Chen. 2023. “Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting.” arXiv:2310.05824. Preprint, arXiv, October 9. https://doi.org/10.48550/arXiv.2310.05824.

Dale, David, Elena Voita, Janice Lam, et al. 2023. “HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation.” arXiv:2305.11746. Preprint, arXiv, December 6. https://doi.org/10.48550/arXiv.2305.11746.

Guerreiro, Nuno M., Duarte Alves, Jonas Waldendorf, et al. 2023. “Hallucinations in Large Multilingual Translation Models.” arXiv:2303.16104. Preprint, arXiv, March 28. https://doi.org/10.48550/arXiv.2303.16104.

He, Zhiwei, Tian Liang, Wenxiang Jiao, et al. 2023. “Exploring Human-Like Translation Strategy with Large Language Models.” arXiv:2305.04118. Preprint, arXiv, November 29. https://doi.org/10.48550/arXiv.2305.04118.

Herold, Christian, and Hermann Ney. 2023. “Improving Long Context Document-Level Machine Translation.” arXiv:2306.05183. Preprint, arXiv, June 8. https://doi.org/10.48550/arXiv.2306.05183.

Lu, Hongyuan, Haoran Yang, Haoyang Huang, Dongdong Zhang, Wai Lam, and Furu Wei. 2024. “Chain-of-Dictionary Prompting Elicits Translation in Large Language Models.” arXiv:2305.06575. Preprint, arXiv, August 17. https://doi.org/10.48550/arXiv.2305.06575.

Sennrich, Rico, Jannis Vamvas, and Alireza Mohammadshahi. 2024. “Mitigating Hallucinations and Off-Target Machine Translation with Source-Contrastive and Language-Contrastive Decoding.” arXiv:2309.07098. Preprint, arXiv, January 29. https://doi.org/10.48550/arXiv.2309.07098.

Wang, Longyue, Chenyang Lyu, Tianbo Ji, et al. 2023. “Document-Level Machine Translation with Large Language Models.” arXiv:2304.02210. Preprint, arXiv, October 24. https://doi.org/10.48550/arXiv.2304.02210.

Zhu, Wenhao, Hongyi Liu, Qingxiu Dong, et al. 2024. “Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis.” arXiv:2304.04675. Preprint, arXiv, June 14. https://doi.org/10.48550/arXiv.2304.04675.

ML4Science

Berner, Julius, Miguel Liu-Schiaffini, Jean Kossaifi, et al. 2025. “Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning.” arXiv:2506.10973. Preprint, arXiv, June 12. https://doi.org/10.48550/arXiv.2506.10973.

Ganga, Sai, and Ziya Uddin. 2024. “Exploring Physics-Informed Neural Networks: From Fundamentals to Applications in Complex Systems.” arXiv:2410.00422. Preprint, arXiv, October 1. https://doi.org/10.48550/arXiv.2410.00422.

Karumuri, Sharmila, Lori Graham-Brady, and Somdatta Goswami. 2025. “Physics-Informed Latent Neural Operator for Real-Time Predictions of Time-Dependent Parametric PDEs.” arXiv:2501.08428. Version 3. Preprint, arXiv, October 28. https://doi.org/10.48550/arXiv.2501.08428.

Lassen, Oskar Bohn, Serio Angelo Maria Agriesti, Filipe Rodrigues, and Francisco Camara Pereira. 2025. “Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM.” arXiv:2510.07971. Version 1. Preprint, arXiv, October 9. https://doi.org/10.48550/arXiv.2510.07971.

Lejarza, Fernando, and Michael Baldea. 2022. “DySMHO: Data-Driven Discovery of Governing Equations for Dynamical Systems via Moving Horizon Optimization.” Scientific Reports 12 (1): 11836. https://doi.org/10.1038/s41598-022-13644-w.

Oommen, Vivek, Siavash Khodakarami, Aniruddha Bora, Zhicheng Wang, and George Em Karniadakis. 2025. “Learning Turbulent Flows with Generative Models: Super-Resolution, Forecasting, and Sparse Flow Reconstruction.” arXiv:2509.08752. Preprint, arXiv, September 10. https://doi.org/10.48550/arXiv.2509.08752.

Owens, Katherine, and J. Nathan Kutz. 2022. “Data-Driven Discovery of Governing Equations for Coarse-Grained Heterogeneous Network Dynamics.” arXiv:2205.10965. Preprint, arXiv, May 23. https://doi.org/10.48550/arXiv.2205.10965.

Tauberschmidt, Jan, Sophie Fellenz, Sebastian J. Vollmer, and Andrew B. Duncan. 2025. “Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems.” arXiv:2508.09156. Preprint, arXiv, August 5. https://doi.org/10.48550/arXiv.2508.09156.

Toscano, Juan Diego, Vivek Oommen, Alan John Varghese, et al. 2024. “From PINNs to PIKANs: Recent Advances in Physics-Informed Machine Learning.” arXiv:2410.13228. Preprint, arXiv, October 22. https://doi.org/10.48550/arXiv.2410.13228.

You, Wen, Shaoqian Zhou, and Xuhui Meng. 2025. “Self-Supervised Neural Operator for Solving Partial Differential Equations.” arXiv:2509.00867. Version 1. Preprint, arXiv, August 31. https://doi.org/10.48550/arXiv.2509.00867.

Generative models

Huang, Rongjie, Mingze Li, Dongchao Yang, et al. 2023. “AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.” arXiv:2304.12995. Preprint, arXiv, April 25. https://doi.org/10.48550/arXiv.2304.12995.

Luo, Zhengxiong, Dayou Chen, Yingya Zhang, et al. 2023. “VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation.” arXiv:2303.08320. Preprint, arXiv, October 13. https://doi.org/10.48550/arXiv.2303.08320.

Moser, Brian B., Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio, and Andreas Dengel. 2025. “Diffusion Models, Image Super-Resolution And Everything: A Survey.” IEEE Transactions on Neural Networks and Learning Systems 36 (7): 11793–813. https://doi.org/10.1109/TNNLS.2024.3476671.

Nichol, Alex, Prafulla Dhariwal, Aditya Ramesh, et al. 2022. “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.” arXiv:2112.10741. Preprint, arXiv, March 8. https://doi.org/10.48550/arXiv.2112.10741.

Saharia, Chitwan, William Chan, Saurabh Saxena, et al. 2022. “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.” arXiv:2205.11487. Preprint, arXiv, May 23. https://doi.org/10.48550/arXiv.2205.11487.

Sordo, Zineb, Eric Chagnon, and Daniela Ushizima. 2025. “A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images.” arXiv:2502.21151. Version 2. Preprint, arXiv, March 10. https://doi.org/10.48550/arXiv.2502.21151.

Sun, Quan, Qiying Yu, Yufeng Cui, et al. 2023. “Generative Pretraining in Multimodality.” arXiv:2307.05222. Version 1. Preprint, arXiv, July 11. https://doi.org/10.48550/arXiv.2307.05222.

Xu, Katherine, Lingzhi Zhang, and Jianbo Shi. 2025. “Detecting Origin Attribution for Text-to-Image Diffusion Models.” arXiv:2403.19653. Preprint, arXiv, April 16. https://doi.org/10.48550/arXiv.2403.19653.

Zhang, Jinjin, Qiuyu Huang, Junjie Liu, Xiefan Guo, and Di Huang. 2025. “Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models.” arXiv:2503.18352. Version 1. Preprint, arXiv, March 24. https://doi.org/10.48550/arXiv.2503.18352.

Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. 2023. “Adding Conditional Control to Text-to-Image Diffusion Models.” arXiv:2302.05543. Preprint, arXiv, November 26. https://doi.org/10.48550/arXiv.2302.05543.

Foundation models

Frantar, Elias, Carlos Riquelme, Neil Houlsby, Dan Alistarh, and Utku Evci. 2023. “Scaling Laws for Sparsely-Connected Foundation Models.” arXiv:2309.08520. Preprint, arXiv, September 15. https://doi.org/10.48550/arXiv.2309.08520.

Liu, Fan, Tianshu Zhang, Wenwen Dai, Wenwen Cai, Xiaocong Zhou, and Delong Chen. 2024. “Few-Shot Adaptation of Multi-Modal Foundation Models: A Survey.” arXiv:2401.01736. Preprint, arXiv, January 4. https://doi.org/10.48550/arXiv.2401.01736.

Liu, Xu, Tong Zhou, Yuanxin Wang, et al. 2023. “Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey.” arXiv:2312.10163. Preprint, arXiv, December 15. https://doi.org/10.48550/arXiv.2312.10163.

Lu, Jianglin, Hailing Wang, Yi Xu, Yizhou Wang, Kuo Yang, and Yun Fu. 2025. “Representation Potentials of Foundation Models for Multimodal Alignment: A Survey.” arXiv:2510.05184. Preprint, arXiv, October 5. https://doi.org/10.48550/arXiv.2510.05184.

Schneider, Johannes, Christian Meske, and Pauline Kuss. 2024. “Foundation Models.” Business & Information Systems Engineering 66 (2): 221–31. https://doi.org/10.1007/s12599-024-00851-0.

Subramanian, Shashank, Peter Harrington, Kurt Keutzer, et al. 2023. “Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior.” arXiv:2306.00258. Preprint, arXiv, June 1. https://doi.org/10.48550/arXiv.2306.00258.

Sun, Weigao, Jiaxi Hu, Yucheng Zhou, et al. 2025. “Speed Always Wins: A Survey on Efficient Architectures for Large Language Models.” arXiv:2508.09834. Preprint, arXiv, August 13. https://doi.org/10.48550/arXiv.2508.09834.

Xu, Mengwei, Wangsong Yin, Dongqi Cai, et al. 2024. “A Survey of Resource-Efficient LLM and Multimodal Foundation Models.” arXiv:2401.08092. Preprint, arXiv, September 23. https://doi.org/10.48550/arXiv.2401.08092.

Yuan, Yang. 2024. “On the Power of Foundation Models.” arXiv:2211.16327. Preprint, arXiv, October 22. https://doi.org/10.48550/arXiv.2211.16327.

Music

Afchar, Darius, Gabriel Meseguer-Brocal, and Romain Hennequin. 2025. “AI-Generated Music Detection and Its Challenges.” arXiv:2501.10111. Version 1. Preprint, arXiv, January 17. https://doi.org/10.48550/arXiv.2501.10111.

Agostinelli, Andrea, Timo I. Denk, Zalán Borsos, et al. 2023. “MusicLM: Generating Music From Text.” arXiv:2301.11325. Preprint, arXiv, January 26. https://doi.org/10.48550/arXiv.2301.11325.

Chen, Yanxu, Linshu Huang, and Tian Gou. 2024. “Applications and Advances of Artificial Intelligence in Music Generation: A Review.” arXiv:2409.03715. Preprint, arXiv, September 3. https://doi.org/10.48550/arXiv.2409.03715.

Copet, Jade, Felix Kreuk, Itai Gat, et al. 2024. “Simple and Controllable Music Generation.” arXiv:2306.05284. Preprint, arXiv, January 30. https://doi.org/10.48550/arXiv.2306.05284.

Evans, Zach, Julian D. Parker, C. J. Carr, Zack Zukowski, Josiah Taylor, and Jordi Pons. 2024. “Long-Form Music Generation with Latent Diffusion.” arXiv:2404.10301. Version 2. Preprint, arXiv, July 29. https://doi.org/10.48550/arXiv.2404.10301.

Huang, Qingqing, Daniel S. Park, Tao Wang, et al. 2023. “Noise2Music: Text-Conditioned Music Generation with Diffusion Models.” arXiv:2302.03917. Preprint, arXiv, March 6. https://doi.org/10.48550/arXiv.2302.03917.

Lam, Max W. Y., Qiao Tian, Tang Li, et al. 2023. “Efficient Neural Music Generation.” arXiv:2305.15719. Preprint, arXiv, May 25. https://doi.org/10.48550/arXiv.2305.15719.

Lehmkuhl, Jonathan, Ábel Ilyés-Kun, Nico Bremes, Cemhan Kaan Özaltan, Frederik Muthers, and Jiayi Yuan. 2025. “Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics.” arXiv:2511.07268. Preprint, arXiv, November 10. https://doi.org/10.48550/arXiv.2511.07268.

Wu, Shih-Lun, and Yi-Hsuan Yang. 2022. “MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer with One Transformer VAE.” arXiv:2105.04090. Preprint, arXiv, December 19. https://doi.org/10.48550/arXiv.2105.04090.

Yuan, Ruibin, Hanfeng Lin, Yi Wang, et al. 2024. “ChatMusician: Understanding and Generating Music Intrinsically with LLM.” arXiv:2402.16153. Preprint, arXiv, February 25. https://doi.org/10.48550/arXiv.2402.16153.

LLM security

Chang, Amy, Nicholas Conley, Harish Santhanalakshmi Ganesan, and Adam Swanda. 2025. “Death by a Thousand Prompts: Open Model Vulnerability Analysis.” arXiv:2511.03247. Version 1. Preprint, arXiv, November 5. https://doi.org/10.48550/arXiv.2511.03247.

Chen, Sizhe, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, and Chuan Guo. 2024. “Aligning LLMs to Be Robust Against Prompt Injection.” arXiv:2410.05451. Version 1. Preprint, arXiv, October 7. https://doi.org/10.48550/arXiv.2410.05451.

Du, Chenghao, Quanfeng Huang, Tingxuan Tang, Zihao Wang, Adwait Nadkarni, and Yue Xiao. 2025. “Measuring the Security of Mobile LLM Agents under Adversarial Prompts from Untrusted Third-Party Channels.” arXiv:2510.27140. Preprint, arXiv, November 6. https://doi.org/10.48550/arXiv.2510.27140.

Jia, Feiran, Tong Wu, Xin Qin, and Anna Squicciarini. 2024. “The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents.” arXiv:2412.16682. Preprint, arXiv, December 21. https://doi.org/10.48550/arXiv.2412.16682.

Kumar, Aounon, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, and Himabindu Lakkaraju. 2025. “Certifying LLM Safety against Adversarial Prompting.” arXiv:2309.02705. Preprint, arXiv, February 4. https://doi.org/10.48550/arXiv.2309.02705.

Peng, Benji, Ziqian Bi, Qian Niu, et al. 2024. “Jailbreaking and Mitigation of Vulnerabilities in Large Language Models.” arXiv:2410.15236. Version 1. Preprint, arXiv, October 20. https://doi.org/10.48550/arXiv.2410.15236.

Shang, Zhengchun, and Wenlan Wei. 2025. “Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses.” arXiv:2504.02080. Version 1. Preprint, arXiv, April 2. https://doi.org/10.48550/arXiv.2504.02080.

Shi, Jiawen, Zenghui Yuan, Yinuo Liu, et al. 2025. “Optimization-Based Prompt Injection Attack to LLM-as-a-Judge.” arXiv:2403.17710. Preprint, arXiv, August 24. https://doi.org/10.48550/arXiv.2403.17710.

Yi, Sibo, Yule Liu, Zhen Sun, et al. 2024. “Jailbreak Attacks and Defenses Against Large Language Models: A Survey.” arXiv:2407.04295. Preprint, arXiv, August 30. https://doi.org/10.48550/arXiv.2407.04295.

Zhao, Andrew, Reshmi Ghosh, Vitor Carvalho, et al. 2025. “Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-Based Optimizers.” arXiv:2510.14381. Preprint, arXiv, October 16. https://doi.org/10.48550/arXiv.2510.14381.

Segmentation

Catalano, Nico, and Matteo Matteucci. 2024. “Few Shot Semantic Segmentation: A Review of Methodologies, Benchmarks, and Open Challenges.” arXiv:2304.05832. Preprint, arXiv, May 20. https://doi.org/10.48550/arXiv.2304.05832.

Ke, Lei, Mingqiao Ye, Martin Danelljan, et al. 2023. “Segment Anything in High Quality.” arXiv:2306.01567. Preprint, arXiv, October 23. https://doi.org/10.48550/arXiv.2306.01567.

Kirillov, Alexander, Eric Mintun, Nikhila Ravi, et al. 2023. “Segment Anything.” arXiv:2304.02643. Preprint, arXiv, April 5. https://doi.org/10.48550/arXiv.2304.02643.

Li, Feng, Hao Zhang, Peize Sun, et al. 2023. “Semantic-SAM: Segment and Recognize Anything at Any Granularity.” arXiv:2307.04767. Preprint, arXiv, July 10. https://doi.org/10.48550/arXiv.2307.04767.

Li, Feng, Hao Zhang, Huaizhe Xu, et al. 2022. “Mask DINO: Towards A Unified Transformer-Based Framework for Object Detection and Segmentation.” arXiv:2206.02777. Preprint, arXiv, December 12. https://doi.org/10.48550/arXiv.2206.02777.

Liu, Xinyu, Beiwen Tian, Zhen Wang, et al. 2023. “Delving Into Shape-Aware Zero-Shot Semantic Segmentation.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2999–3009. https://openaccess.thecvf.com/content/CVPR2023/html/Liu_Delving_Into_Shape-Aware_Zero-Shot_Semantic_Segmentation_CVPR_2023_paper.html.

Rajič, Frano, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, and Fisher Yu. 2023. “Segment Anything Meets Point Tracking.” arXiv:2307.01197. Preprint, arXiv, December 3. https://doi.org/10.48550/arXiv.2307.01197.

Wang, Xinlong, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, and Tiejun Huang. 2023. “SegGPT: Towards Segmenting Everything in Context.” In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 1130–40. https://openaccess.thecvf.com/content/ICCV2023/html/Wang_SegGPT_Towards_Segmenting_Everything_in_Context_ICCV_2023_paper.html.

Xu, Jiarui, Shalini De Mello, Sifei Liu, et al. 2022. “GroupViT: Semantic Segmentation Emerges from Text Supervision.” arXiv:2202.11094. Preprint, arXiv, July 18. https://doi.org/10.48550/arXiv.2202.11094.

Xu, Jilan, Junlin Hou, Yuejie Zhang, et al. 2023. “Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2935–44. https://openaccess.thecvf.com/content/CVPR2023/html/Xu_Learning_Open-Vocabulary_Semantic_Segmentation_Models_From_Natural_Language_Supervision_CVPR_2023_paper.html.

Self-supervised learning

Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. “A Simple Framework for Contrastive Learning of Visual Representations.” arXiv:2002.05709. Preprint, arXiv, July 1. https://doi.org/10.48550/arXiv.2002.05709.

Chen, Wenxi, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, and Xie Chen. 2024. “EAT: Self-Supervised Pre-Training with Efficient Audio Transformer.” arXiv:2401.03497. Preprint, arXiv, January 7. https://doi.org/10.48550/arXiv.2401.03497.

Guo, Huijie, Jingyao Wang, Peizheng Guo, Xingchen Shen, Changwen Zheng, and Wenwen Qiang. 2025. “Exploring Transferability of Self-Supervised Learning by Task Conflict Calibration.” arXiv:2511.13787. Preprint, arXiv, November 16. https://doi.org/10.48550/arXiv.2511.13787.

Hondru, Vlad, Florinel Alin Croitoru, Shervin Minaee, Radu Tudor Ionescu, and Nicu Sebe. 2024. “Masked Image Modeling: A Survey.” arXiv:2408.06687. Version 1. Preprint, arXiv, August 13. https://doi.org/10.48550/arXiv.2408.06687.

Liu, Ziyu, Azadeh Alavi, Minyi Li, and Xiang Zhang. 2024. “Self-Supervised Learning for Time Series: Contrastive or Generative?” arXiv:2403.09809. Version 1. Preprint, arXiv, March 14. https://doi.org/10.48550/arXiv.2403.09809.

Ma, Duo, Xianghu Yue, Junyi Ao, Xiaoxue Gao, and Haizhou Li. 2024. “Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks.” arXiv:2402.15725. Version 3. Preprint, arXiv, July 22. https://doi.org/10.48550/arXiv.2402.15725.

Naiman, Ilan, Emanuel Ben-Baruch, Oron Anschel, et al. 2025. “LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders.” arXiv:2504.03501. Preprint, arXiv, October 7. https://doi.org/10.48550/arXiv.2504.03501.

Shi, Yuge, Imant Daunhawer, Julia E. Vogt, Philip H. S. Torr, and Amartya Sanyal. 2022. “How Robust Is Unsupervised Representation Learning to Distribution Shift?” arXiv:2206.08871. Preprint, arXiv, December 16. https://doi.org/10.48550/arXiv.2206.08871.

Tan, Fuwen, Fatemeh Saleh, and Brais Martinez. 2023. “Effective Self-Supervised Pre-Training on Low-Compute Networks without Distillation.” arXiv:2210.02808. Preprint, arXiv, October 2. https://doi.org/10.48550/arXiv.2210.02808.

Zong, Yongshuo, Oisin Mac Aodha, and Timothy Hospedales. 2024. “Self-Supervised Multimodal Learning: A Survey.” arXiv:2304.01008. Preprint, arXiv, August 16. https://doi.org/10.48550/arXiv.2304.01008.

Speech

Borsos, Zalán, Raphaël Marinier, Damien Vincent, et al. 2023. “AudioLM: A Language Modeling Approach to Audio Generation.” arXiv:2209.03143. Preprint, arXiv, July 26. https://doi.org/10.48550/arXiv.2209.03143.

Chan, William, Daniel Park, Chris Lee, Yu Zhang, Quoc Le, and Mohammad Norouzi. 2021. “SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network.” arXiv:2104.02133. Preprint, arXiv, April 27. https://doi.org/10.48550/arXiv.2104.02133.

Chen, Sanyuan, Chengyi Wang, Zhengyang Chen, et al. 2022. “WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.” IEEE Journal of Selected Topics in Signal Processing 16 (6): 1505–18. https://doi.org/10.1109/JSTSP.2022.3188113.

Cui, Wenqian, Dianzhi Yu, Xiaoqi Jiao, et al. 2025. “Recent Advances in Speech Language Models: A Survey.” arXiv:2410.03751. Preprint, arXiv, August 7. https://doi.org/10.48550/arXiv.2410.03751.

Gulati, Anmol, James Qin, Chung-Cheng Chiu, et al. 2020. “Conformer: Convolution-Augmented Transformer for Speech Recognition.” arXiv:2005.08100. Preprint, arXiv, May 16. https://doi.org/10.48550/arXiv.2005.08100.

Huang, Rongjie, Mingze Li, Dongchao Yang, et al. 2023. “AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.” arXiv:2304.12995. Preprint, arXiv, April 25. https://doi.org/10.48550/arXiv.2304.12995.

Ju, Zeqian, Yuancheng Wang, Kai Shen, et al. 2024. “NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.” arXiv:2403.03100. Preprint, arXiv, April 23. https://doi.org/10.48550/arXiv.2403.03100.

Lu, Yizhou, Mingkun Huang, Xinghua Qu, Pengfei Wei, and Zejun Ma. 2022. “Language Adaptive Cross-Lingual Speech Representation Learning with Sparse Sharing Sub-Networks.” arXiv:2203.04583. Preprint, arXiv, March 9. https://doi.org/10.48550/arXiv.2203.04583.

Mohamed, Abdelrahman, Hung-yi Lee, Lasse Borgholt, et al. 2022. “Self-Supervised Speech Representation Learning: A Review.” IEEE Journal of Selected Topics in Signal Processing 16 (6): 1179–210. https://doi.org/10.1109/JSTSP.2022.3207050.

Radford, Alec, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. “Robust Speech Recognition via Large-Scale Weak Supervision.” arXiv:2212.04356. Preprint, arXiv, December 6. https://doi.org/10.48550/arXiv.2212.04356.