Warning: this is a very bad bibliography (that is part of the exercise):
Do not do it like this.
Chen, Meng, Jiawei Tu, Chao Qi, et al. 2025. “Towards Physically Realizable Adversarial Attacks in Embodied Vision Navigation.” arXiv:2409.10071. Version 5. Preprint, arXiv, August 15. https://doi.org/10.48550/arXiv.2409.10071.
Cools, Kasper, Clara Maathuis, Alexander M. van Oers, et al. 2025. “Vision Transformers: The Threat of Realistic Adversarial Patches.” arXiv:2509.21084. Preprint, arXiv, September 25. https://doi.org/10.48550/arXiv.2509.21084.
Goldblum, Micah, Dimitris Tsipras, Chulin Xie, et al. 2021. “Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses.” arXiv:2012.10544. Preprint, arXiv, March 31. https://doi.org/10.48550/arXiv.2012.10544.
Gu, Jindong, Xiaojun Jia, Pau de Jorge, et al. 2024. “A Survey on Transferability of Adversarial Examples across Deep Neural Networks.” arXiv:2310.17626. Preprint, arXiv, May 2. https://doi.org/10.48550/arXiv.2310.17626.
Laugros, Alfred, Alice Caplier, and Matthieu Ospici. 2021. “Using Synthetic Corruptions to Measure Robustness to Natural Distribution Shifts.” arXiv:2107.12052. Preprint, arXiv, November 18. https://doi.org/10.48550/arXiv.2107.12052.
Li, Yiquan, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Bo Li, and Chaowei Xiao. 2024. “Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness.” arXiv:2407.00623. Version 1. Preprint, arXiv, June 30. https://doi.org/10.48550/arXiv.2407.00623.
Lu, Liming, Shuchao Pang, Siyuan Liang, et al. 2025. “Adversarial Training for Multimodal Large Language Models against Jailbreak Attacks.” arXiv:2503.04833. Preprint, arXiv, March 18. https://doi.org/10.48550/arXiv.2503.04833.
Lyu, Saiyue, Shadab Shaikh, Frederick Shpilevskiy, Evan Shelhamer, and Mathias Lécuyer. 2025. “Adaptive Randomized Smoothing: Certified Adversarial Robustness for Multi-Step Defences.” arXiv:2406.10427. Version 3. Preprint, arXiv, July 10. https://doi.org/10.48550/arXiv.2406.10427.
Mahmood, Kaleel, Rigel Mahmood, and Marten van Dijk. 2021. “On the Robustness of Vision Transformers to Adversarial Examples.” arXiv:2104.02610. Preprint, arXiv, June 5. https://doi.org/10.48550/arXiv.2104.02610.
Wang, Jiakai, Xianglong Liu, Jin Hu, et al. 2024. “Adversarial Examples in the Physical World: A Survey.” arXiv:2311.01473. Version 2. Preprint, arXiv, July 19. https://doi.org/10.48550/arXiv.2311.01473.
Akki, Shivayogi, and Tan Chen. 2025. “Benchmarking Model Predictive Control and Reinforcement Learning Based Control for Legged Robot Locomotion in MuJoCo Simulation.” arXiv:2501.16590. Preprint, arXiv, January 28. https://doi.org/10.48550/arXiv.2501.16590.
Brohan, Anthony, Noah Brown, Justice Carbajal, et al. 2023a. “RT-1: Robotics Transformer for Real-World Control at Scale.” arXiv:2212.06817. Preprint, arXiv, August 11. https://doi.org/10.48550/arXiv.2212.06817.
Brohan, Anthony, Noah Brown, Justice Carbajal, et al. 2023b. “RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control.” arXiv:2307.15818. Preprint, arXiv, July 28. https://doi.org/10.48550/arXiv.2307.15818.
Burchi, Maxime, and Radu Timofte. 2024. “MuDreamer: Learning Predictive World Models without Reconstruction.” arXiv:2405.15083. Preprint, arXiv, May 23. https://doi.org/10.48550/arXiv.2405.15083.
Chittepu, Yaswanth, Blossom Metevier, Will Schwarzer, Austin Hoag, Scott Niekum, and Philip S. Thomas. 2025. “Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints.” arXiv:2506.08266. Version 1. Preprint, arXiv, June 9. https://doi.org/10.48550/arXiv.2506.08266.
Gajewski, Paul, Dominik Żurek, Marcin Pietroń, and Kamil Faber. 2024. “Solving Multi-Goal Robotic Tasks with Decision Transformer.” arXiv:2410.06347. Preprint, arXiv, October 8. https://doi.org/10.48550/arXiv.2410.06347.
Li, Zezeng, Alexandre Chapin, Enda Xiang, et al. 2025. “Robotic Manipulation via Imitation Learning: Taxonomy, Evolution, Benchmark, and Challenges.” arXiv:2508.17449. Version 1. Preprint, arXiv, August 24. https://doi.org/10.48550/arXiv.2508.17449.
Morad, Steven, Ajay Shankar, Jan Blumenkamp, and Amanda Prorok. 2024. “Language-Conditioned Offline RL for Multi-Robot Navigation.” arXiv:2407.20164. Preprint, arXiv, July 29. https://doi.org/10.48550/arXiv.2407.20164.
Nair, Suraj, Aravind Rajeswaran, Vikash Kumar, Chelsea Finn, and Abhinav Gupta. 2022. “R3M: A Universal Visual Representation for Robot Manipulation.” arXiv:2203.12601. Preprint, arXiv, November 18. https://doi.org/10.48550/arXiv.2203.12601.
Zakka, Kevin, Baruch Tabanpour, Qiayuan Liao, et al. 2025. “MuJoCo Playground.” arXiv:2502.08844. Version 1. Preprint, arXiv, February 12. https://doi.org/10.48550/arXiv.2502.08844.
Chen, Lei, Yuan Meng, Chen Tang, et al. 2024. “Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers.” arXiv:2406.17343. Version 1. Preprint, arXiv, June 25. https://doi.org/10.48550/arXiv.2406.17343.
Fladmark, Eirik, Muhammad Hamza Sajjad, and Laura Brinkholm Justesen. 2023. “Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis.” arXiv:2303.15479. Preprint, arXiv, March 26. https://doi.org/10.48550/arXiv.2303.15479.
Gu, Yuxian, Li Dong, Furu Wei, and Minlie Huang. 2025. “MiniLLM: Knowledge Distillation of Large Language Models.” arXiv:2306.08543. Preprint, arXiv, November 21. https://doi.org/10.48550/arXiv.2306.08543.
Huang, Xijie, Zhiqiang Shen, Pingcheng Dong, and Kwang-Ting Cheng. 2024. “Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precision.” arXiv:2307.00331. Preprint, arXiv, October 12. https://doi.org/10.48550/arXiv.2307.00331.
Jayanth, Rakshith, Neelesh Gupta, and Viktor Prasanna. 2024. “Benchmarking Edge AI Platforms for High-Performance ML Inference.” arXiv:2409.14803. Version 1. Preprint, arXiv, September 23. https://doi.org/10.48550/arXiv.2409.14803.
Liang, Jessica, and Anirudh Bharadwaj. 2025. “QR-LoRA: QR-Based Low-Rank Adaptation for Efficient Fine-Tuning of Large Language Models.” arXiv:2508.21810. Preprint, arXiv, August 29. https://doi.org/10.48550/arXiv.2508.21810.
Mansourian, Amir M., Rozhan Ahmadi, Masoud Ghafouri, et al. 2025. “A Comprehensive Survey on Knowledge Distillation.” arXiv:2503.12067. Preprint, arXiv, October 11. https://doi.org/10.48550/arXiv.2503.12067.
Rakka, Mariam, Marios Fournarakis, Olga Krestinskaya, et al. 2025. “Mixed-Precision Quantization for Language Models: Techniques and Prospects.” arXiv:2510.16805. Preprint, arXiv, October 19. https://doi.org/10.48550/arXiv.2510.16805.
Tian, Chunlin, Xuyang Wei, Huanrong Liu, Zhijiang Guo, and Li Li. 2025. “Less Is More: Resource-Efficient Low-Rank Adaptation.” arXiv:2512.00878. Preprint, arXiv, November 30. https://doi.org/10.48550/arXiv.2512.00878.
Wu, Junyi, Haoxuan Wang, Yuzhang Shang, Mubarak Shah, and Yan Yan. 2024. “PTQ4DiT: Post-Training Quantization for Diffusion Transformers.” arXiv:2405.16005. Version 3. Preprint, arXiv, October 17. https://doi.org/10.48550/arXiv.2405.16005.
Chen, Xiaoyang, Ben He, Hongyu Lin, et al. 2024. “Spiral of Silence: How Is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering.” arXiv:2404.10496. Preprint, arXiv, June 23. https://doi.org/10.48550/arXiv.2404.10496.
Csizmadia, Daniel, Andrei Codreanu, Victor Sim, et al. 2025. “Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation.” arXiv:2505.21549. Version 2. Preprint, arXiv, May 29. https://doi.org/10.48550/arXiv.2505.21549.
Guu, Kelvin, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. “REALM: Retrieval-Augmented Language Model Pre-Training.” arXiv:2002.08909. Preprint, arXiv, February 10. https://doi.org/10.48550/arXiv.2002.08909.
Izacard, Gautier, Mathilde Caron, Lucas Hosseini, et al. 2022. “Unsupervised Dense Information Retrieval with Contrastive Learning.” arXiv:2112.09118. Preprint, arXiv, August 29. https://doi.org/10.48550/arXiv.2112.09118.
Karpukhin, Vladimir, Barlas Oğuz, Sewon Min, et al. 2020. “Dense Passage Retrieval for Open-Domain Question Answering.” arXiv:2004.04906. Preprint, arXiv, September 30. https://doi.org/10.48550/arXiv.2004.04906.
Khattab, Omar, and Matei Zaharia. 2020. “ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT.” arXiv:2004.12832. Preprint, arXiv, June 4. https://doi.org/10.48550/arXiv.2004.12832.
Lewis, Patrick, Ethan Perez, Aleksandra Piktus, et al. 2021. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” arXiv:2005.11401. Preprint, arXiv, April 12. https://doi.org/10.48550/arXiv.2005.11401.
Pan, Zhenyu, Haozheng Luo, Manling Li, and Han Liu. 2024. “Conv-CoA: Improving Open-Domain Question Answering in Large Language Models via Conversational Chain-of-Action.” arXiv:2405.17822. Preprint, arXiv, May 28. https://doi.org/10.48550/arXiv.2405.17822.
Thakur, Nandan, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. 2021. “BEIR: A Heterogenous Benchmark for Zero-Shot Evaluation of Information Retrieval Models.” arXiv:2104.08663. Preprint, arXiv, October 21. https://doi.org/10.48550/arXiv.2104.08663.
Zhong, Ming, Zhizhi Wu, and Nanako Honda. 2024. “Deep Learning Based Dense Retrieval: A Comparative Study.” arXiv:2410.20315. Version 1. Preprint, arXiv, October 27. https://doi.org/10.48550/arXiv.2410.20315.
Bao, Guangsheng, Zhiyang Teng, and Yue Zhang. 2023. “Target-Side Augmentation for Document-Level Machine Translation.” arXiv:2305.04505. Preprint, arXiv, June 4. https://doi.org/10.48550/arXiv.2305.04505.
Bogoychev, Nikolay, and Pinzhen Chen. 2023. “Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting.” arXiv:2310.05824. Preprint, arXiv, October 9. https://doi.org/10.48550/arXiv.2310.05824.
Dale, David, Elena Voita, Janice Lam, et al. 2023. “HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation.” arXiv:2305.11746. Preprint, arXiv, December 6. https://doi.org/10.48550/arXiv.2305.11746.
Guerreiro, Nuno M., Duarte Alves, Jonas Waldendorf, et al. 2023. “Hallucinations in Large Multilingual Translation Models.” arXiv:2303.16104. Preprint, arXiv, March 28. https://doi.org/10.48550/arXiv.2303.16104.
He, Zhiwei, Tian Liang, Wenxiang Jiao, et al. 2023. “Exploring Human-Like Translation Strategy with Large Language Models.” arXiv:2305.04118. Preprint, arXiv, November 29. https://doi.org/10.48550/arXiv.2305.04118.
Herold, Christian, and Hermann Ney. 2023. “Improving Long Context Document-Level Machine Translation.” arXiv:2306.05183. Preprint, arXiv, June 8. https://doi.org/10.48550/arXiv.2306.05183.
Lu, Hongyuan, Haoran Yang, Haoyang Huang, Dongdong Zhang, Wai Lam, and Furu Wei. 2024. “Chain-of-Dictionary Prompting Elicits Translation in Large Language Models.” arXiv:2305.06575. Preprint, arXiv, August 17. https://doi.org/10.48550/arXiv.2305.06575.
Sennrich, Rico, Jannis Vamvas, and Alireza Mohammadshahi. 2024. “Mitigating Hallucinations and Off-Target Machine Translation with Source-Contrastive and Language-Contrastive Decoding.” arXiv:2309.07098. Preprint, arXiv, January 29. https://doi.org/10.48550/arXiv.2309.07098.
Wang, Longyue, Chenyang Lyu, Tianbo Ji, et al. 2023. “Document-Level Machine Translation with Large Language Models.” arXiv:2304.02210. Preprint, arXiv, October 24. https://doi.org/10.48550/arXiv.2304.02210.
Zhu, Wenhao, Hongyi Liu, Qingxiu Dong, et al. 2024. “Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis.” arXiv:2304.04675. Preprint, arXiv, June 14. https://doi.org/10.48550/arXiv.2304.04675.
Berner, Julius, Miguel Liu-Schiaffini, Jean Kossaifi, et al. 2025. “Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning.” arXiv:2506.10973. Preprint, arXiv, June 12. https://doi.org/10.48550/arXiv.2506.10973.
Ganga, Sai, and Ziya Uddin. 2024. “Exploring Physics-Informed Neural Networks: From Fundamentals to Applications in Complex Systems.” arXiv:2410.00422. Preprint, arXiv, October 1. https://doi.org/10.48550/arXiv.2410.00422.
Karumuri, Sharmila, Lori Graham-Brady, and Somdatta Goswami. 2025. “Physics-Informed Latent Neural Operator for Real-Time Predictions of Time-Dependent Parametric PDEs.” arXiv:2501.08428. Version 3. Preprint, arXiv, October 28. https://doi.org/10.48550/arXiv.2501.08428.
Lassen, Oskar Bohn, Serio Angelo Maria Agriesti, Filipe Rodrigues, and Francisco Camara Pereira. 2025. “Climate Surrogates for Scalable Multi-Agent Reinforcement Learning: A Case Study with CICERO-SCM.” arXiv:2510.07971. Version 1. Preprint, arXiv, October 9. https://doi.org/10.48550/arXiv.2510.07971.
Lejarza, Fernando, and Michael Baldea. 2022. “DySMHO: Data-Driven Discovery of Governing Equations for Dynamical Systems via Moving Horizon Optimization.” Scientific Reports 12 (1): 11836. https://doi.org/10.1038/s41598-022-13644-w.
Oommen, Vivek, Siavash Khodakarami, Aniruddha Bora, Zhicheng Wang, and George Em Karniadakis. 2025. “Learning Turbulent Flows with Generative Models: Super-Resolution, Forecasting, and Sparse Flow Reconstruction.” arXiv:2509.08752. Preprint, arXiv, September 10. https://doi.org/10.48550/arXiv.2509.08752.
Owens, Katherine, and J. Nathan Kutz. 2022. “Data-Driven Discovery of Governing Equations for Coarse-Grained Heterogeneous Network Dynamics.” arXiv:2205.10965. Preprint, arXiv, May 23. https://doi.org/10.48550/arXiv.2205.10965.
Tauberschmidt, Jan, Sophie Fellenz, Sebastian J. Vollmer, and Andrew B. Duncan. 2025. “Physics-Constrained Fine-Tuning of Flow-Matching Models for Generation and Inverse Problems.” arXiv:2508.09156. Preprint, arXiv, August 5. https://doi.org/10.48550/arXiv.2508.09156.
Toscano, Juan Diego, Vivek Oommen, Alan John Varghese, et al. 2024. “From PINNs to PIKANs: Recent Advances in Physics-Informed Machine Learning.” arXiv:2410.13228. Preprint, arXiv, October 22. https://doi.org/10.48550/arXiv.2410.13228.
You, Wen, Shaoqian Zhou, and Xuhui Meng. 2025. “Self-Supervised Neural Operator for Solving Partial Differential Equations.” arXiv:2509.00867. Version 1. Preprint, arXiv, August 31. https://doi.org/10.48550/arXiv.2509.00867.
Huang, Rongjie, Mingze Li, Dongchao Yang, et al. 2023. “AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.” arXiv:2304.12995. Preprint, arXiv, April 25. https://doi.org/10.48550/arXiv.2304.12995.
Luo, Zhengxiong, Dayou Chen, Yingya Zhang, et al. 2023. “VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation.” arXiv:2303.08320. Preprint, arXiv, October 13. https://doi.org/10.48550/arXiv.2303.08320.
Moser, Brian B., Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio, and Andreas Dengel. 2025. “Diffusion Models, Image Super-Resolution and Everything: A Survey.” IEEE Transactions on Neural Networks and Learning Systems 36 (7): 11793–813. https://doi.org/10.1109/TNNLS.2024.3476671.
Nichol, Alex, Prafulla Dhariwal, Aditya Ramesh, et al. 2022. “GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.” arXiv:2112.10741. Preprint, arXiv, March 8. https://doi.org/10.48550/arXiv.2112.10741.
Saharia, Chitwan, William Chan, Saurabh Saxena, et al. 2022. “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.” arXiv:2205.11487. Preprint, arXiv, May 23. https://doi.org/10.48550/arXiv.2205.11487.
Sordo, Zineb, Eric Chagnon, and Daniela Ushizima. 2025. “A Review on Generative AI for Text-to-Image and Image-to-Image Generation and Implications to Scientific Images.” arXiv:2502.21151. Version 2. Preprint, arXiv, March 10. https://doi.org/10.48550/arXiv.2502.21151.
Sun, Quan, Qiying Yu, Yufeng Cui, et al. 2023. “Generative Pretraining in Multimodality.” arXiv:2307.05222. Version 1. Preprint, arXiv, July 11. https://doi.org/10.48550/arXiv.2307.05222.
Xu, Katherine, Lingzhi Zhang, and Jianbo Shi. 2025. “Detecting Origin Attribution for Text-to-Image Diffusion Models.” arXiv:2403.19653. Preprint, arXiv, April 16. https://doi.org/10.48550/arXiv.2403.19653.
Zhang, Jinjin, Qiuyu Huang, Junjie Liu, Xiefan Guo, and Di Huang. 2025. “Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models.” arXiv:2503.18352. Version 1. Preprint, arXiv, March 24. https://doi.org/10.48550/arXiv.2503.18352.
Zhang, Lvmin, Anyi Rao, and Maneesh Agrawala. 2023. “Adding Conditional Control to Text-to-Image Diffusion Models.” arXiv:2302.05543. Preprint, arXiv, November 26. https://doi.org/10.48550/arXiv.2302.05543.
Frantar, Elias, Carlos Riquelme, Neil Houlsby, Dan Alistarh, and Utku Evci. 2023. “Scaling Laws for Sparsely-Connected Foundation Models.” arXiv:2309.08520. Preprint, arXiv, September 15. https://doi.org/10.48550/arXiv.2309.08520.
Liu, Fan, Tianshu Zhang, Wenwen Dai, Wenwen Cai, Xiaocong Zhou, and Delong Chen. 2024. “Few-Shot Adaptation of Multi-Modal Foundation Models: A Survey.” arXiv:2401.01736. Preprint, arXiv, January 4. https://doi.org/10.48550/arXiv.2401.01736.
Liu, Xu, Tong Zhou, Yuanxin Wang, et al. 2023. “Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey.” arXiv:2312.10163. Preprint, arXiv, December 15. https://doi.org/10.48550/arXiv.2312.10163.
Lu, Jianglin, Hailing Wang, Yi Xu, Yizhou Wang, Kuo Yang, and Yun Fu. 2025. “Representation Potentials of Foundation Models for Multimodal Alignment: A Survey.” arXiv:2510.05184. Preprint, arXiv, October 5. https://doi.org/10.48550/arXiv.2510.05184.
Schneider, Johannes, Christian Meske, and Pauline Kuss. 2024. “Foundation Models.” Business & Information Systems Engineering 66 (2): 221–31. https://doi.org/10.1007/s12599-024-00851-0.
Subramanian, Shashank, Peter Harrington, Kurt Keutzer, et al. 2023. “Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior.” arXiv:2306.00258. Preprint, arXiv, June 1. https://doi.org/10.48550/arXiv.2306.00258.
Sun, Weigao, Jiaxi Hu, Yucheng Zhou, et al. 2025. “Speed Always Wins: A Survey on Efficient Architectures for Large Language Models.” arXiv:2508.09834. Preprint, arXiv, August 13. https://doi.org/10.48550/arXiv.2508.09834.
Xu, Mengwei, Wangsong Yin, Dongqi Cai, et al. 2024. “A Survey of Resource-Efficient LLM and Multimodal Foundation Models.” arXiv:2401.08092. Preprint, arXiv, September 23. https://doi.org/10.48550/arXiv.2401.08092.
Yuan, Yang. 2024. “On the Power of Foundation Models.” arXiv:2211.16327. Preprint, arXiv, October 22. https://doi.org/10.48550/arXiv.2211.16327.
Afchar, Darius, Gabriel Meseguer-Brocal, and Romain Hennequin. 2025. “AI-Generated Music Detection and Its Challenges.” arXiv:2501.10111. Version 1. Preprint, arXiv, January 17. https://doi.org/10.48550/arXiv.2501.10111.
Agostinelli, Andrea, Timo I. Denk, Zalán Borsos, et al. 2023. “MusicLM: Generating Music From Text.” arXiv:2301.11325. Preprint, arXiv, January 26. https://doi.org/10.48550/arXiv.2301.11325.
Chen, Yanxu, Linshu Huang, and Tian Gou. 2024. “Applications and Advances of Artificial Intelligence in Music Generation: A Review.” arXiv:2409.03715. Preprint, arXiv, September 3. https://doi.org/10.48550/arXiv.2409.03715.
Copet, Jade, Felix Kreuk, Itai Gat, et al. 2024. “Simple and Controllable Music Generation.” arXiv:2306.05284. Preprint, arXiv, January 30. https://doi.org/10.48550/arXiv.2306.05284.
Evans, Zach, Julian D. Parker, C. J. Carr, Zack Zukowski, Josiah Taylor, and Jordi Pons. 2024. “Long-Form Music Generation with Latent Diffusion.” arXiv:2404.10301. Version 2. Preprint, arXiv, July 29. https://doi.org/10.48550/arXiv.2404.10301.
Huang, Qingqing, Daniel S. Park, Tao Wang, et al. 2023. “Noise2Music: Text-Conditioned Music Generation with Diffusion Models.” arXiv:2302.03917. Preprint, arXiv, March 6. https://doi.org/10.48550/arXiv.2302.03917.
Lam, Max W. Y., Qiao Tian, Tang Li, et al. 2023. “Efficient Neural Music Generation.” arXiv:2305.15719. Preprint, arXiv, May 25. https://doi.org/10.48550/arXiv.2305.15719.
Lehmkuhl, Jonathan, Ábel Ilyés-Kun, Nico Bremes, Cemhan Kaan Özaltan, Frederik Muthers, and Jiayi Yuan. 2025. “Generating Piano Music with Transformers: A Comparative Study of Scale, Data, and Metrics.” arXiv:2511.07268. Preprint, arXiv, November 10. https://doi.org/10.48550/arXiv.2511.07268.
Wu, Shih-Lun, and Yi-Hsuan Yang. 2022. “MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer with One Transformer VAE.” arXiv:2105.04090. Preprint, arXiv, December 19. https://doi.org/10.48550/arXiv.2105.04090.
Yuan, Ruibin, Hanfeng Lin, Yi Wang, et al. 2024. “ChatMusician: Understanding and Generating Music Intrinsically with LLM.” arXiv:2402.16153. Preprint, arXiv, February 25. https://doi.org/10.48550/arXiv.2402.16153.
Chang, Amy, Nicholas Conley, Harish Santhanalakshmi Ganesan, and Adam Swanda. 2025. “Death by a Thousand Prompts: Open Model Vulnerability Analysis.” arXiv:2511.03247. Version 1. Preprint, arXiv, November 5. https://doi.org/10.48550/arXiv.2511.03247.
Chen, Sizhe, Arman Zharmagambetov, Saeed Mahloujifar, Kamalika Chaudhuri, and Chuan Guo. 2024. “Aligning LLMs to Be Robust Against Prompt Injection.” arXiv:2410.05451. Version 1. Preprint, arXiv, October 7. https://doi.org/10.48550/arXiv.2410.05451.
Du, Chenghao, Quanfeng Huang, Tingxuan Tang, Zihao Wang, Adwait Nadkarni, and Yue Xiao. 2025. “Measuring the Security of Mobile LLM Agents under Adversarial Prompts from Untrusted Third-Party Channels.” arXiv:2510.27140. Preprint, arXiv, November 6. https://doi.org/10.48550/arXiv.2510.27140.
Jia, Feiran, Tong Wu, Xin Qin, and Anna Squicciarini. 2024. “The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents.” arXiv:2412.16682. Preprint, arXiv, December 21. https://doi.org/10.48550/arXiv.2412.16682.
Kumar, Aounon, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, and Himabindu Lakkaraju. 2025. “Certifying LLM Safety against Adversarial Prompting.” arXiv:2309.02705. Preprint, arXiv, February 4. https://doi.org/10.48550/arXiv.2309.02705.
Peng, Benji, Ziqian Bi, Qian Niu, et al. 2024. “Jailbreaking and Mitigation of Vulnerabilities in Large Language Models.” arXiv:2410.15236. Version 1. Preprint, arXiv, October 20. https://doi.org/10.48550/arXiv.2410.15236.
Shang, Zhengchun, and Wenlan Wei. 2025. “Evolving Security in LLMs: A Study of Jailbreak Attacks and Defenses.” arXiv:2504.02080. Version 1. Preprint, arXiv, April 2. https://doi.org/10.48550/arXiv.2504.02080.
Shi, Jiawen, Zenghui Yuan, Yinuo Liu, et al. 2025. “Optimization-Based Prompt Injection Attack to LLM-as-a-Judge.” arXiv:2403.17710. Preprint, arXiv, August 24. https://doi.org/10.48550/arXiv.2403.17710.
Yi, Sibo, Yule Liu, Zhen Sun, et al. 2024. “Jailbreak Attacks and Defenses Against Large Language Models: A Survey.” arXiv:2407.04295. Preprint, arXiv, August 30. https://doi.org/10.48550/arXiv.2407.04295.
Zhao, Andrew, Reshmi Ghosh, Vitor Carvalho, et al. 2025. “Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-Based Optimizers.” arXiv:2510.14381. Preprint, arXiv, October 16. https://doi.org/10.48550/arXiv.2510.14381.
Catalano, Nico, and Matteo Matteucci. 2024. “Few Shot Semantic Segmentation: A Review of Methodologies, Benchmarks, and Open Challenges.” arXiv:2304.05832. Preprint, arXiv, May 20. https://doi.org/10.48550/arXiv.2304.05832.
Ke, Lei, Mingqiao Ye, Martin Danelljan, et al. 2023. “Segment Anything in High Quality.” arXiv:2306.01567. Preprint, arXiv, October 23. https://doi.org/10.48550/arXiv.2306.01567.
Kirillov, Alexander, Eric Mintun, Nikhila Ravi, et al. 2023. “Segment Anything.” arXiv:2304.02643. Preprint, arXiv, April 5. https://doi.org/10.48550/arXiv.2304.02643.
Li, Feng, Hao Zhang, Peize Sun, et al. 2023. “Semantic-SAM: Segment and Recognize Anything at Any Granularity.” arXiv:2307.04767. Preprint, arXiv, July 10. https://doi.org/10.48550/arXiv.2307.04767.
Li, Feng, Hao Zhang, Huaizhe Xu, et al. 2022. “Mask DINO: Towards A Unified Transformer-Based Framework for Object Detection and Segmentation.” arXiv:2206.02777. Preprint, arXiv, December 12. https://doi.org/10.48550/arXiv.2206.02777.
Liu, Xinyu, Beiwen Tian, Zhen Wang, et al. 2023. “Delving Into Shape-Aware Zero-Shot Semantic Segmentation.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2999–3009. https://openaccess.thecvf.com/content/CVPR2023/html/Liu_Delving_Into_Shape-Aware_Zero-Shot_Semantic_Segmentation_CVPR_2023_paper.html.
Rajič, Frano, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, and Fisher Yu. 2023. “Segment Anything Meets Point Tracking.” arXiv:2307.01197. Preprint, arXiv, December 3. https://doi.org/10.48550/arXiv.2307.01197.
Wang, Xinlong, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, and Tiejun Huang. 2023. “SegGPT: Towards Segmenting Everything in Context.” In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 1130–40. https://openaccess.thecvf.com/content/ICCV2023/html/Wang_SegGPT_Towards_Segmenting_Everything_in_Context_ICCV_2023_paper.html.
Xu, Jiarui, Shalini De Mello, Sifei Liu, et al. 2022. “GroupViT: Semantic Segmentation Emerges from Text Supervision.” arXiv:2202.11094. Preprint, arXiv, July 18. https://doi.org/10.48550/arXiv.2202.11094.
Xu, Jilan, Junlin Hou, Yuejie Zhang, et al. 2023. “Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2935–44. https://openaccess.thecvf.com/content/CVPR2023/html/Xu_Learning_Open-Vocabulary_Semantic_Segmentation_Models_From_Natural_Language_Supervision_CVPR_2023_paper.html.
Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. “A Simple Framework for Contrastive Learning of Visual Representations.” arXiv:2002.05709. Preprint, arXiv, July 1. https://doi.org/10.48550/arXiv.2002.05709.
Chen, Wenxi, Yuzhe Liang, Ziyang Ma, Zhisheng Zheng, and Xie Chen. 2024. “EAT: Self-Supervised Pre-Training with Efficient Audio Transformer.” arXiv:2401.03497. Preprint, arXiv, January 7. https://doi.org/10.48550/arXiv.2401.03497.
Guo, Huijie, Jingyao Wang, Peizheng Guo, Xingchen Shen, Changwen Zheng, and Wenwen Qiang. 2025. “Exploring Transferability of Self-Supervised Learning by Task Conflict Calibration.” arXiv:2511.13787. Preprint, arXiv, November 16. https://doi.org/10.48550/arXiv.2511.13787.
Hondru, Vlad, Florinel Alin Croitoru, Shervin Minaee, Radu Tudor Ionescu, and Nicu Sebe. 2024. “Masked Image Modeling: A Survey.” arXiv:2408.06687. Version 1. Preprint, arXiv, August 13. https://doi.org/10.48550/arXiv.2408.06687.
Liu, Ziyu, Azadeh Alavi, Minyi Li, and Xiang Zhang. 2024. “Self-Supervised Learning for Time Series: Contrastive or Generative?” arXiv:2403.09809. Version 1. Preprint, arXiv, March 14. https://doi.org/10.48550/arXiv.2403.09809.
Ma, Duo, Xianghu Yue, Junyi Ao, Xiaoxue Gao, and Haizhou Li. 2024. “Text-Guided HuBERT: Self-Supervised Speech Pre-Training via Generative Adversarial Networks.” arXiv:2402.15725. Version 3. Preprint, arXiv, July 22. https://doi.org/10.48550/arXiv.2402.15725.
Naiman, Ilan, Emanuel Ben-Baruch, Oron Anschel, et al. 2025. “LV-MAE: Learning Long Video Representations through Masked-Embedding Autoencoders.” arXiv:2504.03501. Preprint, arXiv, October 7. https://doi.org/10.48550/arXiv.2504.03501.
Shi, Yuge, Imant Daunhawer, Julia E. Vogt, Philip H. S. Torr, and Amartya Sanyal. 2022. “How Robust Is Unsupervised Representation Learning to Distribution Shift?” arXiv:2206.08871. Preprint, arXiv, December 16. https://doi.org/10.48550/arXiv.2206.08871.
Tan, Fuwen, Fatemeh Saleh, and Brais Martinez. 2023. “Effective Self-Supervised Pre-Training on Low-Compute Networks without Distillation.” arXiv:2210.02808. Preprint, arXiv, October 2. https://doi.org/10.48550/arXiv.2210.02808.
Zong, Yongshuo, Oisin Mac Aodha, and Timothy Hospedales. 2024. “Self-Supervised Multimodal Learning: A Survey.” arXiv:2304.01008. Preprint, arXiv, August 16. https://doi.org/10.48550/arXiv.2304.01008.
Borsos, Zalán, Raphaël Marinier, Damien Vincent, et al. 2023. “AudioLM: A Language Modeling Approach to Audio Generation.” arXiv:2209.03143. Preprint, arXiv, July 26. https://doi.org/10.48550/arXiv.2209.03143.
Chan, William, Daniel Park, Chris Lee, Yu Zhang, Quoc Le, and Mohammad Norouzi. 2021. “SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network.” arXiv:2104.02133. Preprint, arXiv, April 27. https://doi.org/10.48550/arXiv.2104.02133.
Chen, Sanyuan, Chengyi Wang, Zhengyang Chen, et al. 2022. “WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.” IEEE Journal of Selected Topics in Signal Processing 16 (6): 1505–18. https://doi.org/10.1109/JSTSP.2022.3188113.
Cui, Wenqian, Dianzhi Yu, Xiaoqi Jiao, et al. 2025. “Recent Advances in Speech Language Models: A Survey.” arXiv:2410.03751. Preprint, arXiv, August 7. https://doi.org/10.48550/arXiv.2410.03751.
Gulati, Anmol, James Qin, Chung-Cheng Chiu, et al. 2020. “Conformer: Convolution-Augmented Transformer for Speech Recognition.” arXiv:2005.08100. Preprint, arXiv, May 16. https://doi.org/10.48550/arXiv.2005.08100.
Ju, Zeqian, Yuancheng Wang, Kai Shen, et al. 2024. “NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.” arXiv:2403.03100. Preprint, arXiv, April 23. https://doi.org/10.48550/arXiv.2403.03100.
Lu, Yizhou, Mingkun Huang, Xinghua Qu, Pengfei Wei, and Zejun Ma. 2022. “Language Adaptive Cross-Lingual Speech Representation Learning with Sparse Sharing Sub-Networks.” arXiv:2203.04583. Preprint, arXiv, March 9. https://doi.org/10.48550/arXiv.2203.04583.
Mohamed, Abdelrahman, Hung-yi Lee, Lasse Borgholt, et al. 2022. “Self-Supervised Speech Representation Learning: A Review.” IEEE Journal of Selected Topics in Signal Processing 16 (6): 1179–210. https://doi.org/10.1109/JSTSP.2022.3207050.
Radford, Alec, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. “Robust Speech Recognition via Large-Scale Weak Supervision.” arXiv:2212.04356. Preprint, arXiv, December 6. https://doi.org/10.48550/arXiv.2212.04356.