논문 리뷰(3)
-
[데이터셋리뷰] JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
Luo, W., Ma, S., Liu, X., Guo, X., & Xiao, C. (2024). JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks. https://arxiv.org/abs/2404.03027 JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak AttacksWith the rapid advancements in Multimodal Large Language Models (MLLMs), securing..
2025.01.06 -
[논문리뷰] "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Shen, X., Chen, Z., Backes, M., Shen, Y., & Zhang, Y. (2023). " do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. arXiv preprint arXiv:2308.03825.https://arxiv.org/abs/2308.03825 "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language ModelsThe misuse of large language models (LLMs) has drawn significa..
2024.12.23 -
[논문리뷰] Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model, Advances in Neural Information Processing Systems (Neurips,'24). Rafailov, R., Sharma, A., Mitchell, E., Manning, C. D., Ermon, S., & Finn, C.https://arxiv.org/abs/2305.18290 Direct Preference Optimization: Your Language Model is Secretly a Reward ModelWhile large-scale unsupervised language models (LMs) learn broad ..
2024.09.26