jailbreak(3)
- [Dataset Review] JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks
  Luo, W., Ma, S., Liu, X., Guo, X., & Xiao, C. (2024). JailBreakV: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks. arXiv preprint arXiv:2404.03027. https://arxiv.org/abs/2404.03027
  "With the rapid advancements in Multimodal Large Language Models (MLLMs), securing…"
  2025.01.06
[논문리뷰] "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Shen, X., Chen, Z., Backes, M., Shen, Y., & Zhang, Y. (2023). " do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. arXiv preprint arXiv:2308.03825.https://arxiv.org/abs/2308.03825 "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language ModelsThe misuse of large language models (LLMs) has drawn significa..
2024.12.23 -
- [Paper Review] PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
  Zhang, Z., Zhang, Q., & Foerster, J. (2024). PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition. Accepted at ICML 2024. https://arxiv.org/abs/2405.07932
  "Large language models (LLMs) have shown success in many natural language processing tasks. Despite rigorous safety alignment processes, supposedly safety…"
  2024.09.14