JailBreaking(3)
-
[Paper Review] White-box Multimodal Jailbreaks Against Large Vision-Language Models
Wang, R., Ma, X., Zhou, H., Ji, C., Ye, G., & Jiang, Y. (2024). White-box Multimodal Jailbreaks Against Large Vision-Language Models. ACM Multimedia. https://arxiv.org/abs/2405.17894 Recent advancements in Large Vision-Language Models (VLMs) have underscored their superiority in various multimodal tasks. However, the adversarial ..
2024.12.27 -
[Paper Review] Visual Adversarial Examples Jailbreak Aligned Large Language Models
Qi, X., Huang, K., Panda, A., Henderson, P., Wang, M., & Mittal, P. (2024). Visual Adversarial Examples Jailbreak Aligned Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21527-21536. https://doi.org/10.1609/aaai.v38i19.30150 https://arxiv.org/abs/2306.13213 Recently, there has been a ..
2024.12.26 -
[Paper Review] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Zhexin Zhang, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, and Minlie Huang. 2024. Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8865–8887, Bangkok, Thailand. Association for Computational Linguistics. https://aclanthology.org/2024...
2024.12.24