Posts tagged 'JailBreaking'

[Paper Review] White-box Multimodal Jailbreaks Against Large Vision-Language Models

Wang, R., Ma, X., Zhou, H., Ji, C., Ye, G., & Jiang, Y. (2024). White-box Multimodal Jailbreaks Against Large Vision-Language Models. ACM Multimedia. https://arxiv.org/abs/2405.17894 White-box Multimodal Jailbreaks Against Large Vision-Language Models. Recent advancements in Large Vision-Language Models (VLMs) have underscored their superiority in various multimodal tasks. However, the adversarial ..

2024.12.27

[Paper Review] Visual Adversarial Examples Jailbreak Aligned Large Language Models

Qi, X., Huang, K., Panda, A., Henderson, P., Wang, M., & Mittal, P. (2024). Visual Adversarial Examples Jailbreak Aligned Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21527-21536. https://doi.org/10.1609/aaai.v38i19.30150 https://arxiv.org/abs/2306.13213 Visual Adversarial Examples Jailbreak Aligned Large Language Models. Recently, there has been a ..

2024.12.26

[Paper Review] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

Zhexin Zhang, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, and Minlie Huang. 2024. Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8865–8887, Bangkok, Thailand. Association for Computational Linguistics. https://aclanthology.org/2024...

2024.12.24

당근과 토마토

