goal prioritization (1)

[Paper Review] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Zhexin Zhang, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, and Minlie Huang. 2024. Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8865–8887, Bangkok, Thailand. Association for Computational Linguistics. https://aclanthology.org/2024...
Posted: 2024. 12. 24.

