[Paper Review] PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition. Accepted at ICML 2024. Ziyang Zhang, Qizhen Zhang, Jakob Foerster. https://arxiv.org/abs/2405.07932
Large language models (LLMs) have shown success in many natural language processing tasks. Despite rigorous safety alignment processes, supposedly safety..
September 14, 2024