jailbreak2

[Paper Review] "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Shen, X., Chen, Z., Backes, M., Shen, Y., & Zhang, Y. (2023). "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models. arXiv preprint arXiv:2308.03825. https://arxiv.org/abs/2308.03825
The misuse of large language models (LLMs) has drawn significa..
2024. 12. 23.

[Paper Review] PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Zhang, Z., Zhang, Q., & Foerster, J. (2024). PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition. Accepted at ICML 2024. https://arxiv.org/abs/2405.07932
Large language models (LLMs) have shown success in many natural language processing tasks. Despite rigorous safety alignment processes, supposedly safety..
2024. 9. 14.