Paper Review / Large Language Model (LLM) (16)
-
[논문리뷰] DeepSeek-V3 Technical Report
I'm going to dig into DeepSeek, the hottest topic in the LLM field lately, from V3 through R1, based on the papers DeepSeek has published. https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file Let's jump right into reviewing V3! As always, blue text marks my personal opinions. Abstract: DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token: Deep..
2025.02.04 -
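The excerpt above notes that only 37B of DeepSeek-V3's 671B parameters are activated per token. A toy sketch of how top-k expert routing achieves this, with made-up dimensions (this is a minimal illustration of generic MoE gating, not DeepSeek's actual architecture, which adds shared experts and its own load-balancing scheme):

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_gate(logits, k):
    """Keep the k largest gate logits, softmax over them, ignore the rest."""
    idx = np.argsort(logits)[::-1][:k]
    weights = np.exp(logits[idx] - logits[idx].max())
    weights /= weights.sum()
    return idx, weights

# Toy MoE layer: 8 experts, each a small weight matrix; only k=2 are used per token.
n_experts, d, k = 8, 16, 2
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) / np.sqrt(d)

x = rng.standard_normal(d)            # one token's hidden state
idx, w = top_k_gate(x @ router, k)    # route the token to 2 of the 8 experts
y = sum(wi * (x @ experts[i]) for i, wi in zip(idx, w))

active = k * d * d + d * n_experts    # parameters actually touched for this token
total = n_experts * d * d + d * n_experts
print(f"active/total parameters: {active}/{total}")
```

Here the per-token cost scales with k rather than with the number of experts, which is why a 671B-parameter model can run with only 37B active per token.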
[논문리뷰] LoRA Learns Less and Forgets Less
With so much going on lately, it's time to catch up on the paper reviews I've missed! Today I'm reviewing LoRA Learns Less and Forgets Less, published in August 2024 in Transactions on Machine Learning Research. As the title suggests, the paper examines learning and forgetting performance when training with LoRA. https://arxiv.org/abs/2405.09673
2025.02.03 -
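For readers new to LoRA, the low-rank update the paper studies can be sketched in a few lines of NumPy (a minimal illustration with made-up dimensions and a hypothetical `lora_forward` helper, not the paper's training setup):

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4                    # frozen weight is d x k; r is the LoRA rank

W = rng.standard_normal((d, k))        # pretrained weight, frozen during fine-tuning
A = rng.standard_normal((r, k)) * 0.01 # trainable low-rank factor
B = np.zeros((d, r))                   # B starts at zero, so the initial update is zero

def lora_forward(x, alpha=8.0):
    """y = x (W + (alpha/r) * B A)^T -- only A and B would receive gradients."""
    delta = B @ A                      # rank-r update to the frozen weight
    return x @ (W + (alpha / r) * delta).T

trainable = A.size + B.size
print(f"trainable params: {trainable} vs full fine-tuning: {W.size}")
```

Because only A and B are trained, the update is constrained to rank r, which is the mechanism behind the paper's "learns less and forgets less" trade-off.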
[논문리뷰] Searching for Best Practices in Retrieval-Augmented Generation
Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, and Xuanjing Huang. 2024. Searching for best practices in retrieval-augmented generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17716–17736, Miami, Florida, USA. Association for Computational Linguistics.
2025.01.01 -
[논문리뷰] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
Zhexin Zhang, Junxiao Yang, Pei Ke, Fei Mi, Hongning Wang, and Minlie Huang. 2024. Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8865–8887, Bangkok, Thailand. Association for Computational Linguistics. https://aclanthology.org/2024...
2024.12.24 -
[논문리뷰] "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models
Shen, X., Chen, Z., Backes, M., Shen, Y., & Zhang, Y. (2023). "Do Anything Now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models. arXiv preprint arXiv:2308.03825. https://arxiv.org/abs/2308.03825
2024.12.23 -
Meta: Adapting Open Source Language Models
This post isn't a paper review but rather a review of a blog series run by Meta. Meta, the developer of LLaMa, explains how to make use of Open Source Large Language Models (LLMs), split into Part 1: Methods for adapting large language models, Part 2: To fine-tune or not to fine-tune, and Part 3: How to fine-tune: Focus on effective datasets, and I'll summarize the content here. It's a much easier read than a paper, so I'd recommend it to anyone encountering LLMs for the first time! To briefly summarize each part: Part 1 gives a broad overview of how LLMs can be u..
2024.10.24