Training Method
Dataset
Model
- Textbooks Are All You Need: Phi-1
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Extra
- MoE (Mixture of Experts)
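
As a rough illustration of the MoE idea listed in the outline above, here is a minimal sketch of a top-k routed Mixture-of-Experts layer. This is an assumption-laden toy example: the layer sizes, expert count, and `top_k` value are illustrative and do not come from the SOLAR or Phi-1 papers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Minimal sketch of a Mixture-of-Experts layer with top-k routing.

    All dimensions and hyperparameters are illustrative assumptions,
    not values taken from SOLAR, Phi-1, or any specific MoE model.
    """

    def __init__(self, d_model=512, d_ff=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router (gating network) scores every expert for each token.
        self.router = nn.Linear(d_model, num_experts)
        # Each expert is an independent feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        scores = self.router(x)                              # (batch, seq, num_experts)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)              # normalize over chosen experts only
        out = torch.zeros_like(x)
        # Each token is processed by only its top-k experts, mixed by the gate weights.
        for slot in range(self.top_k):
            idx = topk_idx[..., slot]                         # (batch, seq)
            w = weights[..., slot].unsqueeze(-1)              # (batch, seq, 1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out


# Usage: per-token compute stays roughly constant (only top_k experts run per token),
# while total parameter count grows with num_experts.
layer = TopKMoE()
y = layer(torch.randn(2, 16, 512))
print(y.shape)  # torch.Size([2, 16, 512])
```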