The Smol Training Playbook 📚 The secrets to building world-class LLMs • 2.81k
STRICT: Stress Test of Rendering Images Containing Text Paper • 2505.18985 • Published May 25, 2025
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation Paper • 2406.07529 • Published Jun 11, 2024
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 221
Context Clues Collection Models from the paper Context Clues • 16 items • Updated Nov 7, 2025 • 8
Finally, a Replacement for BERT: Introducing ModernBERT Article • Dec 19, 2024 • 717
Qwen2 Collection Qwen2 language models: pretrained and instruction-tuned models in 5 sizes (0.5B, 1.5B, 7B, 57B-A14B, and 72B). • 39 items • Updated 8 days ago • 374