What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models Paper • 2601.06165 • Published Jan 7 • 16
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models Paper • 2512.22238 • Published Dec 23, 2025 • 30
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items • Updated 3 days ago • 138
Article We Got Claude to Fine-Tune an Open Source LLM burtenshaw, evalstate • Dec 4, 2025 • 624
Article A Review on the Evolvement of Load Balancing Strategy in MoE LLMs: Pitfalls and Lessons NormalUhr • Feb 4, 2025 • 35
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 229
Article The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix codelion • Nov 3, 2025 • 65
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 132
KORMo pretraining datasets Collection The pretraining datasets for KORMo-10B were collected from diverse, publicly available sources. • 14 items • Updated Oct 13, 2025 • 22
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought Paper • 2510.04230 • Published Oct 5, 2025 • 27
Article Introducing RTEB: A New Standard for Retrieval Evaluation fzliu, KennethEnevoldsen, Samoed, isaacchung, tomaarsen, fzoll • Oct 1, 2025 • 143
Article mmBERT: ModernBERT goes Multilingual mmarone, orionweller, will-fleshman, eugene-yang, dlawrie, vandurme • Sep 9, 2025 • 146
Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training smohammadi, siro1, winglian, marcsun13, djsaunde • Aug 8, 2025 • 98
AI2 Safety Toolkit Collection Safety data, moderation tools and safe LLMs. • 6 items • Updated Dec 23, 2025 • 9
Essential-Web v1.0: 24T tokens of organized web data Paper • 2506.14111 • Published Jun 17, 2025 • 46
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions Paper • 2506.00421 • Published May 31, 2025 • 5