view article Article Multimodal Embedding & Reranker Models with Sentence Transformers 26 days ago • 57
OlmPool Collection Collection of models from the paper "Cracks in the Foundation: Seemingly Minor Architectural Choices Impact Long Context Extension". • 26 items • Updated 5 days ago • 2
Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published 6 days ago • 36
Why Fine-Tuning Encourages Hallucinations and How to Fix It Paper • 2604.15574 • Published 19 days ago • 21
Olmo 3.1 Collection The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets... • 9 items • Updated Dec 23, 2025 • 51
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora Paper • 2604.24819 • Published 8 days ago • 85
Laguna XS.2 Collection Designed for agentic coding and long-horizon work on a local machine. Apache 2.0. • 4 items • Updated 7 days ago • 18
Parakeet ASR Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 16 items • Updated 15 days ago • 70
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation Paper • 2604.09497 • Published 25 days ago • 29
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems Paper • 2604.14228 • Published 21 days ago • 25
Cross-Tokenizer LLM Distillation through a Byte-Level Interface Paper • 2604.07466 • Published 22 days ago • 6
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data Paper • 2604.14164 • Published Mar 23 • 35
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens Paper • 2603.23516 • Published Mar 6 • 49
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published Mar 25 • 30
On Data Engineering for Scaling LLM Terminal Capabilities Paper • 2602.21193 • Published Feb 24 • 102