Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published 11 days ago • 60
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 501
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain Paper • 2509.26507 • Published Sep 30, 2025 • 538
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth Paper • 2509.03867 • Published Sep 4, 2025 • 210
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 228
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Paper • 2509.09372 • Published Sep 11, 2025 • 243
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing Paper • 2509.08721 • Published Sep 10, 2025 • 660
BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K Text Generation • 3B • Updated Jun 25, 2025 • 37 • 1
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published Aug 22, 2025 • 160
view article Article CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG +4 Mar 15, 2024 • 14
Learning to Skip the Middle Layers of Transformers Paper • 2506.21103 • Published Jun 26, 2025 • 18
view article Article BM25 for Python: Achieving high performance while simplifying dependencies with *BM25S*⚡ Jul 9, 2024 • 75
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6, 2025 • 188
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 Paper • 2408.00874 • Published Aug 1, 2024 • 52