12B GGUFs
Danny
TheDrunkenSnail
AI & ML interests
None yet
Recent Activity
reacted to salma-remyx's post with π₯ 3 days ago
The space of possible improvements for your AI model is large while evaluation is costly.
So I was excited to discover the ICML 2026 paper from Kobalczyk, Lin, Letham, Zhao, Balandat, and Bakshy titled "LILO: Bayesian Optimization with Natural Language Feedback."
The method learns efficiently from expert preferences, balancing exploration and exploitation in a principled way with Bayesian Optimization for expensive-to-evaluate black-box objectives.
Experimenting with the technique, I trained a Gaussian Process proxy model on the implicit preferences in my code repo's commit history at VQASynth.
The result: I used the model's preference scores to re-rank candidate papers recommended based on my interests in spatial reasoning and multimodal data synthesis.
Semantic relevance is a high-recall method for finding arXiv papers personalized to your interests. Adding contributor preferences, extracted from the merge history of your code offers a high-precision filter.
So what's next? I'm using the model to synthesize a larger volume of preference data to finetune an open-weight coding model with DPO and LoRA. Tuning Coding Agents via Implicit Preference Distillation
arXiv: https://arxiv.org/pdf/2510.17671
Substack: https://remyxai.substack.com/p/lilo-and-myx
VQASynth: https://github.com/remyxai/VQASynth upvoted an article about 2 months ago
TRL v1.0: Post-Training Library Built to Move with the Field reacted to DedeProGames's post with π₯ about 2 months ago
π₯ GRM2 - The small one that surpasses the big ones.
What if a 3-parameter model can beat a 32-parameter model in every benchmark? We prove that it can.
GRM2 is a 3b params model based on the llama architecture, trained for long reasoning and high performance in complex tasks - the first 3b params model to outperform qwen3-32b in ALL benchmarks, and outperform o3-mini in almost all benchmarks.
π€ Model: https://huggingface.co/OrionLLM/GRM2-3b
The first 3b params model to generate over 1000 lines of code and achieve a score of 39.0 in xBench-DeepSearch-2510.
π Chat with GRM:
https://huggingface.co/spaces/DedeProGames/GRM2-Chat
π Download official GGUFs: https://huggingface.co/OrionLLM/GRM2-3b-GGUF