Open to Collab

Danny

TheDrunkenSnail

AI & ML interests

None yet

Recent Activity

reacted to salma-remyx's post with 🔥 3 days ago

The space of possible improvements for your AI model is large, while evaluation is costly. So I was excited to discover the ICML 2026 paper from Kobalczyk, Lin, Letham, Zhao, Balandat, and Bakshy titled "LILO: Bayesian Optimization with Natural Language Feedback." The method learns efficiently from expert preferences, using Bayesian Optimization to balance exploration and exploitation in a principled way for expensive-to-evaluate black-box objectives. Experimenting with the technique, I trained a Gaussian Process proxy model on the implicit preferences in the commit history of my VQASynth code repo. The result: I used the model's preference scores to re-rank candidate papers recommended based on my interests in spatial reasoning and multimodal data synthesis. Semantic relevance is a high-recall method for finding arXiv papers personalized to your interests. Adding contributor preferences, extracted from the merge history of your code, offers a high-precision filter. So what's next? I'm using the model to synthesize a larger volume of preference data to fine-tune an open-weight coding model with DPO and LoRA.
Tuning Coding Agents via Implicit Preference Distillation
arXiv: https://arxiv.org/pdf/2510.17671
Substack: https://remyxai.substack.com/p/lilo-and-myx
VQASynth: https://github.com/remyxai/VQASynth

upvoted an article about 2 months ago

TRL v1.0: Post-Training Library Built to Move with the Field

reacted to DedeProGames's post with 🔥 about 2 months ago

🔥 GRM2 - The small one that surpasses the big ones. What if a 3B-parameter model could beat a 32B-parameter model in every benchmark? We prove that it can. GRM2 is a 3B-parameter model based on the Llama architecture, trained for long reasoning and high performance in complex tasks - the first 3B-parameter model to outperform Qwen3-32B in ALL benchmarks, and to outperform o3-mini in almost all benchmarks. It is also the first 3B-parameter model to generate over 1000 lines of code and achieve a score of 39.0 in xBench-DeepSearch-2510.
🤗 Model: https://huggingface.co/OrionLLM/GRM2-3b
🚀 Chat with GRM: https://huggingface.co/spaces/DedeProGames/GRM2-Chat
🏆 Download official GGUFs: https://huggingface.co/OrionLLM/GRM2-3b-GGUF

Organizations

Collections 2

Spaces 3

ChatBot

Model Testing Space

AutoTrain Advanced

Create powerful AI models without code

Test

Models 25

TheDrunkenSnail/Dirty-Muse-Writer-v01-Uncensored-Erotica-NSFW-Q6_K-GGUF

9B • Updated Mar 15, 2025 • 667 • 15

TheDrunkenSnail/Dirty-Muse-Writer-v01-Uncensored-Erotica-NSFW-Q4_K_M-GGUF

9B • Updated Mar 15, 2025 • 371 • 4

TheDrunkenSnail/Mother-of-Rhodia-12B

Text Generation • 12B • Updated Jan 26, 2025 • 9 • 1

TheDrunkenSnail/Mother-of-Rhodia-12B-Q4_K_M-GGUF

12B • Updated Jan 26, 2025 • 1

TheDrunkenSnail/Daughter-of-Rhodia-12B-Q4_K_M-GGUF

12B • Updated Jan 16, 2025 • 5

TheDrunkenSnail/Son-of-Rhodia

Text Generation • 12B • Updated Jan 1, 2025 • 10 • 3

TheDrunkenSnail/Son-of-Rhodia-Q4_K_M-GGUF

12B • Updated Dec 31, 2024 • 1

TheDrunkenSnail/Rhodia-Q4_K_M-GGUF

10B • Updated Dec 31, 2024 • 3

TheDrunkenSnail/Rhodia

Text Generation • 10B • Updated Dec 31, 2024 • 2

TheDrunkenSnail/Sao-cup-of-dead-wasps-12b-v0.1-gguf

12B • Updated Dec 31, 2024 • 23 • 1

Datasets 0

None public yet