Methods on RL refinement training

#3
by anthony8lee - opened

Great job on the improvement. I am curious if there is a preprint on arXiv or a post about your methods on the RL refinement process. How did you do the RL training and what was the inspiration that led you to refining in this way?

Thanks. I am a beginner, trying to make things work in baby steps. I will try to answer your question tonight; I am really very new to this.

I appreciate it! Same here, I am new to HF as a whole framework and platform so I'm trying to learn from you.

lol, bad idea. I had good ideas at the beginning of January, but then I hit a sudden block. This morning I deleted a script in which my API was public; Rabbitbot helped me fix the script. It was on GitHub. But I was busy with too many things at the same time, overload. I got to a point where I was doing things I didn't understand, but it worked, haha. I'll keep you posted.
