Can you update and add the new models?
Many new models have come out, like GLM 5, Kimi 2.5, and Claude 4.6
Can you add all the new models?
Thanks Shadi for the recommendation, we will benchmark them soon isa
really interesting results 😆
gpt-4.1 still 🤯 but GLM 🤔
4.1 is only up there because it is the judge
Can I ask why 4.1 is the judge?
What will the results be if it is removed and replaced with a newer model?
Especially since OpenAI announced the retirement of multiple models, including 4.1
https://openai.com/index/retiring-gpt-4o-and-older-models/
Why is 4.1 the judge?
- It was the best at the time
What will the results be if it is removed?
- The comparison would not be fair to the new models, since the judge that scored the old models would be different from the judge that scored the new ones
If 4.1 is acting as the judge, shouldn’t it be excluded from the tier list to avoid bias?
Also, I think it should be evaluated by a separate judge.
Yes, since GPT 4.1 is irrelevant now and it is also the judge, we will think about removing it from the list
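For reference, a minimal sketch of the idea discussed above: leave the judge model out of the ranking so it is never placed by its own scores. This is only an illustration, not the benchmark's actual code. `rank_without_judge` is a hypothetical helper, and the model names and scores are placeholders, not real results.

```python
def rank_without_judge(scores, judge):
    """Return (model, score) pairs sorted best-first, skipping the judge model."""
    # Drop the judge so it cannot appear in a list built from its own verdicts.
    eligible = {model: score for model, score in scores.items() if model != judge}
    return sorted(eligible.items(), key=lambda pair: pair[1], reverse=True)


# Placeholder values for illustration only (not real benchmark results).
example_scores = {"judge-model": 0.90, "model-a": 0.80, "model-b": 0.85}

ranking = rank_without_judge(example_scores, judge="judge-model")
for model, score in ranking:
    print(model, score)
```

If the judge is later swapped for a newer model, the same judge would need to re-score every model before calling this, otherwise old and new entries are not comparable.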