Pretrained models from the paper "Predicting the Order of Upcoming Tokens Improves Language Modeling"
Zayd Muhammad Kawakibi Zuhri
zaydzuhri
AI & ML interests
I really like watching loss go down
Recent Activity
Updated a model about 10 hours ago: zaydzuhri/dsmtp-code-7B-4096-batch8x2-steps40000-20260101-092051
Published a model about 10 hours ago: zaydzuhri/dsmtp-code-7B-4096-batch8x2-steps40000-20260101-092051
Updated a model about 11 hours ago: zaydzuhri/dsmtp-code-1B-4096-batch16x1-steps40000-20251231-083610