A physical commonsense reasoning benchmark for 100+ languages, written in collaboration with 300+ researchers from 65 countries.
Catherine Arnett
catherinearnett
AI & ML interests: multilingual NLP, tokenization
Recent Activity
liked a dataset 2 days ago: commoncrawl/CommonLID
updated a dataset 2 days ago: catherinearnett/bilingual-tokenizer-training-data
published a dataset 3 days ago: catherinearnett/bilingual-tokenizer-training-data