BabyLM Challenge

community

https://babylm.github.io/

babyLMchallenge

AI & ML interests

Pretraining data-constrained, cognitively relevant baby LLMs

Recent Activity

siyuansong new activity about 12 hours ago

BabyLM-community/babylm-zho:Update README.md

francois-meyer updated a dataset about 2 months ago

BabyLM-community/babylm-afr

francois-meyer updated a dataset about 2 months ago

BabyLM-community/babylm-xho

Papers

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

siyuansong

in BabyLM-community/babylm-zho about 12 hours ago

Update README.md

#1 opened 24 days ago by

francois-meyer

updated 5 datasets about 2 months ago

BabyLM-community/babylm-afr

Viewer • Updated Nov 7, 2025 • 96.6k • 12

BabyLM-community/babylm-xho

Viewer • Updated Nov 7, 2025 • 12.4k • 4

BabyLM-community/babylm-sot

Viewer • Updated Nov 7, 2025 • 19.3k • 3

BabyLM-community/babylm-zul

Viewer • Updated Nov 7, 2025 • 16.1k • 3

BabyLM-community/babylm-nso

Viewer • Updated Nov 7, 2025 • 26.8k • 11

bbunzeck

authored a paper 2 months ago

Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)

Paper • 2510.20358 • Published Oct 23, 2025

suchirsalhan

authored 3 papers 2 months ago

BLiSS 1.0: Evaluating Bilingual Learner Competence in Second Language Small Language Models

Paper • 2510.19419 • Published Oct 22, 2025 • 1

What is the Best Sequence Length for BABYLM?

Paper • 2510.19493 • Published Oct 22, 2025 • 1

Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction

Paper • 2510.20411 • Published Oct 23, 2025 • 2

seyoungsong

authored a paper 2 months ago

Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues

Paper • 2510.19028 • Published Oct 21, 2025 • 7

bbunzeck

authored a paper 3 months ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Paper • 2510.10159 • Published Oct 11, 2025 • 3

suchirsalhan

authored 4 papers 3 months ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Paper • 2510.10159 • Published Oct 11, 2025 • 3

Looking to Learn: Token-wise Dynamic Gating for Low-Resource Vision-Language Modelling

Paper • 2510.08470 • Published Oct 9, 2025 • 1

Pico: A Modular Framework for Hypothesis-Driven Small Language Model Research

Paper • 2509.16413 • Published Sep 19, 2025 • 1

Meta-Pretraining for Zero-Shot Cross-Lingual Named Entity Recognition in Low-Resource Philippine Languages

Paper • 2509.02160 • Published Sep 2, 2025 • 1

juletxara

authored a paper 5 months ago

Instructing Large Language Models for Low-Resource Languages: A Systematic Study for Basque

Paper • 2506.07597 • Published Jun 9, 2025

negar-foroutan

authored a paper 6 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 75

suchirsalhan

authored a paper 6 months ago

ByteSpan: Information-Driven Subword Tokenisation

Paper • 2506.18639 • Published Jun 23, 2025 • 3

juletxara

authored a paper 7 months ago

Lessons from the Trenches on Reproducible Evaluation of Language Models

Paper • 2405.14782 • Published May 23, 2024 • 1