YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Grid-JEPA: JEPA-Based World Model for ARC-AGI-3

A neural architecture for the ARC-AGI-3 competition combining Joint Embedding Predictive Architecture (JEPA), Recurrent State-Space Models (RSSM), and Test-Time Training (TTT) to solve novel interactive grid-world tasks.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        ARC-AGI-3 Agent                               │
├─────────────────────────────────────────────────────────────────────┤
│  Observation Grid (64x64, 16 colors)                                │
│           ↓                                                        │
│  ┌─────────────┐                                                    │
│  │ Grid-JEPA   │  ← I-JEPA adapted for discrete grid worlds        │
│  │ Encoder     │     1×1 patches, latent-space prediction         │
│  └─────────────┘                                                    │
│           ↓  Latent Representation                                   │
│  ┌─────────────┐                                                    │
│  │    RSSM     │  ← Recurrent State-Space Model (DreamerV3-style)  │
│  │ World Model │     GRU dynamics + discrete latents               │
│  └─────────────┘                                                    │
│           ↓  Hidden State (PERSISTS across levels!)                  │
│  ┌─────────────┐    ┌─────────────┐                                │
│  │  Planning   │ ←→ │ Exploration │                                │
│  │  (Imagination│    │  (Novelty)  │                                │
│  │  Rollouts)  │    │             │                                │
│  └─────────────┘    └─────────────┘                                │
│           ↓                                                        │
│  ┌─────────────┐                                                    │
│  │Goal Inference│  ← Discovers objectives from terminal states   │
│  └─────────────┘                                                    │
│           ↓                                                        │
│  Action (key, position) → Environment                               │
└─────────────────────────────────────────────────────────────────────┘

Key Innovations

1. JEPA for Discrete Grid Worlds

1×1 patch embeddings: Each grid cell is semantically meaningful (colors are categorical)
Latent-space prediction: Predicts transformations (rotate, fill, move) without pixel reconstruction
Action-conditioned predictor: Inspired by Image World Models (Meta, 2024)

2. Persistent World Model State (Critical for ARC-AGI-3)

RSSM state persists across levels within the same environment
Level 3 requires knowledge from Level 1-2; resetting = instant failure

3. Uncertainty-Aware Prediction

Tracks prediction errors over a sliding window
Triggers hypothesis revision when errors are consistently high
Prevents "latching onto early hypothesis" failure mode

4. Test-Time Training (TTT)

Per-task LoRA adapters for each novel environment
Fine-tunes on collected demos with geometric augmentations
Based on TTT for ARC (arXiv:2411.07279) achieving 53% on ARC-AGI-1

Repository Structure

arc-jepa/
├── src/
│   ├── models/
│   │   ├── encoder.py          # GridPatchEmbed + ViT encoders + EMA
│   │   ├── predictor.py        # Action-conditioned predictor
│   │   ├── grid_jepa.py        # Complete Grid-JEPA system
│   │   ├── rssm.py             # Recurrent State-Space Model
│   │   ├── agent.py            # Full ARC agent (JEPA + RSSM + planning)
│   │   └── ttt_adapter.py      # LoRA TTT adapter
│   ├── data/                    # Dataset loaders + augmentations
│   ├── training/                # Training scripts
│   └── utils/                   # Utilities
├── tests/                       # Unit tests
└── README.md                    # This file

Core Components

`agent.py` — Complete Agent (Central Module)

ARCAgent: Full agent loop encoding the core insight of this project
GoalInferenceModule: Discovers objectives from terminal/done states
ExplorationPolicy: Novelty-seeking with undo loop avoidance
PlanningModule: Imagination-based action selection via RSSM rollouts
UncertaintyTracker: Hypothesis revision when predictions fail consistently

`encoder.py` — Grid-JEPA Encoder

GridPatchEmbed: 1×1 patch embeddings for color grids
ViTEncoder: Multi-head attention transformer blocks
EMATargetEncoder: EMA-updated target encoder (prevents collapse)

`predictor.py` — Action-Conditioned Predictor

DiscreteActionEmbed: Embeds (action_key, cell_position) pairs
ActionConditionedPredictor: Predicts target patches from context + action
GridWorldPredictor: Full predictor + decoder to color logits

`rssm.py` — Recurrent State-Space Model

observe(): Update state with new observation (posterior)
imagine(): Predict next state given action (prior)
rollout(): Imagine future trajectories for planning
Straight-through gradients for discrete latents

`ttt_adapter.py` — Test-Time Training

LoRALayer: Low-rank adaptation (W' = W + BA)
PredictorLoRAAdapter: Per-task LoRA on JEPA predictor
TTTTrainer: Fine-tunes on demos with augmentation voting

Key Design Decisions

Decision	Rationale
1×1 patches	Grid cells are semantically meaningful, unlike image pixels
L2 latent loss	Reconstruction forces modeling irrelevant visual details
EMA target encoder	Prevents representation collapse in self-supervised learning
Feature conditioning	Outperforms concatenation for action conditioning
Straight-through latents	Enables gradient flow through discrete RSSM states
State persistence	ARC-AGI-3 levels build on each other
Uncertainty tracking	Prevents getting stuck on wrong hypotheses
LoRA TTT	Efficient per-task adaptation without catastrophic forgetting

Papers

I-JEPA (arXiv:2301.08243) — Foundation of encoder design
Image World Models (arXiv:2403.00504) — Action-conditioned predictor
DreamerV3 (arXiv:2301.04104) — RSSM dynamics architecture
TTT for ARC (arXiv:2411.07279) — Per-task LoRA fine-tuning
ARC-AGI-3 (arXiv:2603.24621) — Competition specification

License

MIT License — Open source as required for ARC Prize eligibility.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Papers for guychuk/arc-agi-3-grid-jepa

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

Paper • 2301.08243 • Published Jan 19, 2023 • 7

Mastering Diverse Domains through World Models

Paper • 2301.04104 • Published Jan 10, 2023