# ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence

Paper: 2603.24621
A neural architecture for the ARC-AGI-3 competition combining Joint Embedding Predictive Architecture (JEPA), Recurrent State-Space Models (RSSM), and Test-Time Training (TTT) to solve novel interactive grid-world tasks.
```
                    ARC-AGI-3 Agent
                    ───────────────
Observation Grid (64x64, 16 colors)
        │
┌──────────────┐
│  Grid-JEPA   │  ← I-JEPA adapted for discrete grid worlds:
│  Encoder     │    1×1 patches, latent-space prediction
└──────────────┘
        │  Latent Representation
┌──────────────┐
│    RSSM      │  ← Recurrent State-Space Model (DreamerV3-style):
│ World Model  │    GRU dynamics + discrete latents
└──────────────┘
        │  Hidden State (PERSISTS across levels!)
┌──────────────┐      ┌──────────────┐
│  Planning    │ ←──→ │ Exploration  │
│ (Imagination │      │  (Novelty)   │
│  Rollouts)   │      │              │
└──────────────┘      └──────────────┘
        │
┌──────────────┐
│Goal Inference│  ← Discovers objectives from terminal states
└──────────────┘
        │
Action (key, position) → Environment
```
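The control loop above (encode, update the world model, plan by imagined rollouts, act) can be sketched with a toy stand-in for the learned model. Everything below is illustrative, not the repository's implementation: `step_model` replaces the RSSM with known 1-D dynamics, and the planner searches action sequences exhaustively rather than sampling candidates.

```python
import itertools

def step_model(state, action):
    """Toy stand-in for the RSSM's imagine(): known 1-D dynamics."""
    return state + action

def reward(state, goal):
    """Goal-distance reward; the real agent must infer the goal first."""
    return -abs(goal - state)

def plan(state, goal, actions=(-1, 0, 1), horizon=3):
    """Return the first action of the best imagined rollout."""
    best_first, best_ret = None, float("-inf")
    for seq in itertools.product(actions, repeat=horizon):
        s, ret = state, 0.0
        for a in seq:            # imagined rollout: no environment steps
            s = step_model(s, a)
            ret += reward(s, goal)
        if ret > best_ret:
            best_ret, best_first = ret, seq[0]
    return best_first

print(plan(state=0, goal=3))  # 1: step toward the goal
```

Only the first action is executed; after the next observation the state is re-estimated and planning repeats (receding horizon).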
```
arc-jepa/
├── src/
│   ├── models/
│   │   ├── encoder.py       # GridPatchEmbed + ViT encoders + EMA
│   │   ├── predictor.py     # Action-conditioned predictor
│   │   ├── grid_jepa.py     # Complete Grid-JEPA system
│   │   ├── rssm.py          # Recurrent State-Space Model
│   │   ├── agent.py         # Full ARC agent (JEPA + RSSM + planning)
│   │   └── ttt_adapter.py   # LoRA TTT adapter
│   ├── data/                # Dataset loaders + augmentations
│   ├── training/            # Training scripts
│   └── utils/               # Utilities
├── tests/                   # Unit tests
└── README.md                # This file
```
### `agent.py` – Complete Agent (Central Module)

- `ARCAgent`: Full agent loop encoding the core insight of this project
- `GoalInferenceModule`: Discovers objectives from terminal/done states
- `ExplorationPolicy`: Novelty-seeking with undo-loop avoidance
- `PlanningModule`: Imagination-based action selection via RSSM rollouts
- `UncertaintyTracker`: Hypothesis revision when predictions fail consistently
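One way to realize "hypothesis revision when predictions fail consistently" is a rolling window over prediction errors. The sketch below is a hypothetical, minimal version of that idea, not the repository's `UncertaintyTracker`; the window size and threshold are made-up values.

```python
from collections import deque

class UncertaintyTracker:
    """Flag hypothesis revision when the world model's prediction
    error stays high over a rolling window of recent steps."""
    def __init__(self, window=5, threshold=0.5):
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def update(self, prediction_error):
        self.errors.append(prediction_error)

    def should_revise(self):
        # Revise only once the window is full AND the average error is
        # high: a single surprise is not enough evidence to discard a
        # hypothesis, but consistent failure is.
        full = len(self.errors) == self.errors.maxlen
        return full and sum(self.errors) / len(self.errors) > self.threshold

tracker = UncertaintyTracker(window=3, threshold=0.5)
for e in (0.9, 0.8, 0.95):
    tracker.update(e)
print(tracker.should_revise())  # True: errors consistently high
```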
### `encoder.py` – Grid-JEPA Encoder

- `GridPatchEmbed`: 1×1 patch embeddings for color grids
- `ViTEncoder`: Multi-head attention transformer blocks
- `EMATargetEncoder`: EMA-updated target encoder (prevents collapse)
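The EMA target-encoder update is small enough to show inline. This is a generic sketch of the standard rule (parameters as plain lists, `tau` close to 1), not the repo's `EMATargetEncoder`:

```python
def ema_update(target_params, online_params, tau=0.996):
    """EMA update for the target encoder: each target parameter is a
    slow exponential moving average of the online encoder's parameter.
    Slowly moving targets give stable prediction objectives, which is
    the standard (BYOL / I-JEPA style) guard against collapse."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_params, online_params)]

target, online = [0.0, 0.0], [1.0, 2.0]
target = ema_update(target, online, tau=0.9)
print([round(v, 6) for v in target])  # [0.1, 0.2]
```

The target encoder receives no gradients; it is updated only through this rule after each optimizer step on the online encoder.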
### `predictor.py` – Action-Conditioned Predictor

- `DiscreteActionEmbed`: Embeds (action_key, cell_position) pairs
- `ActionConditionedPredictor`: Predicts target patches from context + action
- `GridWorldPredictor`: Full predictor + decoder to color logits
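One plausible way to embed an (action_key, cell_position) pair is to sum a key embedding and a flattened-position embedding, mirroring how ViTs add positional codes. The class below is an illustrative sketch with that design, not the repository's `DiscreteActionEmbed`; the dimensions and init scale are assumptions.

```python
import random

class DiscreteActionEmbed:
    """Embed an (action_key, row, col) triple as the sum of a learned
    key embedding and a learned per-cell position embedding."""
    def __init__(self, n_keys, grid_size, dim, seed=0):
        rng = random.Random(seed)
        make = lambda n: [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                          for _ in range(n)]
        self.key_emb = make(n_keys)                 # one row per action key
        self.pos_emb = make(grid_size * grid_size)  # one row per grid cell
        self.grid_size = grid_size

    def __call__(self, key, row, col):
        pos = row * self.grid_size + col            # flatten (row, col)
        return [k + p for k, p in zip(self.key_emb[key], self.pos_emb[pos])]

embed = DiscreteActionEmbed(n_keys=6, grid_size=64, dim=8)
vec = embed(key=2, row=10, col=3)
print(len(vec))  # 8
```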
### `rssm.py` – Recurrent State-Space Model

- `observe()`: Update state with new observation (posterior)
- `imagine()`: Predict next state given action (prior)
- `rollout()`: Imagine future trajectories for planning
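The three methods form a simple contract that a deterministic toy makes concrete. The scalar dynamics below are invented for illustration and stand in for the real GRU plus discrete-latent machinery:

```python
class ToyRSSM:
    """Minimal sketch of the RSSM interface: observe() corrects the
    state with real data (posterior), imagine() predicts without data
    (prior), rollout() chains imagine() for the planner."""
    def __init__(self):
        self.h = 0.0  # recurrent hidden state, persists across calls

    def observe(self, obs):
        # Posterior update: pull the state toward the observation.
        self.h = 0.5 * self.h + 0.5 * obs
        return self.h

    def imagine(self, h, action):
        # Prior update: predict the next state from state + action only.
        return 0.5 * h + action

    def rollout(self, actions):
        # Imagined trajectory for planning; does not touch self.h.
        h, traj = self.h, []
        for a in actions:
            h = self.imagine(h, a)
            traj.append(h)
        return traj

m = ToyRSSM()
m.observe(2.0)                # h = 1.0
print(m.rollout([1.0, 0.0]))  # [1.5, 0.75]
```

Because `rollout()` never mutates `self.h`, the planner can evaluate many candidate action sequences from the same current state.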
### `ttt_adapter.py` – Test-Time Training

- `LoRALayer`: Low-rank adaptation (W' = W + BA)
- `PredictorLoRAAdapter`: Per-task LoRA on JEPA predictor
- `TTTTrainer`: Fine-tunes on demos with augmentation voting

| Decision | Rationale |
|---|---|
| 1×1 patches | Grid cells are semantically meaningful, unlike image pixels |
| L2 latent loss | Predicting in latent space avoids pixel reconstruction, which would force modeling irrelevant visual details |
| EMA target encoder | Prevents representation collapse in self-supervised learning |
| Feature conditioning | Outperforms concatenation for action conditioning |
| Straight-through latents | Enables gradient flow through discrete RSSM states |
| State persistence | ARC-AGI-3 levels build on each other |
| Uncertainty tracking | Prevents getting stuck on wrong hypotheses |
| LoRA TTT | Efficient per-task adaptation without catastrophic forgetting |
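The LoRA decision has a closed-form core, W' = W + BA. Below is a generic plain-Python sketch (simplified, not the repo's `LoRALayer`) showing why only the low-rank factors need training and why the base weights survive adaptation unchanged:

```python
def matmul(A, B):
    """Plain-Python matrix multiply for the sketch."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

class LoRALinear:
    """LoRA adapter: effective weight W' = W + B @ A. W stays frozen;
    only A (rank x d_in) and B (d_out x rank) are trained per task,
    so adaptation is cheap and never overwrites the base model."""
    def __init__(self, W, rank):
        d_out, d_in = len(W), len(W[0])
        self.W = W                                     # frozen base weight
        self.A = [[0.0] * d_in for _ in range(rank)]   # trainable factor
        self.B = [[0.0] * rank for _ in range(d_out)]  # trainable factor
        # With B = 0 at init, B @ A = 0, so W' == W before any TTT step.

    def effective_weight(self):
        BA = matmul(self.B, self.A)
        return [[w + d for w, d in zip(wr, br)]
                for wr, br in zip(self.W, BA)]

W = [[1.0, 0.0], [0.0, 1.0]]
layer = LoRALinear(W, rank=1)
print(layer.effective_weight() == W)  # True: adapter starts as a no-op
layer.A, layer.B = [[0.5, 0.5]], [[1.0], [0.0]]  # after a TTT "update"
print(layer.effective_weight())       # [[1.5, 0.5], [0.0, 1.0]]
```

Discarding A and B after a task restores the original predictor exactly, which is what avoids catastrophic forgetting across tasks.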
MIT License – Open source, as required for ARC Prize eligibility.