jacklangerman committed · Commit 4946666 · 1 Parent(s): 0f31e57

prep release

README.md ADDED

---
license: cc-by-nc-4.0
library_name: pytorch
tags:
- 3d-reconstruction
- wireframe
- building
- point-cloud
- s23dr
- cvpr-2026
datasets:
- usm3d/s23dr-2026-sampled_4096_v2
- usm3d/s23dr-2026-sampled_2048_v2
metrics:
- HSS
pipeline_tag: other
---

# S23DR 2026 Learned Baseline

A learned baseline for the **S23DR 2026** challenge (**S**tructured and **S**emantic **3D R**econstruction, or S^2 3DR), part of the [USM3D workshop](https://usm3d.github.io) at CVPR 2026. The model takes a fused point cloud of a building and predicts its wireframe as a set of 3D line segments.

**Headline result: HSS = 0.382** on the 1024-sample validation set (shipped checkpoint).

For context, the handcrafted baseline scores HSS = 0.307 on the same split.
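
For orientation: a wireframe here is stored as a vertex array plus an edge index array (`gt_vertices` / `gt_edges` in the per-scene caches), and the "set of 3D line segments" the model predicts is just the gathered endpoint pairs. A tiny sketch with made-up toy values (the `np.stack` gather mirrors what `make_sampled_cache.py` does):

```python
import numpy as np

# Toy wireframe: 4 vertices, 3 edges (values are illustrative only).
gt_vertices = np.array([[0.0, 0.0, 0.0],
                        [4.0, 0.0, 0.0],
                        [4.0, 3.0, 0.0],
                        [2.0, 1.5, 2.5]], dtype=np.float32)   # [V, 3]
gt_edges = np.array([[0, 1], [1, 2], [2, 3]], dtype=np.int32)  # [E, 2]

# Segments = endpoint pairs, the representation the model predicts: [E, 2, 3].
gt_segments = np.stack([gt_vertices[gt_edges[:, 0]],
                        gt_vertices[gt_edges[:, 1]]], axis=1)
print(gt_segments.shape)  # (3, 2, 3)
```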

## Quick start

Run the submission pipeline directly (matches the competition eval harness):

```bash
python script.py
```

That loads `checkpoint.pt`, fuses the input views into a 4096-point cloud, runs the model, and writes the predicted wireframe for each scene.

To reproduce the checkpoint from scratch on a single RTX 4090 (~3 hours):

```bash
bash reproduce.sh
```

Or for a bit-identical deterministic run (~5.5 hours, slower because it disables `torch.compile`):

```bash
bash reproduce_deterministic.sh
```

Both scripts run the full three-stage recipe described below. See `REPRODUCE.md` for the exact hyperparameters and the reproducibility notes.

## Architecture

A Perceiver-style transformer that ingests the point cloud as a sequence of per-point tokens, cross-attends it into a fixed-size latent, and decodes a fixed set of 3D line segments from that latent.

```
Perceiver: hidden=256, ff=1024
latent_tokens=256, latent_layers=7
encoder_layers=4, decoder_layers=3, cross_attn_interval=4
num_heads=4, kv_heads_cross=2, kv_heads_self=2
qk_norm=L2, rms_norm=True, dropout=0.1
segments=64, segment_param=midpoint_dir_len, segment_conf=True
behind_emb_dim=8, vote_features=True, activation=gelu
```

The decoder predicts 64 candidate segments, each parametrized as midpoint + direction + length, with a confidence head per segment. Training uses a Sinkhorn optimal-transport loss to match predicted segments to the ground truth, plus a symmetric endpoint L1 term in the cooldown stage.
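
A rough sketch of what the midpoint/direction/length parametrization means geometrically. The helper name, tensor shapes, and the choice to L2-normalize the direction are illustrative only; the actual head lives in `s23dr_2026_example/model.py` and may differ in those details.

```python
import torch
import torch.nn.functional as F

def segments_from_midpoint_dir_len(mid, direction, length):
    """Convert (midpoint, direction, length) predictions to endpoint pairs.

    mid:       [B, S, 3]  predicted segment midpoints
    direction: [B, S, 3]  unnormalized direction vectors
    length:    [B, S, 1]  predicted segment lengths
    returns:   [B, S, 2, 3] endpoints (midpoint -/+ half the oriented length)
    """
    d = F.normalize(direction, dim=-1)   # unit direction
    half = 0.5 * length * d              # [B, S, 3]
    return torch.stack([mid - half, mid + half], dim=-2)

# Random "predictions" for a batch of 1 scene and 64 candidate segments.
mid = torch.randn(1, 64, 3)
direction = torch.randn(1, 64, 3)
length = torch.rand(1, 64, 1)
endpoints = segments_from_midpoint_dir_len(mid, direction, length)
print(endpoints.shape)  # torch.Size([1, 64, 2, 3])
```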

All architecture and optimizer settings live in `configs/base.json`.

## Training recipe

The model ships with a three-stage recipe. Each stage starts from the previous stage's final checkpoint.

| Stage | Input | Steps | LR | Batch | Notes | HSS |
|---|---|---|---|---|---|---|
| 1. 2048 from scratch | 2048 pts | 0 -> 125k | 3e-4, warmup 10k | 32 | Random init, Sinkhorn only | 0.281 |
| 2. 4096 finetune | 4096 pts | 125k -> 135k | 3e-5 constant | 64 | Gentle LR preserves representations | 0.351 |
| 3. Endpoint cooldown | 4096 pts | 135k -> 170k | 3e-5 then linear decay | 64 | Adds endpoint L1 loss, tightens vertices | **0.382** |

**Why 2048 first:** training directly on 4096 overfits (1.47x train/val ratio vs 1.19x for 2048). Starting on 2048 produces better-generalized representations that the 4096 finetune can then specialize.

**Why a gentle LR on finetune:** LR > 1e-4 causes catastrophic forgetting of the geometry understanding learned at 2048.

**Why endpoint loss only in stage 3:** the Sinkhorn loss operates on the midpoint/direction/length parametrization and doesn't directly penalize vertex position error. Adding a symmetric endpoint L1 against the detached Sinkhorn assignment tightens vertex precision in the cooldown.
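
As an illustration of the endpoint term (not the exact loss in the training code, and reading "symmetric" as invariance to endpoint order): given a fixed matching between predicted and ground-truth segments, take the smaller L1 error over the two possible endpoint orderings, so a segment is not penalized for predicting its endpoints in swapped order.

```python
import torch

def symmetric_endpoint_l1(pred, gt):
    """pred, gt: [N, 2, 3] already-matched segment pairs.

    Returns the mean over segments of the smaller L1 endpoint error between
    the identity ordering (p0->g0, p1->g1) and the swapped one (p0->g1, p1->g0).
    """
    direct = (pred - gt).abs().sum(dim=(-1, -2))                  # [N]
    swapped = (pred - gt.flip(dims=[-2])).abs().sum(dim=(-1, -2))  # [N]
    return torch.minimum(direct, swapped).mean()

pred = torch.randn(5, 2, 3)
gt = pred.flip(dims=[-2]) + 0.01 * torch.randn(5, 2, 3)  # same segments, endpoints swapped
print(symmetric_endpoint_l1(pred, gt))  # small, despite the swapped ordering
```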

Full details, including the "what does not work" list (BuildingWorld pretraining, mixed training, high dropout, etc.), are in `REPRODUCE.md`.

## Evaluation

> **About the numbers:** all val scores below are HSS at confidence threshold 0.7, averaged over the 1024-sample *internal validation split* we hold out from the published training data (`usm3d/s23dr-2026-sampled_{2048,4096}_v2:validation`). They are **not** test-set numbers. The only test-set number we have is the public leaderboard score of the older 2048 submission (see `submitted_2048/` and the last table below).
>
> All numbers below were measured fresh for this release against the checkpoints in this repo.

### Shipped model and reproductions

| Model | Checkpoint | HSS @ 4096 | HSS @ 2048 |
|---|---|---|---|
| Handcrafted baseline | — | 0.307 | — |
| **Current release (shipped)** | `checkpoint.pt` | **0.3819** | 0.3734 |
| Closest compiled E2E repro (#4) | `repro_runs/e2e_repro4_hss379/` | 0.3736 | 0.3675 |
| Best compiled repro from this codebase | `repro_runs/compiled_repro_hss376/` | 0.3757 | 0.3670 |
| Deterministic E2E repro (bit-reproducible) | `repro_runs/deterministic_hss372/` | 0.3716 | 0.3665 |

All repros use the exact three-stage recipe on a single RTX 4090. The shipped `checkpoint.pt` was trained with the same recipe before this release branch was cut; the ~0.005-0.010 HSS gap between the shipped model and the repros is compiled-mode run-to-run variance (see the Reproducibility section).

### Training progression (deterministic repro, all stages measured fresh)

| Stage | Steps | HSS @ 4096 | HSS @ 2048 |
|---|---|---|---|
| 1. 2048 from-scratch | 125k | 0.2755 | 0.2812 |
| 2. 4096 finetune | 135k | 0.3557 | 0.3510 |
| 3. Endpoint cooldown | 170k | 0.3716 | 0.3665 |

The stage 1 -> stage 2 jump (+0.08 HSS on 4096) is the biggest single improvement and motivates the 2048 -> 4096 transfer. Stage 3 (endpoint cooldown) adds another +0.016. Note that stage 1 scores slightly better at 2048 than at 4096 (it was trained only on 2048), while stages 2 and 3 invert that ordering after the 4096 finetune.

### Previously submitted model (2048, single-stage)

The `submitted_2048/` directory holds the checkpoint we actually sent to the public leaderboard. It was trained in a single stage on 2048-point data and is a direct ancestor of the current release.

| Split | Metric | Score |
|---|---|---|
| **Public leaderboard (test)** | **HSS** | **0.427** |
| Internal val @ 2048 | HSS | 0.3692 |
| Internal val @ 4096 | HSS | 0.3665 |

We do not have a test number for the current release, but the val-to-test gap observed on this 2048 submission was about **+0.06 HSS** (0.37 val -> 0.43 test). A similar gap on the current `checkpoint.pt` (0.382 val) would suggest a test score in the low 0.44s, though this is extrapolation and unverified.

## Reproducibility

| Test | Result |
|---|---|
| Forward pass (same ckpt, same input) | bit-identical (0.00 diff) |
| Deterministic mode, 3 independent runs | bit-identical (162 tensors, max_diff=0.0) |
| Stage 3 rerun from the same stage-2 ckpt (2 runs) | HSS = 0.382, 0.384 |
| Compiled-mode E2E variance across runs | ~0.03 HSS (Triton kernel nondeterminism) |

`reproduce_deterministic.sh` produces byte-identical weights across runs with the same seed, at the cost of ~2x slower training (no `torch.compile`). Compiled mode has small run-to-run variance from Triton kernel selection, which compounds through the chaotic dynamics of SGD; E2E compiled repros land in the 0.349-0.379 range.
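
For reference, a fully deterministic PyTorch setup typically combines the settings below. This is a generic sketch; the repo's training code and `reproduce_deterministic.sh` may use a different or additional set of switches.

```python
import os
import random

import numpy as np
import torch

def enable_full_determinism(seed: int = 0) -> None:
    # Required by some cuBLAS ops once deterministic algorithms are enforced;
    # must be set before the first cuBLAS call.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                   # seeds CPU and all CUDA devices
    torch.use_deterministic_algorithms(True)  # error out on nondeterministic kernels
    torch.backends.cudnn.benchmark = False    # no input-shape-dependent autotuning
    # torch.compile / Triton autotuning is a separate source of run-to-run
    # variance, which is why the deterministic script skips compilation.
```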

A subtle iteration-order effect: the shipped `bad_samples.txt` has 156 non-empty entries (the file lacks a trailing newline, so `wc -l` reports 155). Two additional bad samples were discovered after training; they are legitimately bad GT, but adding them changes the batch iteration order and costs ~0.005 HSS in deterministic mode and ~0.04 in compiled mode. See the "Reproducibility Notes" section of `REPRODUCE.md` for the full story.
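
To count the entries the way the text does (non-empty lines, independent of the missing trailing newline), for example:

```python
from pathlib import Path

# splitlines() does not care whether the file ends in a newline,
# which is why this reports 156 where `wc -l` reports 155.
n = sum(1 for line in Path("bad_samples.txt").read_text().splitlines() if line.strip())
print(n)
```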

## Repository layout

```
checkpoint.pt                shipped HSS=0.382 model (step 170000), 4096-point input
script.py                    competition inference entry point (uses checkpoint.pt)
s23dr_2026_example/          training package (model, data, train loop, losses)
configs/base.json            shared training config
reproduce.sh                 compiled-mode E2E reproduction (~3 hr)
reproduce_deterministic.sh   bit-reproducible E2E reproduction (~5.5 hr)
REPRODUCE.md                 detailed recipe, results, ablations, notes

submitted_2048/              the model we actually sent to the public leaderboard (HSS_test=0.427)
  checkpoint.pt              single-stage 2048 model (step 160000)
  args.json                  full training args
  README.md                  training details and val/test scores

repro_runs/                  evidence that the 3-stage recipe reproduces
  e2e_repro4_hss379/         closest compiled E2E repro (val HSS=0.374)
  compiled_repro_hss376/     best compiled repro from this codebase (val HSS=0.376)
  deterministic_hss372/      bit-reproducible deterministic repro (val HSS=0.372)
```

Each directory under `repro_runs/` contains the three stage-final checkpoints (125k / 135k / 170k) plus their `args.json`, so a participant can resume from any stage. Note that the directory names carry the score at the time the directory was created, which may differ by ~0.002 from the fresh evals in the table above due to random variation in post-processing and CUDA kernel selection.

## Related branches

- `main` - this release
- `best-4096-transfer` - working branch with full commit history and internal dev notes
- `validation-archive` - cold archive of all validation runs (logs, final checkpoints, args) used to verify the release

## License

**CC-BY-NC 4.0.** The model weights and code in this repository are released under the Creative Commons Attribution-NonCommercial 4.0 International license. You are free to use, share, and adapt this work for **non-commercial** purposes, provided you give appropriate **attribution**. The training and validation datasets (`usm3d/s23dr-2026-sampled_*`) have their own terms; see the S23DR 2026 competition page for details.

## Acknowledgements

This checkpoint is released as a public learned baseline for participants of the **S23DR 2026** challenge, part of the [USM3D workshop](https://usm3d.github.io) at CVPR 2026.
make_datasets.sh ADDED

#!/bin/bash
# Rebuild the sampled datasets from scratch, starting from the public raw
# `usm3d/hoho22k_2026_trainval` dataset. Two stages:
#
#   1. cache_scenes.py       : stream raw shards -> per-scene .pt files
#                              (runs point fusion + priority grouping)
#   2. make_sampled_cache.py : per-scene .pt -> fixed-size .npz files
#                              (priority samples to seq_len=2048 or 4096)
#
# This reproduces the content of
#   hf://usm3d/s23dr-2026-sampled_2048_v2
#   hf://usm3d/s23dr-2026-sampled_4096_v2
# without needing the intermediate (private) cached_full_pcd dataset.
#
# ~3-4 hr on a workstation for the full train+val set (network-bound in stage 1).
set -e

OUT_ROOT="${1:-cache}"
FULL_TRAIN="$OUT_ROOT/full/train"
FULL_VAL="$OUT_ROOT/full/validation"

# ----- Stage 1: raw -> per-scene .pt -----
echo "=== Stage 1: caching train scenes from raw tars ==="
python -m s23dr_2026_example.cache_scenes --out-dir "$FULL_TRAIN" --split train --skip-existing

echo "=== Stage 1: caching validation scenes from raw tars ==="
python -m s23dr_2026_example.cache_scenes --out-dir "$FULL_VAL" --split validation --skip-existing

# ----- Stage 2: .pt -> sampled .npz -----
for split in train validation; do
  for seq_len in 2048 4096; do
    in_dir="$OUT_ROOT/full/$split"
    out_dir="$OUT_ROOT/sampled_${seq_len}/$split"
    echo "=== Stage 2: sampling $split at seq_len=$seq_len ==="
    python -m s23dr_2026_example.make_sampled_cache \
      --in-dir "$in_dir" --out-dir "$out_dir" --seq-len "$seq_len"
  done
done

echo ""
echo "All done. Sampled datasets are at:"
echo " $OUT_ROOT/sampled_2048/{train,validation}"
echo " $OUT_ROOT/sampled_4096/{train,validation}"
echo ""
echo "To train from these, point reproduce.sh at them via"
echo " --cache-dir \"\$OUT_ROOT/sampled_2048/train\" (and similar for val/4096)"
echo "instead of the default hf:// URLs."
s23dr_2026_example/cache_scenes.py CHANGED
@@ -1,31 +1,42 @@
 #!/usr/bin/env python3
 """Cache compact scenes from HoHo22k shards to training-ready .pt files.
 
-Runs build_compact_scene + precomputes group_id, semantic class, and
-normalization so training only needs fast sampling + GPU forward.
+Streams samples from the public `usm3d/hoho22k_2026_trainval` dataset, runs
+`build_compact_scene` (see point_fusion.py), precomputes priority group_id
+and semantic class_id, and saves one .pt per scene.
+
+Stage 1 of the dataset pipeline. See make_sampled_cache.py for stage 2.
 
 Usage:
-    python cache_scenes.py --data-dir data/ --out-dir cache/train
-    python cache_scenes.py --streaming --out-dir cache/train --limit 5000
-    python cache_scenes.py --data-dir data/ --out-dir cache/train --workers 4
-
-Cache format per file (.pt):
-    xyz: float32 [P, 3] all points in world space
-    source: uint8 [P] 0=colmap, 1=depth
-    group_id: int8 [P] priority tier 0-4, -1=excluded
-    class_id: uint8 [P] one-hot class index (0-12), see SEMANTIC_CLASSES
-    visible_src: uint8 [P] for visualization (1=gestalt, 2=ade)
-    visible_id: int16 [P] for visualization (class id within space)
-    center: float32 [3] smart normalization center
-    scale: float32 scalar smart normalization scale
-    gt_vertices: float32 [V, 3] ground truth wireframe vertices
-    gt_edges: int32 [E, 2] ground truth wireframe edge indices
+    python -m s23dr_2026_example.cache_scenes --out-dir cache/full --split train
+    python -m s23dr_2026_example.cache_scenes --out-dir cache/full_val --split validation
+
+Cache format per .pt file:
+    xyz: float32 [P, 3] all points in world space
+    source: uint8 [P] 0=colmap, 1=depth
+    group_id: int8 [P] priority tier 0-4, -1=excluded
+    class_id: uint8 [P] one-hot class index (0-12)
+    behind_gest_id: int16 [P] behind-gestalt id (-1 if none)
+    visible_src: uint8 [P] 1=gestalt, 2=ade
+    visible_id: int16 [P] class id within space
+    n_views_voted: uint8 [P] number of views that voted
+    vote_frac: float32 [P] fraction of votes
+    center: float32 [3] smart normalization center
+    scale: float32 scalar smart normalization scale
+    gt_vertices: float32 [V, 3] ground truth wireframe vertices
+    gt_edges: int32 [E, 2] ground truth wireframe edge indices
 """
 from __future__ import annotations
 
+import argparse
+import time
+from pathlib import Path
+
 import numpy as np
+import torch
 
 from .point_fusion import (
+    FuserConfig, build_compact_scene,
     GEST_ID_TO_NAME, ADE_ID_TO_NAME, NUM_GEST,
 )
 
@@ -176,3 +187,96 @@ def _compute_smart_center_scale(xyz, source, mad_k=2.5, percentile=95.0,
     return center.astype(np.float32), np.float32(scale)
 
 
+# ---------------------------------------------------------------------------
+# Dataset pipeline stage 1: raw HF sample -> cached .pt
+# ---------------------------------------------------------------------------
+
+def _process_one(sample, cfg):
+    """Fuse a single HF sample into a cache dict. Returns (order_id, dict) or None."""
+    rng = np.random.RandomState()
+
+    n_edges = len(sample.get("wf_edges", []))
+    if n_edges == 0 or n_edges > 64:
+        return None
+
+    scene = build_compact_scene(sample, cfg, rng=rng)
+    if scene is None:
+        return None
+
+    gt_v = scene.get("gt_vertices")
+    gt_e = scene.get("gt_edges")
+    if gt_v is None or gt_e is None or len(gt_e) == 0:
+        return None
+
+    xyz = scene["xyz"]
+    source = scene["source"]
+    group_id, class_id = _compute_group_and_class(
+        scene["visible_src"], scene["visible_id"], scene["behind_gest_id"], source)
+    center, scale = _compute_smart_center_scale(xyz, source)
+
+    gt_edge_classes = np.asarray(sample["wf_classifications"], dtype=np.int64)
+    return sample["order_id"], {
+        "xyz": xyz.astype(np.float32),
+        "source": source.astype(np.uint8),
+        "group_id": group_id,
+        "class_id": class_id,
+        "behind_gest_id": scene["behind_gest_id"].astype(np.int16),
+        "visible_src": scene["visible_src"].astype(np.uint8),
+        "visible_id": scene["visible_id"].astype(np.int16),
+        "n_views_voted": scene["n_views_voted"],
+        "vote_frac": scene["vote_frac"],
+        "center": center,
+        "scale": scale,
+        "gt_vertices": gt_v.astype(np.float32),
+        "gt_edges": gt_e.astype(np.int32),
+        "gt_edge_classes": gt_edge_classes,
+    }
+
+
+def main():
+    p = argparse.ArgumentParser(description="Stage 1: HoHo22k -> cached .pt files")
+    p.add_argument("--out-dir", required=True, help="Output directory for .pt files")
+    p.add_argument("--split", default="train", choices=["train", "validation"])
+    p.add_argument("--limit", type=int, default=0, help="Stop after N samples (0 = all)")
+    p.add_argument("--depth-per-view", type=int, default=8000)
+    p.add_argument("--skip-existing", action="store_true")
+    args = p.parse_args()
+
+    out_dir = Path(args.out_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    existing = {p.stem for p in out_dir.glob("*.pt")} if args.skip_existing else set()
+
+    from datasets import load_dataset
+    print(f"Streaming usm3d/hoho22k_2026_trainval split={args.split}...")
+    ds = load_dataset("usm3d/hoho22k_2026_trainval",
+                      streaming=True, trust_remote_code=True, split=args.split)
+
+    cfg = FuserConfig(depth_points_per_view=args.depth_per_view)
+    saved, skipped = 0, 0
+    t0 = time.perf_counter()
+    for i, sample in enumerate(ds):
+        if args.limit > 0 and i >= args.limit:
+            break
+        oid = sample["order_id"]
+        if oid in existing:
+            skipped += 1
+            continue
+        result = _process_one(sample, cfg)
+        if result is None:
+            skipped += 1
+            continue
+        order_id, data = result
+        torch.save(data, out_dir / f"{order_id}.pt")
+        saved += 1
+        if saved % 100 == 0:
+            rate = saved / (time.perf_counter() - t0)
+            print(f" saved {saved} (skipped {skipped}) [{rate:.1f}/s]")
+
+    elapsed = time.perf_counter() - t0
+    print(f"Done. Saved {saved}, skipped {skipped} in {elapsed:.0f}s.")
+
+
+if __name__ == "__main__":
+    main()
+
+
@@ -1,30 +1,30 @@
@@ -1,30 +1,30 @@
 #!/usr/bin/env python3
-"""Convert full point cloud cache to pre-sampled 2048-point npz files.
-
-Reads from either local .pt files or the HF dataset, priority-samples
-2048 points, normalizes, and saves as compact npz files (~50KB each).
-
-Usage:
-    # From local cache:
-    python make_sampled_cache.py --in-dir /workspace/cache/v2 --out-dir /workspace/cache/sampled
-
-    # From HF dataset:
-    python make_sampled_cache.py --hf-repo usm3d/s23dr-2026-cached_full_pcd --out-dir /workspace/cache/sampled
-
-    # Specify split:
-    python make_sampled_cache.py --hf-repo usm3d/s23dr-2026-cached_full_pcd --split validation --out-dir /workspace/cache/sampled_val
-
-    # With edge classifications (from extract_edge_classes.py):
-    python make_sampled_cache.py --hf-repo usm3d/s23dr-2026-cached_full_pcd --out-dir /workspace/cache/sampled \
-        --edge-classes edge_classifications.npz
-
-Note: uses a fixed seed so each scene gets one deterministic sample of 2048
-points. This means no sampling augmentation across epochs -- every epoch sees
-the same points. Fine for now; better augmentation can be added later.
+"""Stage 2: priority-sample cached .pt scenes into fixed-size .npz files.
+
+Reads the per-scene .pt files produced by cache_scenes.py, priority-samples
+a fixed number of points (2048 or 4096), normalizes, and writes one .npz per
+scene (~50KB at 2048, ~100KB at 4096).
+
+A fixed seed is used so every scene gets one deterministic sample -- no
+per-epoch sampling augmentation, every epoch sees the same points.
+
+Usage:
+    python -m s23dr_2026_example.make_sampled_cache \\
+        --in-dir cache/full --out-dir cache/sampled_2048 --seq-len 2048
+    python -m s23dr_2026_example.make_sampled_cache \\
+        --in-dir cache/full --out-dir cache/sampled_4096 --seq-len 4096
+
+The 3:1 colmap:depth quota ratio is fixed: at seq_len=2048 that's
+colmap=1536/depth=512; at seq_len=4096 that's colmap=3072/depth=1024.
 """
 from __future__ import annotations
 
+import argparse
+import time
+from pathlib import Path
+
 import numpy as np
+import torch
 
 # Priority sampling (same logic as train.py)
@@ -73,3 +73,87 @@ def _priority_sample(source, group_id, seq_len, colmap_quota, depth_quota):
     return indices[:seq_len], mask
 
 
+def _process_sample(d, seq_len, colmap_q, depth_q):
+    """Sample and normalize one cached scene dict into a small npz-ready dict."""
+    xyz = np.asarray(d["xyz"], np.float32)
+    source = np.asarray(d["source"], np.uint8)
+    group_id = np.asarray(d["group_id"], np.int8)
+    class_id = np.asarray(d["class_id"], np.uint8)
+    vis_src = np.asarray(d["visible_src"], np.uint8)
+    vis_id = np.asarray(d["visible_id"], np.int16)
+    center = np.asarray(d["center"], np.float32)
+    scale = float(d["scale"])
+    gt_v = np.asarray(d["gt_vertices"], np.float32)
+    gt_e = np.asarray(d["gt_edges"], np.int32)
+
+    indices, mask = _priority_sample(source, group_id, seq_len, colmap_q, depth_q)
+    xyz_norm = ((xyz[indices] - center) / scale).astype(np.float32)
+    gt_seg = np.stack([gt_v[gt_e[:, 0]], gt_v[gt_e[:, 1]]], axis=1)
+    gt_seg_norm = ((gt_seg - center) / scale).astype(np.float32)
+
+    result = {
+        "xyz_norm": xyz_norm,
+        "class_id": class_id[indices].astype(np.uint8),
+        "source": source[indices].astype(np.uint8),
+        "mask": mask,
+        "gt_segments": gt_seg_norm,
+        "scale": np.float32(scale),
+        "center": center,
+        "gt_vertices": gt_v,
+        "gt_edges": gt_e,
+        "visible_src": vis_src[indices].astype(np.uint8),
+        "visible_id": vis_id[indices].astype(np.int16),
+    }
+    if "behind_gest_id" in d:
+        result["behind"] = np.asarray(d["behind_gest_id"], np.int16)[indices]
+    if "n_views_voted" in d:
+        result["n_views_voted"] = np.asarray(d["n_views_voted"], np.uint8)[indices]
+    if "vote_frac" in d:
+        result["vote_frac"] = np.asarray(d["vote_frac"], np.float32)[indices]
+    if "gt_edge_classes" in d:
+        result["gt_edge_classes"] = np.asarray(d["gt_edge_classes"], np.int64)
+    return result
+
+
+def main():
+    p = argparse.ArgumentParser(description="Stage 2: cached .pt -> sampled .npz")
+    p.add_argument("--in-dir", required=True, help="Directory of .pt files from cache_scenes.py")
+    p.add_argument("--out-dir", required=True, help="Output directory for .npz files")
+    p.add_argument("--seq-len", type=int, default=2048, help="Points per sample (2048 or 4096)")
+    p.add_argument("--seed", type=int, default=7)
+    args = p.parse_args()
+
+    colmap_q = args.seq_len * 3 // 4
+    depth_q = args.seq_len - colmap_q
+    print(f"seq_len={args.seq_len} colmap={colmap_q} depth={depth_q}")
+
+    out_dir = Path(args.out_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    np.random.seed(args.seed)
+
+    files = sorted(Path(args.in_dir).glob("*.pt"))
+    print(f"Found {len(files)} .pt files in {args.in_dir}")
+
+    done = 0
+    t0 = time.perf_counter()
+    for f in files:
+        out_f = out_dir / (f.stem + ".npz")
+        if out_f.exists():
+            done += 1
+            continue
+        d = torch.load(f, weights_only=False)
+        result = _process_sample(d, args.seq_len, colmap_q, depth_q)
+        np.savez(out_f, **result)
+        done += 1
+        if done % 2000 == 0:
+            rate = done / (time.perf_counter() - t0)
+            print(f" {done}/{len(files)} [{rate:.0f}/s]")
+
+    elapsed = time.perf_counter() - t0
+    print(f"Done. {done} files in {elapsed:.0f}s -> {out_dir}")
+
+
+if __name__ == "__main__":
+    main()
+
+
submitted_2048/README.md ADDED

# Submitted 2048 Model (public leaderboard entry)

This is the checkpoint that was actually submitted to the S23DR 2026 public leaderboard. It was trained on the 2048-point dataset only (single stage, no 4096 transfer). The current top-level `checkpoint.pt` (HSS=0.382 val) is its direct descendant via the three-stage 2048 -> 4096 -> endpoint-cooldown recipe.

| Split | Metric | Score |
|---|---|---|
| Public leaderboard (test) | HSS | **0.427** |
| Internal val (2048, 1024 samples) | HSS_conf | 0.369 |
| Internal val (4096, 1024 samples) | HSS_conf | 0.367 |

## Training details

Single-stage training on `hf://usm3d/s23dr-2026-sampled_2048_v2:train`:

- **Architecture:** same Perceiver as the current release (hidden=256, latent_tokens=256, latent_layers=7, segments=64)
- **Input:** 2048 points
- **Steps:** 160,000
- **Final LR:** 3e-5 (after cooldown)
- **Batch size:** 32
- **Cooldown:** starts at step 140,000, lasts 20,000 steps
- **Endpoint weight:** 0.1 (used throughout, not only in cooldown)
- **Confidence weight:** 0.1
- **Seed:** 353

Full training args are in `args.json`.

## How to run inference

This checkpoint expects 2048-point input. To run it with the submission harness you would need to modify `script.py` to use `SEQ_LEN = 2048`. Alternatively, load the weights manually via `EdgeDepthSegmentsModel` in `s23dr_2026_example/model.py` and feed it a 2048-point cloud.
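
A minimal inspection sketch. It only assumes the files are a plain `torch.save` checkpoint plus the `args.json` in this directory; the checkpoint's internal keys and the `EdgeDepthSegmentsModel` constructor signature are not documented here, so adapt accordingly.

```python
import json
from pathlib import Path

import torch

ckpt = torch.load("submitted_2048/checkpoint.pt", map_location="cpu", weights_only=False)
args = json.loads(Path("submitted_2048/args.json").read_text())

# Peek at what the checkpoint actually contains before wiring up the model.
print(type(ckpt), list(ckpt)[:10] if isinstance(ckpt, dict) else None)
print(args["segments"], args["hidden"], args["latent_tokens"])  # 64, 256, 256

# Building the model itself goes through EdgeDepthSegmentsModel in
# s23dr_2026_example/model.py with a 2048-point input, as described above;
# see script.py for the full 4096-point inference pipeline.
```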

## Why it is included

The current release (`../checkpoint.pt`, HSS=0.382 val) is a strict improvement over this one, but only on the internal val split. The **0.427 public leaderboard score** is the only test-set number we have, so this checkpoint is preserved as the empirical anchor for the val-to-test gap.

Val-to-test gap observed: **0.369 val -> 0.427 test** (about +0.06). The same val-to-test relationship should roughly carry over to the current 0.382-val release, but we do not have a test number for it, since our only leaderboard entry is this older model.
submitted_2048/args.json ADDED

{
  "cache_dir": "hf://usm3d/s23dr-2026-sampled_2048_v2:train",
  "val_cache_dir": "",
  "arch": "perceiver",
  "segments": 64,
  "hidden": 256,
  "ff": 1024,
  "latent_tokens": 256,
  "latent_layers": 7,
  "encoder_layers": 4,
  "pre_encoder_layers": 0,
  "decoder_layers": 3,
  "decoder_input_xattn": false,
  "qk_norm": true,
  "qk_norm_type": "l2",
  "learnable_fourier": false,
  "num_heads": 4,
  "kv_heads_cross": 2,
  "kv_heads_self": 2,
  "cross_attn_interval": 4,
  "dropout": 0.1,
  "steps": 160000,
  "batch_size": 32,
  "lr": 3e-05,
  "muon_lr": null,
  "adam_betas": "0.9,0.95",
  "warmup": 10000,
  "cosine_decay": false,
  "cooldown_start": 140000,
  "cooldown_steps": 20000,
  "mup": false,
  "mup_base_width": 128,
  "seed": 353,
  "varifold_weight": 0.0,
  "varifold_cross_only": false,
  "sinkhorn_weight": 1.0,
  "sinkhorn_eps": 0.1,
  "sinkhorn_eps_start": null,
  "sinkhorn_iters": 20,
  "sinkhorn_dustbin": 0.3,
  "vertex_f1_weight": 0.0,
  "soft_hss_weight": 0.0,
  "endpoint_weight": 0.1,
  "endpoint_warmup": 0,
  "aug_rotate": true,
  "aug_jitter": 0.0,
  "aug_drop": 0.0,
  "aug_flip": true,
  "gpu_dataset": false,
  "stored_seq_len": 8192,
  "rms_norm": true,
  "activation": "gelu",
  "behind_emb_dim": 8,
  "vote_features": true,
  "segment_param": "midpoint_dir_len",
  "length_floor": 0.0,
  "segment_conf": true,
  "conf_weight": 0.1,
  "conf_mode": "sinkhorn",
  "conf_clamp_min": null,
  "conf_head_wd": 0.1,
  "optimizer": "adamw",
  "out_dir": "/workspace/s23dr_2026_example/runs",
  "resume": "runs/20260322_085443/checkpoints/step125000.pt",
  "cpu": false,
  "args_from": "runs/20260322_085443/args.json"
}
submitted_2048/checkpoint.pt ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:cc38a61ff512948b1dc92a30129d6efdd093f507948fc5b538050c4a38bfbf6c
size 106460054