jacklangerman committed · Commit 4946666 · 1 Parent(s): 0f31e57

prep release

README.md ADDED

---
license: cc-by-nc-4.0
library_name: pytorch
tags:
- 3d-reconstruction
- wireframe
- building
- point-cloud
- s23dr
- cvpr-2026
datasets:
- usm3d/s23dr-2026-sampled_4096_v2
- usm3d/s23dr-2026-sampled_2048_v2
metrics:
- HSS
pipeline_tag: other
---

# S23DR 2026 Learned Baseline

A learned baseline for the **S23DR 2026** challenge (**S**tructured and **S**emantic **3D R**econstruction, or S^2 3DR), part of the [USM3D workshop](https://usm3d.github.io) at CVPR 2026. The model takes a fused point cloud of a building and predicts its wireframe as a set of 3D line segments.

**Headline result: HSS = 0.382** on the 1024-sample validation set (shipped checkpoint).

For context, the handcrafted baseline scores HSS = 0.307 on the same split.
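
For orientation: a wireframe here is stored as a vertex array plus an edge index array (`gt_vertices` / `gt_edges` in the per-scene caches), and the "set of 3D line segments" the model predicts is just the gathered endpoint pairs. A tiny sketch with made-up toy values (the `np.stack` gather mirrors what `make_sampled_cache.py` does):

```python
import numpy as np

# Toy wireframe: 4 vertices, 3 edges (values are illustrative only).
gt_vertices = np.array([[0.0, 0.0, 0.0],
                        [4.0, 0.0, 0.0],
                        [4.0, 3.0, 0.0],
                        [2.0, 1.5, 2.5]], dtype=np.float32)   # [V, 3]
gt_edges = np.array([[0, 1], [1, 2], [2, 3]], dtype=np.int32)  # [E, 2]

# Segments = endpoint pairs, the representation the model predicts: [E, 2, 3].
gt_segments = np.stack([gt_vertices[gt_edges[:, 0]],
                        gt_vertices[gt_edges[:, 1]]], axis=1)
print(gt_segments.shape)  # (3, 2, 3)
```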

## Quick start

Run the submission pipeline directly (matches the competition eval harness):

```bash
python script.py
```

That loads `checkpoint.pt`, fuses the input views into a 4096-point cloud, runs the model, and writes the predicted wireframe for each scene.

To reproduce the checkpoint from scratch on a single RTX 4090 (~3 hours):

```bash
bash reproduce.sh
```

Or for a bit-identical deterministic run (~5.5 hours, slower because it disables `torch.compile`):

```bash
bash reproduce_deterministic.sh
```

Both scripts run the full three-stage recipe described below. See `REPRODUCE.md` for the exact hyperparameters and the reproducibility notes.

## Architecture

A Perceiver-style transformer that ingests the point cloud as a sequence of per-point tokens, cross-attends it into a fixed-size latent, and decodes a fixed set of 3D line segments from that latent.

```
Perceiver: hidden=256, ff=1024
latent_tokens=256, latent_layers=7
encoder_layers=4, decoder_layers=3, cross_attn_interval=4
num_heads=4, kv_heads_cross=2, kv_heads_self=2
qk_norm=L2, rms_norm=True, dropout=0.1
segments=64, segment_param=midpoint_dir_len, segment_conf=True
behind_emb_dim=8, vote_features=True, activation=gelu
```

The decoder predicts 64 candidate segments, each parametrized as midpoint + direction + length, with a confidence head per segment. Training uses a Sinkhorn optimal-transport loss to match predicted segments to the ground truth, plus a symmetric endpoint L1 term in the cooldown stage.
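
A rough sketch of what the midpoint/direction/length parametrization means geometrically. The helper name, tensor shapes, and the choice to L2-normalize the direction are illustrative only; the actual head lives in `s23dr_2026_example/model.py` and may differ in those details.

```python
import torch
import torch.nn.functional as F

def segments_from_midpoint_dir_len(mid, direction, length):
    """Convert (midpoint, direction, length) predictions to endpoint pairs.

    mid:       [B, S, 3]  predicted segment midpoints
    direction: [B, S, 3]  unnormalized direction vectors
    length:    [B, S, 1]  predicted segment lengths
    returns:   [B, S, 2, 3] endpoints (midpoint -/+ half the oriented length)
    """
    d = F.normalize(direction, dim=-1)   # unit direction
    half = 0.5 * length * d              # [B, S, 3]
    return torch.stack([mid - half, mid + half], dim=-2)

# Random "predictions" for a batch of 1 scene and 64 candidate segments.
mid = torch.randn(1, 64, 3)
direction = torch.randn(1, 64, 3)
length = torch.rand(1, 64, 1)
endpoints = segments_from_midpoint_dir_len(mid, direction, length)
print(endpoints.shape)  # torch.Size([1, 64, 2, 3])
```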

All architecture and optimizer settings live in `configs/base.json`.

## Training recipe

The model ships with a three-stage recipe. Each stage starts from the previous stage's final checkpoint.

| Stage | Input | Steps | LR | Batch | Notes | HSS |
|---|---|---|---|---|---|---|
| 1. 2048 from scratch | 2048 pts | 0 -> 125k | 3e-4, warmup 10k | 32 | Random init, Sinkhorn only | 0.281 |
| 2. 4096 finetune | 4096 pts | 125k -> 135k | 3e-5 constant | 64 | Gentle LR preserves representations | 0.351 |
| 3. Endpoint cooldown | 4096 pts | 135k -> 170k | 3e-5 then linear decay | 64 | Adds endpoint L1 loss, tightens vertices | **0.382** |

**Why 2048 first:** training directly on 4096 overfits (1.47x train/val ratio vs 1.19x for 2048). Starting on 2048 produces better-generalized representations that the 4096 finetune can then specialize.

**Why a gentle LR on finetune:** LR > 1e-4 causes catastrophic forgetting of the geometry understanding learned at 2048.

**Why endpoint loss only in stage 3:** the Sinkhorn loss operates on the midpoint/direction/length parametrization and doesn't directly penalize vertex position error. Adding a symmetric endpoint L1 against the detached Sinkhorn assignment tightens vertex precision in the cooldown.
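
As an illustration of the endpoint term (not the exact loss in the training code, and reading "symmetric" as invariance to endpoint order): given a fixed matching between predicted and ground-truth segments, take the smaller L1 error over the two possible endpoint orderings, so a segment is not penalized for predicting its endpoints in swapped order.

```python
import torch

def symmetric_endpoint_l1(pred, gt):
    """pred, gt: [N, 2, 3] already-matched segment pairs.

    Returns the mean over segments of the smaller L1 endpoint error between
    the identity ordering (p0->g0, p1->g1) and the swapped one (p0->g1, p1->g0).
    """
    direct = (pred - gt).abs().sum(dim=(-1, -2))                  # [N]
    swapped = (pred - gt.flip(dims=[-2])).abs().sum(dim=(-1, -2))  # [N]
    return torch.minimum(direct, swapped).mean()

pred = torch.randn(5, 2, 3)
gt = pred.flip(dims=[-2]) + 0.01 * torch.randn(5, 2, 3)  # same segments, endpoints swapped
print(symmetric_endpoint_l1(pred, gt))  # small, despite the swapped ordering
```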

Full details, including the "what does not work" list (BuildingWorld pretraining, mixed training, high dropout, etc.), are in `REPRODUCE.md`.

## Evaluation

> **About the numbers:** all val scores below are HSS at confidence threshold 0.7, averaged over the 1024-sample *internal validation split* we hold out from the published training data (`usm3d/s23dr-2026-sampled_{2048,4096}_v2:validation`). They are **not** test-set numbers. The only test-set number we have is the public leaderboard score of the older 2048 submission (see `submitted_2048/` and the last table below).
>
> All numbers below were measured fresh for this release against the checkpoints in this repo.

### Shipped model and reproductions

| Model | Checkpoint | HSS @ 4096 | HSS @ 2048 |
|---|---|---|---|
| Handcrafted baseline | — | 0.307 | — |
| **Current release (shipped)** | `checkpoint.pt` | **0.3819** | 0.3734 |
| Closest compiled E2E repro (#4) | `repro_runs/e2e_repro4_hss379/` | 0.3736 | 0.3675 |
| Best compiled repro from this codebase | `repro_runs/compiled_repro_hss376/` | 0.3757 | 0.3670 |
| Deterministic E2E repro (bit-reproducible) | `repro_runs/deterministic_hss372/` | 0.3716 | 0.3665 |

All repros use the exact three-stage recipe on a single RTX 4090. The shipped `checkpoint.pt` was trained with the same recipe before this release branch was cut; the ~0.005-0.010 HSS gap between the shipped model and the repros is compiled-mode run-to-run variance (see the Reproducibility section).

### Training progression (deterministic repro, all stages measured fresh)

| Stage | Steps | HSS @ 4096 | HSS @ 2048 |
|---|---|---|---|
| 1. 2048 from-scratch | 125k | 0.2755 | 0.2812 |
| 2. 4096 finetune | 135k | 0.3557 | 0.3510 |
| 3. Endpoint cooldown | 170k | 0.3716 | 0.3665 |

The stage 1 -> stage 2 jump (+0.08 HSS on 4096) is the biggest single improvement and motivates the 2048 -> 4096 transfer. Stage 3 (endpoint cooldown) adds another +0.016. Note that stage 1 scores slightly better at 2048 than at 4096 (it was trained only on 2048), while stages 2 and 3 invert that ordering after the 4096 finetune.

### Previously submitted model (2048, single-stage)

The `submitted_2048/` directory holds the checkpoint we actually sent to the public leaderboard. It was trained in a single stage on 2048-point data and is a direct ancestor of the current release.

| Split | Metric | Score |
|---|---|---|
| **Public leaderboard (test)** | **HSS** | **0.427** |
| Internal val @ 2048 | HSS | 0.3692 |
| Internal val @ 4096 | HSS | 0.3665 |

We do not have a test number for the current release, but the val-to-test gap observed on this 2048 submission was about **+0.06 HSS** (0.37 val -> 0.43 test). A similar gap on the current `checkpoint.pt` (0.382 val) would suggest a test score in the low 0.44s, though this is extrapolation and unverified.

## Reproducibility

| Test | Result |
|---|---|
| Forward pass (same ckpt, same input) | bit-identical (0.00 diff) |
| Deterministic mode, 3 independent runs | bit-identical (162 tensors, max_diff=0.0) |
| Stage 3 rerun from the same stage-2 ckpt (2 runs) | HSS = 0.382, 0.384 |
| Compiled-mode E2E variance across runs | ~0.03 HSS (Triton kernel nondeterminism) |

`reproduce_deterministic.sh` produces byte-identical weights across runs with the same seed, at the cost of ~2x slower training (no `torch.compile`). Compiled mode has small run-to-run variance from Triton kernel selection, which compounds through the chaotic dynamics of SGD; E2E compiled repros land in the 0.349-0.379 range.
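
For reference, a fully deterministic PyTorch setup typically combines the settings below. This is a generic sketch; the repo's training code and `reproduce_deterministic.sh` may use a different or additional set of switches.

```python
import os
import random

import numpy as np
import torch

def enable_full_determinism(seed: int = 0) -> None:
    # Required by some cuBLAS ops once deterministic algorithms are enforced;
    # must be set before the first cuBLAS call.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                   # seeds CPU and all CUDA devices
    torch.use_deterministic_algorithms(True)  # error out on nondeterministic kernels
    torch.backends.cudnn.benchmark = False    # no input-shape-dependent autotuning
    # torch.compile / Triton autotuning is a separate source of run-to-run
    # variance, which is why the deterministic script skips compilation.
```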

A subtle iteration-order effect: the shipped `bad_samples.txt` has 156 non-empty entries (the file lacks a trailing newline, so `wc -l` reports 155). Two additional bad samples were discovered after training; they are legitimately bad GT, but adding them changes the batch iteration order and costs ~0.005 HSS in deterministic mode and ~0.04 in compiled mode. See the "Reproducibility Notes" section of `REPRODUCE.md` for the full story.
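
To count the entries the way the text does (non-empty lines, independent of the missing trailing newline), for example:

```python
from pathlib import Path

# splitlines() does not care whether the file ends in a newline,
# which is why this reports 156 where `wc -l` reports 155.
n = sum(1 for line in Path("bad_samples.txt").read_text().splitlines() if line.strip())
print(n)
```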

## Repository layout

```
checkpoint.pt                shipped HSS=0.382 model (step 170000), 4096-point input
script.py                    competition inference entry point (uses checkpoint.pt)
s23dr_2026_example/          training package (model, data, train loop, losses)
configs/base.json            shared training config
reproduce.sh                 compiled-mode E2E reproduction (~3 hr)
reproduce_deterministic.sh   bit-reproducible E2E reproduction (~5.5 hr)
REPRODUCE.md                 detailed recipe, results, ablations, notes

submitted_2048/              the model we actually sent to the public leaderboard (HSS_test=0.427)
  checkpoint.pt              single-stage 2048 model (step 160000)
  args.json                  full training args
  README.md                  training details and val/test scores

repro_runs/                  evidence that the 3-stage recipe reproduces
  e2e_repro4_hss379/         closest compiled E2E repro (val HSS=0.374)
  compiled_repro_hss376/     best compiled repro from this codebase (val HSS=0.376)
  deterministic_hss372/      bit-reproducible deterministic repro (val HSS=0.372)
```

Each directory under `repro_runs/` contains the three stage-final checkpoints (125k / 135k / 170k) plus their `args.json`, so a participant can resume from any stage. Note that the directory names carry the score at the time the directory was created, which may differ by ~0.002 from the fresh evals in the table above due to random variation in post-processing and CUDA kernel selection.

## Related branches

- `main` - this release
- `best-4096-transfer` - working branch with full commit history and internal dev notes
- `validation-archive` - cold archive of all validation runs (logs, final checkpoints, args) used to verify the release

## License

**CC-BY-NC 4.0.** The model weights and code in this repository are released under the Creative Commons Attribution-NonCommercial 4.0 International license. You are free to use, share, and adapt this work for **non-commercial** purposes, provided you give appropriate **attribution**. The training and validation datasets (`usm3d/s23dr-2026-sampled_*`) have their own terms; see the S23DR 2026 competition page for details.

## Acknowledgements

This checkpoint is released as a public learned baseline for participants of the **S23DR 2026** challenge, part of the [USM3D workshop](https://usm3d.github.io) at CVPR 2026.
make_datasets.sh ADDED

#!/bin/bash
# Rebuild the sampled datasets from scratch, starting from the public raw
# `usm3d/hoho22k_2026_trainval` dataset. Two stages:
#
#   1. cache_scenes.py       : stream raw shards -> per-scene .pt files
#                              (runs point fusion + priority grouping)
#   2. make_sampled_cache.py : per-scene .pt -> fixed-size .npz files
#                              (priority samples to seq_len=2048 or 4096)
#
# This reproduces the content of
#   hf://usm3d/s23dr-2026-sampled_2048_v2
#   hf://usm3d/s23dr-2026-sampled_4096_v2
# without needing the intermediate (private) cached_full_pcd dataset.
#
# ~3-4 hr on a workstation for the full train+val set (network-bound in stage 1).
set -e

OUT_ROOT="${1:-cache}"
FULL_TRAIN="$OUT_ROOT/full/train"
FULL_VAL="$OUT_ROOT/full/validation"

# ----- Stage 1: raw -> per-scene .pt -----
echo "=== Stage 1: caching train scenes from raw tars ==="
python -m s23dr_2026_example.cache_scenes --out-dir "$FULL_TRAIN" --split train --skip-existing

echo "=== Stage 1: caching validation scenes from raw tars ==="
python -m s23dr_2026_example.cache_scenes --out-dir "$FULL_VAL" --split validation --skip-existing

# ----- Stage 2: .pt -> sampled .npz -----
for split in train validation; do
  for seq_len in 2048 4096; do
    in_dir="$OUT_ROOT/full/$split"
    out_dir="$OUT_ROOT/sampled_${seq_len}/$split"
    echo "=== Stage 2: sampling $split at seq_len=$seq_len ==="
    python -m s23dr_2026_example.make_sampled_cache \
      --in-dir "$in_dir" --out-dir "$out_dir" --seq-len "$seq_len"
  done
done

echo ""
echo "All done. Sampled datasets are at:"
echo " $OUT_ROOT/sampled_2048/{train,validation}"
echo " $OUT_ROOT/sampled_4096/{train,validation}"
echo ""
echo "To train from these, point reproduce.sh at them via"
echo " --cache-dir \"\$OUT_ROOT/sampled_2048/train\" (and similar for val/4096)"
echo "instead of the default hf:// URLs."
s23dr_2026_example/cache_scenes.py CHANGED
@@ -1,31 +1,42 @@
 #!/usr/bin/env python3
 """Cache compact scenes from HoHo22k shards to training-ready .pt files.
 
-Runs build_compact_scene + precomputes group_id, semantic class, and
-normalization so training only needs fast sampling + GPU forward.
+Streams samples from the public `usm3d/hoho22k_2026_trainval` dataset, runs
+`build_compact_scene` (see point_fusion.py), precomputes priority group_id
+and semantic class_id, and saves one .pt per scene.
+
+Stage 1 of the dataset pipeline. See make_sampled_cache.py for stage 2.
 
 Usage:
-    python cache_scenes.py --data-dir data/ --out-dir cache/train
-    python cache_scenes.py --streaming --out-dir cache/train --limit 5000
-    python cache_scenes.py --data-dir data/ --out-dir cache/train --workers 4
-
-Cache format per file (.pt):
-    xyz: float32 [P, 3] all points in world space
-    source: uint8 [P] 0=colmap, 1=depth
-    group_id: int8 [P] priority tier 0-4, -1=excluded
-    class_id: uint8 [P] one-hot class index (0-12), see SEMANTIC_CLASSES
-    visible_src: uint8 [P] for visualization (1=gestalt, 2=ade)
-    visible_id: int16 [P] for visualization (class id within space)
-    center: float32 [3] smart normalization center
-    scale: float32 scalar smart normalization scale
-    gt_vertices: float32 [V, 3] ground truth wireframe vertices
-    gt_edges: int32 [E, 2] ground truth wireframe edge indices
+    python -m s23dr_2026_example.cache_scenes --out-dir cache/full --split train
+    python -m s23dr_2026_example.cache_scenes --out-dir cache/full_val --split validation
+
+Cache format per .pt file:
+    xyz: float32 [P, 3] all points in world space
+    source: uint8 [P] 0=colmap, 1=depth
+    group_id: int8 [P] priority tier 0-4, -1=excluded
+    class_id: uint8 [P] one-hot class index (0-12)
+    behind_gest_id: int16 [P] behind-gestalt id (-1 if none)
+    visible_src: uint8 [P] 1=gestalt, 2=ade
+    visible_id: int16 [P] class id within space
+    n_views_voted: uint8 [P] number of views that voted
+    vote_frac: float32 [P] fraction of votes
+    center: float32 [3] smart normalization center
+    scale: float32 scalar smart normalization scale
+    gt_vertices: float32 [V, 3] ground truth wireframe vertices
+    gt_edges: int32 [E, 2] ground truth wireframe edge indices
 """
 from __future__ import annotations
 
+import argparse
+import time
+from pathlib import Path
+
 import numpy as np
+import torch
 
 from .point_fusion import (
+    FuserConfig, build_compact_scene,
     GEST_ID_TO_NAME, ADE_ID_TO_NAME, NUM_GEST,
 )
 
@@ -176,3 +187,96 @@ def _compute_smart_center_scale(xyz, source, mad_k=2.5, percentile=95.0,
     return center.astype(np.float32), np.float32(scale)
 
 
+# ---------------------------------------------------------------------------
+# Dataset pipeline stage 1: raw HF sample -> cached .pt
+# ---------------------------------------------------------------------------
+
+def _process_one(sample, cfg):
+    """Fuse a single HF sample into a cache dict. Returns (order_id, dict) or None."""
+    rng = np.random.RandomState()
+
+    n_edges = len(sample.get("wf_edges", []))
+    if n_edges == 0 or n_edges > 64:
+        return None
+
+    scene = build_compact_scene(sample, cfg, rng=rng)
+    if scene is None:
+        return None
+
+    gt_v = scene.get("gt_vertices")
+    gt_e = scene.get("gt_edges")
+    if gt_v is None or gt_e is None or len(gt_e) == 0:
+        return None
+
+    xyz = scene["xyz"]
+    source = scene["source"]
+    group_id, class_id = _compute_group_and_class(
+        scene["visible_src"], scene["visible_id"], scene["behind_gest_id"], source)
+    center, scale = _compute_smart_center_scale(xyz, source)
+
+    gt_edge_classes = np.asarray(sample["wf_classifications"], dtype=np.int64)
+    return sample["order_id"], {
+        "xyz": xyz.astype(np.float32),
+        "source": source.astype(np.uint8),
+        "group_id": group_id,
+        "class_id": class_id,
+        "behind_gest_id": scene["behind_gest_id"].astype(np.int16),
+        "visible_src": scene["visible_src"].astype(np.uint8),
+        "visible_id": scene["visible_id"].astype(np.int16),
+        "n_views_voted": scene["n_views_voted"],
+        "vote_frac": scene["vote_frac"],
+        "center": center,
+        "scale": scale,
+        "gt_vertices": gt_v.astype(np.float32),
+        "gt_edges": gt_e.astype(np.int32),
+        "gt_edge_classes": gt_edge_classes,
+    }
+
+
+def main():
+    p = argparse.ArgumentParser(description="Stage 1: HoHo22k -> cached .pt files")
+    p.add_argument("--out-dir", required=True, help="Output directory for .pt files")
+    p.add_argument("--split", default="train", choices=["train", "validation"])
+    p.add_argument("--limit", type=int, default=0, help="Stop after N samples (0 = all)")
+    p.add_argument("--depth-per-view", type=int, default=8000)
+    p.add_argument("--skip-existing", action="store_true")
+    args = p.parse_args()
+
+    out_dir = Path(args.out_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    existing = {p.stem for p in out_dir.glob("*.pt")} if args.skip_existing else set()
+
+    from datasets import load_dataset
+    print(f"Streaming usm3d/hoho22k_2026_trainval split={args.split}...")
+    ds = load_dataset("usm3d/hoho22k_2026_trainval",
+                      streaming=True, trust_remote_code=True, split=args.split)
+
+    cfg = FuserConfig(depth_points_per_view=args.depth_per_view)
+    saved, skipped = 0, 0
+    t0 = time.perf_counter()
+    for i, sample in enumerate(ds):
+        if args.limit > 0 and i >= args.limit:
+            break
+        oid = sample["order_id"]
+        if oid in existing:
+            skipped += 1
+            continue
+        result = _process_one(sample, cfg)
+        if result is None:
+            skipped += 1
+            continue
+        order_id, data = result
+        torch.save(data, out_dir / f"{order_id}.pt")
+        saved += 1
+        if saved % 100 == 0:
+            rate = saved / (time.perf_counter() - t0)
+            print(f" saved {saved} (skipped {skipped}) [{rate:.1f}/s]")
+
+    elapsed = time.perf_counter() - t0
+    print(f"Done. Saved {saved}, skipped {skipped} in {elapsed:.0f}s.")
+
+
+if __name__ == "__main__":
+    main()
+
+
@@ -1,30 +1,30 @@
@@ -1,30 +1,30 @@
 #!/usr/bin/env python3
-"""Convert full point cloud cache to pre-sampled 2048-point npz files.
-
-Reads from either local .pt files or the HF dataset, priority-samples
-2048 points, normalizes, and saves as compact npz files (~50KB each).
-
-Usage:
-    # From local cache:
-    python make_sampled_cache.py --in-dir /workspace/cache/v2 --out-dir /workspace/cache/sampled
-
-    # From HF dataset:
-    python make_sampled_cache.py --hf-repo usm3d/s23dr-2026-cached_full_pcd --out-dir /workspace/cache/sampled
-
-    # Specify split:
-    python make_sampled_cache.py --hf-repo usm3d/s23dr-2026-cached_full_pcd --split validation --out-dir /workspace/cache/sampled_val
-
-    # With edge classifications (from extract_edge_classes.py):
-    python make_sampled_cache.py --hf-repo usm3d/s23dr-2026-cached_full_pcd --out-dir /workspace/cache/sampled \
-        --edge-classes edge_classifications.npz
-
-Note: uses a fixed seed so each scene gets one deterministic sample of 2048
-points. This means no sampling augmentation across epochs -- every epoch sees
-the same points. Fine for now; better augmentation can be added later.
+"""Stage 2: priority-sample cached .pt scenes into fixed-size .npz files.
+
+Reads the per-scene .pt files produced by cache_scenes.py, priority-samples
+a fixed number of points (2048 or 4096), normalizes, and writes one .npz per
+scene (~50KB at 2048, ~100KB at 4096).
+
+A fixed seed is used so every scene gets one deterministic sample -- no
+per-epoch sampling augmentation, every epoch sees the same points.
+
+Usage:
+    python -m s23dr_2026_example.make_sampled_cache \\
+        --in-dir cache/full --out-dir cache/sampled_2048 --seq-len 2048
+    python -m s23dr_2026_example.make_sampled_cache \\
+        --in-dir cache/full --out-dir cache/sampled_4096 --seq-len 4096
+
+The 3:1 colmap:depth quota ratio is fixed: at seq_len=2048 that's
+colmap=1536/depth=512; at seq_len=4096 that's colmap=3072/depth=1024.
 """
 from __future__ import annotations
 
+import argparse
+import time
+from pathlib import Path
+
 import numpy as np
+import torch
 
 # Priority sampling (same logic as train.py)
@@ -73,3 +73,87 @@ def _priority_sample(source, group_id, seq_len, colmap_quota, depth_quota):
     return indices[:seq_len], mask
 
 
+def _process_sample(d, seq_len, colmap_q, depth_q):
+    """Sample and normalize one cached scene dict into a small npz-ready dict."""
+    xyz = np.asarray(d["xyz"], np.float32)
+    source = np.asarray(d["source"], np.uint8)
+    group_id = np.asarray(d["group_id"], np.int8)
+    class_id = np.asarray(d["class_id"], np.uint8)
+    vis_src = np.asarray(d["visible_src"], np.uint8)
+    vis_id = np.asarray(d["visible_id"], np.int16)
+    center = np.asarray(d["center"], np.float32)
+    scale = float(d["scale"])
+    gt_v = np.asarray(d["gt_vertices"], np.float32)
+    gt_e = np.asarray(d["gt_edges"], np.int32)
+
+    indices, mask = _priority_sample(source, group_id, seq_len, colmap_q, depth_q)
+    xyz_norm = ((xyz[indices] - center) / scale).astype(np.float32)
+    gt_seg = np.stack([gt_v[gt_e[:, 0]], gt_v[gt_e[:, 1]]], axis=1)
+    gt_seg_norm = ((gt_seg - center) / scale).astype(np.float32)
+
+    result = {
+        "xyz_norm": xyz_norm,
+        "class_id": class_id[indices].astype(np.uint8),
+        "source": source[indices].astype(np.uint8),
+        "mask": mask,
+        "gt_segments": gt_seg_norm,
+        "scale": np.float32(scale),
+        "center": center,
+        "gt_vertices": gt_v,
+        "gt_edges": gt_e,
+        "visible_src": vis_src[indices].astype(np.uint8),
+        "visible_id": vis_id[indices].astype(np.int16),
+    }
+    if "behind_gest_id" in d:
+        result["behind"] = np.asarray(d["behind_gest_id"], np.int16)[indices]
+    if "n_views_voted" in d:
+        result["n_views_voted"] = np.asarray(d["n_views_voted"], np.uint8)[indices]
+    if "vote_frac" in d:
+        result["vote_frac"] = np.asarray(d["vote_frac"], np.float32)[indices]
+    if "gt_edge_classes" in d:
+        result["gt_edge_classes"] = np.asarray(d["gt_edge_classes"], np.int64)
+    return result
+
+
+def main():
+    p = argparse.ArgumentParser(description="Stage 2: cached .pt -> sampled .npz")
+    p.add_argument("--in-dir", required=True, help="Directory of .pt files from cache_scenes.py")
+    p.add_argument("--out-dir", required=True, help="Output directory for .npz files")
+    p.add_argument("--seq-len", type=int, default=2048, help="Points per sample (2048 or 4096)")
+    p.add_argument("--seed", type=int, default=7)
+    args = p.parse_args()
+
+    colmap_q = args.seq_len * 3 // 4
+    depth_q = args.seq_len - colmap_q
+    print(f"seq_len={args.seq_len} colmap={colmap_q} depth={depth_q}")
+
+    out_dir = Path(args.out_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    np.random.seed(args.seed)
+
+    files = sorted(Path(args.in_dir).glob("*.pt"))
+    print(f"Found {len(files)} .pt files in {args.in_dir}")
+
+    done = 0
+    t0 = time.perf_counter()
+    for f in files:
+        out_f = out_dir / (f.stem + ".npz")
+        if out_f.exists():
+            done += 1
+            continue
+        d = torch.load(f, weights_only=False)
+        result = _process_sample(d, args.seq_len, colmap_q, depth_q)
+        np.savez(out_f, **result)
+        done += 1
+        if done % 2000 == 0:
+            rate = done / (time.perf_counter() - t0)
+            print(f" {done}/{len(files)} [{rate:.0f}/s]")
+
+    elapsed = time.perf_counter() - t0
+    print(f"Done. {done} files in {elapsed:.0f}s -> {out_dir}")
+
+
+if __name__ == "__main__":
+    main()
+
+
submitted_2048/README.md ADDED

# Submitted 2048 Model (public leaderboard entry)

This is the checkpoint that was actually submitted to the S23DR 2026 public leaderboard. It was trained on the 2048-point dataset only (single stage, no 4096 transfer). The current top-level `checkpoint.pt` (HSS=0.382 val) is its direct descendant via the three-stage 2048 -> 4096 -> endpoint-cooldown recipe.

| Split | Metric | Score |
|---|---|---|
| Public leaderboard (test) | HSS | **0.427** |
| Internal val (2048, 1024 samples) | HSS_conf | 0.369 |
| Internal val (4096, 1024 samples) | HSS_conf | 0.367 |

## Training details

Single-stage training on `hf://usm3d/s23dr-2026-sampled_2048_v2:train`:

- **Architecture:** same Perceiver as the current release (hidden=256, latent_tokens=256, latent_layers=7, segments=64)
- **Input:** 2048 points
- **Steps:** 160,000
- **Final LR:** 3e-5 (after cooldown)
- **Batch size:** 32
- **Cooldown:** starts at step 140,000, lasts 20,000 steps
- **Endpoint weight:** 0.1 (used throughout, not only in cooldown)
- **Confidence weight:** 0.1
- **Seed:** 353

Full training args are in `args.json`.

## How to run inference

This checkpoint expects 2048-point input. To run it with the submission harness you would need to modify `script.py` to use `SEQ_LEN = 2048`. Alternatively, load the weights manually via `EdgeDepthSegmentsModel` in `s23dr_2026_example/model.py` and feed it a 2048-point cloud.
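
A minimal inspection sketch. It only assumes the files are a plain `torch.save` checkpoint plus the `args.json` in this directory; the checkpoint's internal keys and the `EdgeDepthSegmentsModel` constructor signature are not documented here, so adapt accordingly.

```python
import json
from pathlib import Path

import torch

ckpt = torch.load("submitted_2048/checkpoint.pt", map_location="cpu", weights_only=False)
args = json.loads(Path("submitted_2048/args.json").read_text())

# Peek at what the checkpoint actually contains before wiring up the model.
print(type(ckpt), list(ckpt)[:10] if isinstance(ckpt, dict) else None)
print(args["segments"], args["hidden"], args["latent_tokens"])  # 64, 256, 256

# Building the model itself goes through EdgeDepthSegmentsModel in
# s23dr_2026_example/model.py with a 2048-point input, as described above;
# see script.py for the full 4096-point inference pipeline.
```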

## Why it is included

The current release (`../checkpoint.pt`, HSS=0.382 val) is a strict improvement over this one, but only on the internal val split. The **0.427 public leaderboard score** is the only test-set number we have, so this checkpoint is preserved as the empirical anchor for the val-to-test gap.

Val-to-test gap observed: **0.369 val -> 0.427 test** (about +0.06). The same val-to-test relationship should roughly carry over to the current 0.382-val release, but we do not have a test number for it, since our only leaderboard entry is this older model.
submitted_2048/args.json ADDED

{
  "cache_dir": "hf://usm3d/s23dr-2026-sampled_2048_v2:train",
  "val_cache_dir": "",
  "arch": "perceiver",
  "segments": 64,
  "hidden": 256,
  "ff": 1024,
  "latent_tokens": 256,
  "latent_layers": 7,
  "encoder_layers": 4,
  "pre_encoder_layers": 0,
  "decoder_layers": 3,
  "decoder_input_xattn": false,
  "qk_norm": true,
  "qk_norm_type": "l2",
  "learnable_fourier": false,
  "num_heads": 4,
  "kv_heads_cross": 2,
  "kv_heads_self": 2,
  "cross_attn_interval": 4,
  "dropout": 0.1,
  "steps": 160000,
  "batch_size": 32,
  "lr": 3e-05,
  "muon_lr": null,
  "adam_betas": "0.9,0.95",
  "warmup": 10000,
  "cosine_decay": false,
  "cooldown_start": 140000,
  "cooldown_steps": 20000,
  "mup": false,
  "mup_base_width": 128,
  "seed": 353,
  "varifold_weight": 0.0,
  "varifold_cross_only": false,
  "sinkhorn_weight": 1.0,
  "sinkhorn_eps": 0.1,
  "sinkhorn_eps_start": null,
  "sinkhorn_iters": 20,
  "sinkhorn_dustbin": 0.3,
  "vertex_f1_weight": 0.0,
  "soft_hss_weight": 0.0,
  "endpoint_weight": 0.1,
  "endpoint_warmup": 0,
  "aug_rotate": true,
  "aug_jitter": 0.0,
  "aug_drop": 0.0,
  "aug_flip": true,
  "gpu_dataset": false,
  "stored_seq_len": 8192,
  "rms_norm": true,
  "activation": "gelu",
  "behind_emb_dim": 8,
  "vote_features": true,
  "segment_param": "midpoint_dir_len",
  "length_floor": 0.0,
  "segment_conf": true,
  "conf_weight": 0.1,
  "conf_mode": "sinkhorn",
  "conf_clamp_min": null,
  "conf_head_wd": 0.1,
  "optimizer": "adamw",
  "out_dir": "/workspace/s23dr_2026_example/runs",
  "resume": "runs/20260322_085443/checkpoints/step125000.pt",
  "cpu": false,
  "args_from": "runs/20260322_085443/args.json"
}
submitted_2048/checkpoint.pt ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:cc38a61ff512948b1dc92a30129d6efdd093f507948fc5b538050c4a38bfbf6c
size 106460054