Image Feature Extraction
PyTorch
deltatok
cvpr2026-highlight
Commit 74d9e65 by tommiekerssies · 1 Parent(s): a8646ee

Standardize model card

Files changed (1)
  1. README.md +9 -8
README.md CHANGED
@@ -10,19 +10,20 @@ tags:
 
  # DeltaTok (Tokenizer) — Kinetics-700
 
- This repository contains the DeltaTok weights as presented in the paper [A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens](https://huggingface.co/papers/2604.04913) (CVPR 2026).
+ DeltaTok is a video tokenizer that encodes the vision foundation model (VFM) feature differences between consecutive frames into a single continuous "delta" token, as introduced in [A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens](https://huggingface.co/papers/2604.04913) (CVPR 2026). This approach significantly reduces the token count in video sequences (e.g., 1,024x reduction) and enables efficient generative world modeling.
 
- [**Project Page**](https://deltatok.github.io) | [**GitHub**](https://github.com/amazon-far/deltatok)
+ [**Project Page**](https://deltatok.github.io) | [**GitHub**](https://github.com/amazon-far/deltatok) | [**Paper**](https://huggingface.co/papers/2604.04913)
 
- DeltaTok is a video tokenizer that encodes the vision foundation model (VFM) feature differences between consecutive frames into a single continuous "delta" token. This approach significantly reduces the token count in video sequences (e.g., 1,024x reduction) while enabling efficient generative world modeling.
-
- ## Model Description
-
- This repository contains the ViT-B encoder and decoder trained on Kinetics-700 at 512x512 resolution. The model is designed to work with a frozen [DINOv3](https://github.com/facebookresearch/dinov3) ViT-B backbone (not included).
+ This repository contains the ViT-B encoder and decoder trained on Kinetics-700 at 512x512 resolution.
 
  ## Usage
 
- Please refer to the [DeltaTok GitHub repository](https://github.com/amazon-far/deltatok) for setup, training, and evaluation instructions.
+ Requires a frozen [DINOv3](https://github.com/facebookresearch/dinov3) ViT-B backbone. Full training and evaluation code is available in the [DeltaTok GitHub repository](https://github.com/amazon-far/deltatok). To evaluate:
+
+ ```bash
+ python main.py validate -c configs/deltatok_vitb_dinov3_vitb_kinetics.yaml \
+ --model.ckpt_path=path/to/deltatok-kinetics/pytorch_model.bin
+ ```
 
  ## Acknowledgements
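
As a rough sanity check on the 1,024x figure quoted in the model card, the sketch below counts the patch tokens a ViT would produce for one 512x512 frame and compares that with a single delta token. The 16x16 patch size is an assumption (it is not stated in this diff), and `patch_tokens_per_frame` is a hypothetical helper written for illustration, not part of the DeltaTok codebase.

```python
# Illustrative arithmetic only; not taken from the DeltaTok repository.
# ASSUMPTION: the frozen ViT-B backbone splits frames into 16x16 patches.


def patch_tokens_per_frame(height: int, width: int, patch_size: int) -> int:
    """Count the non-overlapping patch tokens a ViT produces for one frame."""
    if height % patch_size or width % patch_size:
        raise ValueError("frame size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


RESOLUTION = 512            # from the model card: trained at 512x512
PATCH_SIZE = 16             # assumed, see the note above
DELTA_TOKENS_PER_FRAME = 1  # from the model card: a single delta token

tokens = patch_tokens_per_frame(RESOLUTION, RESOLUTION, PATCH_SIZE)
reduction = tokens // DELTA_TOKENS_PER_FRAME
print(f"{tokens} patch tokens -> {DELTA_TOKENS_PER_FRAME} delta token ({reduction}x fewer)")
```

Under that assumption, a 32 by 32 patch grid gives 1,024 tokens per frame, which is consistent with the reduction the model card quotes.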