Improve model card: add arXiv link, fix image paths and add metadata

#1
by nielsr HF Staff - opened

Files changed (1): README.md (+24 -51)

README.md CHANGED
@@ -1,22 +1,26 @@
  ---
- pipeline_tag: image-to-3d
- license: mit
  language:
  - en
+ license: mit
+ pipeline_tag: image-to-3d
+ arxiv: 2508.15769
+ tags:
+ - 3d
+ - scene-generation
  ---

  # SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass (3DV 2026)

- This repository contains the official PyTorch implementation of SceneGen: https://arxiv.org/abs/2508.15769/.
+ This repository contains the official PyTorch implementation of SceneGen, introduced in [SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass](https://huggingface.co/papers/2508.15769).

  **Now the Training, Inference Code, and Pretrained Models have all been released! Feel free to reach out for discussions!**

  <div align="center">
- <img src="./assets/SceneGen.png">
+ <img src="https://github.com/Mengmouxu/SceneGen/raw/main/assets/SceneGen.png" width="800">
  </div>

- ## 🌟 Some Information
- [Project Page](https://mengmouxu.github.io/SceneGen/) · [Paper](https://arxiv.org/abs/2508.15769/) · [Checkpoints](https://huggingface.co/haoningwu/SceneGen/)
+ ## 🌟 Resources
+ [**Project Page**](https://mengmouxu.github.io/SceneGen/) · [**Paper**](https://arxiv.org/abs/2508.15769/) · [**Code**](https://github.com/Mengmouxu/SceneGen) · [**Checkpoints**](https://huggingface.co/haoningwu/SceneGen/)

  ## ⏩ News
  - [2025.11] Evaluation code has been released.
@@ -92,82 +96,51 @@ This script launches a Gradio web interface for interactive scene generation.
  > 1. Adjust generation parameters (optional).
  > 2. Click **"Generate 3D Scene"**.
  > 3. Download the generated GLB file when ready.
- >
- > **💡 Pro Tip:** Try the examples below to get started quickly!
-
- https://github.com/user-attachments/assets/d0d53506-70cd-4bd3-a6ab-2f9b5b16f4d8
-
- *Click the image above to watch the demo video*

+ [Watch the demo video](https://github.com/user-attachments/assets/d0d53506-70cd-4bd3-a6ab-2f9b5b16f4d8)
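+
+ As a quick sanity check of a downloaded result, the GLB can be inspected from the command line (a minimal sketch; the filename is hypothetical and `trimesh` is not a stated dependency of this repo):
+ ```sh
+ # Minimal sketch: inspect a downloaded GLB (filename is hypothetical; requires `pip install trimesh`)
+ python -c "import trimesh; s = trimesh.load('scene.glb'); print(list(s.geometry))"
+ ```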

  ### Pre-segmented Image Inference
  This script processes a directory of pre-segmented images.
  - **Input**: The input folder structure should be similar to `assets/masked_image_test`, containing segmented scene images.
- - **Visualization**: For scenes with ground truth data, you can use the `--gradio` flag to launch a Gradio interface that visualizes both the ground truth and the generated model. We provide data from the 3D-FUTURE test set as a demonstration.
+ - **Visualization**: For scenes with ground truth data, you can use the `--gradio` flag to launch a Gradio interface that visualizes both the ground truth and the generated model.
  - **Usage**:
  ```sh
  python inference.py --gradio
  ```
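+
+ For orientation, a hypothetical layout in the spirit of `assets/masked_image_test` (the exact folder and file naming below is an assumption; match the examples shipped with the repo):
+ ```
+ masked_image_test/
+ └── scene_001/            # one sub-folder per scene (hypothetical naming)
+     ├── scene.png         # full scene image
+     └── mask_*.png        # one segmentation mask per object
+ ```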

  ## 📚 Dataset
- To train and evaluate SceneGen, we use the [3D-FUTURE](https://tianchi.aliyun.com/dataset/98063) dataset. Please download and preprocess the dataset as follows:
- 1. Download the 3D-FUTURE dataset from [here](https://tianchi.aliyun.com/dataset/98063), which requires applying for access.
- 2. Follow the [TRELLIS](https://github.com/microsoft/TRELLIS) data processing instructions to preprocess the dataset. Make sure to follow their directory structure for compatibility and to fully generate the necessary files and `metadata.csv`.
- 3. Run the `dataset_toolkits/build_metadata_scene.py` script to create the scene-level metadata file:
- ```sh
- python dataset_toolkits/build_metadata_scene.py 3D-FUTURE \
-     --output_dir <path_to_3D-FUTURE> \
-     --set <train or test> \
-     --vggt_ckpt checkpoints/VGGT-1B --save_mask
- ```
- This will generate a `metadata_scene.csv` or `metadata_scene_test.csv` file in the specified dataset directory.
- 4. For evaluation, run the `dataset_toolkits/build_scene.sh` script to render a scene image for each scene (with Blender installed and the configs in the script set correctly):
- ```sh
- bash dataset_toolkits/build_scene.sh
- ```
- This will create a `scene_test_render` folder in the dataset directory containing the images of the test scenes rendered with Blender, which will be further used for evaluation.
+ To train and evaluate SceneGen, we use the [3D-FUTURE](https://tianchi.aliyun.com/dataset/98063) dataset. Please refer to the [GitHub repository](https://github.com/Mengmouxu/SceneGen#dataset) for detailed preprocessing and data handling instructions.

  ## 🏋️‍♂️ Training
  With the processed 3D-FUTURE dataset and the pretrained `ss_flow_img_dit_L_16l8_fp16.safetensors` model checkpoint from [TRELLIS](https://huggingface.co/microsoft/TRELLIS-image-large) correctly placed in the `checkpoints/scenegen/ckpts` directory, you can train SceneGen using the following command:
  ```
  bash scripts/train.sh
  ```
- For detailed training configurations, please refer to `configs/generation/ss_scenegen_flow_img_train.json` and change the parameters as needed.
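+
+ A pre-flight sketch before launching (the `ls` check below is illustrative and not part of the repository; the paths come from the text above):
+ ```sh
+ # Illustrative pre-flight: confirm the TRELLIS prior is in place, then launch training
+ ls checkpoints/scenegen/ckpts/ss_flow_img_dit_L_16l8_fp16.safetensors \
+   && bash scripts/train.sh
+ ```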

  ## 🧪 Evaluation
- To generate the 3D scenes on the 3D-FUTURE test set using the SceneGen model, use the following command:
+ To generate the 3D scenes on the 3D-FUTURE test set:
  ```
  bash scenegen_eval.sh
  ```
- This uses the `scenegen_eval.py` script to generate the normalized scenes.
-
- To evaluate the trained SceneGen model on the 3D-FUTURE test set, use the following command:
+ To evaluate the trained model on the 3D-FUTURE test set:
  ```
  cd evalscene
  bash eval_scenegen.sh
  ```
- Make sure the processed 3D-FUTURE dataset and the rendered images are in place as described in the Dataset section, and that the evaluation configs in `evalscene/configs/test/scene_evaluation_scenegen.yaml` are set correctly. The evaluation script will then compute metrics between the normalized generated scenes and the ground truth.
-
- Some packages used in the evaluation require additional installation. Please install `torchmetrics`, `lpips`, `clip`, and `probreg` via pip.
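+
+ The extra evaluation dependencies can be installed in one step (a sketch; whether `clip` means the PyPI package of that name or OpenAI's CLIP repository is an assumption to verify):
+ ```sh
+ # Evaluation-only dependencies (package list from the text; the `clip` source is an assumption)
+ pip install torchmetrics lpips probreg
+ pip install git+https://github.com/openai/CLIP.git   # if `clip` refers to OpenAI CLIP
+ ```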

  ## 📜 Citation
  If you use this code and data for your research or project, please cite:
- ```
+ ```bibtex
  @inproceedings{meng2026scenegen,
  author = {Meng, Yanxu and Wu, Haoning and Zhang, Ya and Xie, Weidi},
  title = {SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass},
  booktitle = {International Conference on 3D Vision 2026},
  year = {2026},
  }
  ```
- ## TODO
- - [x] Release Paper
- - [x] Release Checkpoints & Inference Code
- - [x] Release Training Code
- - [x] Release Data Processing Code
- - [x] Release Evaluation Code

  ## Acknowledgements
  Many thanks to the code bases from [TRELLIS](https://github.com/microsoft/TRELLIS), [DINOv2](https://github.com/facebookresearch/dinov2), and [VGGT](https://github.com/facebookresearch/vggt).

  ## Contact
  If you have any questions, please feel free to contact [[email protected]](mailto:[email protected]) and [[email protected]](mailto:[email protected]).
 