haoningwu/SceneGen

@@ -1,22 +1,26 @@
 ---
-pipeline_tag: image-to-3d
-license: mit
 language:
 - en
 ---
 # SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass (3DV 2026)
-This repository contains the official PyTorch implementation of SceneGen: https://arxiv.org/abs/2508.15769/.
 **Now the Training, Inference Code, and Pretrained Models have all been released! Feel free to reach out for discussions!**
 <div align="center">
-   <img src="./assets/SceneGen.png">
 </div>
-## 🌟 Some Information
-[Project Page](https://mengmouxu.github.io/SceneGen/) · [Paper](https://arxiv.org/abs/2508.15769/) · [Checkpoints](https://huggingface.co/haoningwu/SceneGen/)
 ## ⏩ News
 - [2025.11] Evaluation code has been released.
@@ -92,82 +96,51 @@ This script launches a Gradio web interface for interactive scene generation.
   > 1.  Adjust generation parameters (optional).
   > 2.  Click **"Generate 3D Scene"**.
   > 3.  Download the generated GLB file when ready.
-  >
-  > **💡 Pro Tip:**  Try the examples below to get started quickly!
-https://github.com/user-attachments/assets/d0d53506-70cd-4bd3-a6ab-2f9b5b16f4d8
-*Click the image above to watch the demo video*
 ### Pre-segmented Image Inference
 This script processes a directory of pre-segmented images.
 - **Input**: The input folder structure should be similar to `assets/masked_image_test`, containing segmented scene images.
-- **Visualization**: For scenes with ground truth data, you can use the `--gradio` flag to launch a Gradio interface that visualizes both the ground truth and the generated model. We provide data from the 3D-FUTURE test set as a demonstration.
 - **Usage**:
   ```sh
   python inference.py --gradio
   ```
 ## 📚 Dataset
-To train and evaluate SceneGen, we use the [3D-FUTURE](https://tianchi.aliyun.com/dataset/98063) dataset. Please download and preprocess the dataset as follows:
-1. Download the 3D-FUTURE dataset from [here](https://tianchi.aliyun.com/dataset/98063) which requires applying for access.
-2. Follow the [TRELLIS](https://github.com/microsoft/TRELLIS) data processing instructions to preprocess the dataset. Make sure to follow their directory structure for compatibility and fully generate the necessary files and ``metadata.csv``.
-3. Run the ``dataset_toolkits/build_metadata_scene.py`` script to create the scene-level metadata file:
-    ```sh
-    python dataset_toolkits/build_metadata_scene.py 3D-FUTURE \
-        --output_dir <path_to_3D-FUTURE> \
-        --set <train or test> \
-        --vggt_ckpt checkpoints/VGGT-1B --save_mask
-    ```
-    This will generate a `metadata_scene.csv` file or a `metadata_scene_test.csv` file in the specified dataset directory.
-4. For evaluation, run the ``dataset_toolkits/build_scene.sh`` script to render a scene image for each scene (this requires Blender to be installed and the configs in the script to be set correctly):
-    ```sh
-    bash dataset_toolkits/build_scene.sh
-    ```
-    This will create a `scene_test_render` folder in the dataset directory containing the Blender-rendered images of the test scenes, which are later used for evaluation.
 ## 🏋️‍♂️ Training
 With the processed 3D-FUTURE dataset and the pretrained `ss_flow_img_dit_L_16l8_fp16.safetensors` model checkpoint from [TRELLIS](https://huggingface.co/microsoft/TRELLIS-image-large) correctly placed in the `checkpoints/scenegen/ckpts` directory, you can train SceneGen using the following command:
 ```
 bash scripts/train.sh
 ```
-For detailed training configurations, please refer to `configs/generation/ss_scenegen_flow_img_train.json` and change the parameters as needed.
 ## 🧪 Evaluation
-To generate the 3D scenes on the 3D-FUTURE test set using the SceneGen model, use the following command:
 ```
 bash scenegen_eval.sh
 ```
-which will use the `scenegen_eval.py` script to generate the normalized scenes.
-To evaluate the trained SceneGen model on the 3D-FUTURE test set, use the following command:
 ```
 cd evalscene
 bash eval_scenegen.sh
 ```
-Make sure to have the processed 3D-FUTURE dataset and the rendered images in place as described in the Dataset section and the evaluation configs in `evalscene/configs/test/scene_evaluation_scenegen.yaml` set correctly. Then the evaluation script will compute metrics between the normalized generated scenes and the ground truth.
-Some packages used in the evaluation require additional installation. Please install the packages: `torchmetrics`, `lpips`, `clip`, and `probreg` via pip.
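As a quick pre-flight check before running the evaluation, the sketch below reports which of the listed packages are not importable in the current environment. It assumes that each package's import name matches the pip name given above; `missing_packages` and `EVAL_PACKAGES` are hypothetical names introduced for illustration and are not part of the SceneGen code base.

```python
import importlib.util

# Package names as listed in the evaluation notes above.
# Assumption: the importable module name equals the pip name.
EVAL_PACKAGES = ["torchmetrics", "lpips", "clip", "probreg"]


def missing_packages(names):
    """Return the names that cannot be located as top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EVAL_PACKAGES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All evaluation packages are importable.")
```

The check only confirms that a module with that name can be found; it does not verify versions or that the installed `clip` module is the variant the evaluation code expects.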
 ## 📜 Citation
 If you use this code and data for your research or project, please cite:
 ```
-   @inproceedings{meng2026scenegen,
-     author    = {Meng, Yanxu and Wu, Haoning and Zhang, Ya and Xie, Weidi},
-     title     = {SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass},
-     booktitle   = {International Conference on 3D Vision 2026},
-     year      = {2026},
-   }
-```
-## TODO
-- [x] Release Paper
-- [x] Release Checkpoints & Inference Code
-- [x] Release Training Code
-- [x] Release Data Processing Code
-- [x] Release Evaluation Code
 ## Acknowledgements
 Many thanks to the code bases from [TRELLIS](https://github.com/microsoft/TRELLIS), [DINOv2](https://github.com/facebookresearch/dinov2), and [VGGT](https://github.com/facebookresearch/vggt).
 ## Contact
-If you have any questions, please feel free to contact [[email protected]](mailto:[email protected]) and [[email protected]](mailto:[email protected]).

 ---
 language:
 - en
+license: mit
+pipeline_tag: image-to-3d
+arxiv: 2508.15769
+tags:
+- 3d
+- scene-generation
 ---
 # SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass (3DV 2026)
+This repository contains the official PyTorch implementation of SceneGen, introduced in [SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass](https://huggingface.co/papers/2508.15769).
 **Now the Training, Inference Code, and Pretrained Models have all been released! Feel free to reach out for discussions!**
 <div align="center">
+   <img src="https://github.com/Mengmouxu/SceneGen/raw/main/assets/SceneGen.png" width="800">
 </div>
+## 🌟 Resources
+[**Project Page**](https://mengmouxu.github.io/SceneGen/) · [**Paper**](https://arxiv.org/abs/2508.15769/) · [**Code**](https://github.com/Mengmouxu/SceneGen) · [**Checkpoints**](https://huggingface.co/haoningwu/SceneGen/)
 ## ⏩ News
 - [2025.11] Evaluation code has been released.
   > 1.  Adjust generation parameters (optional).
   > 2.  Click **"Generate 3D Scene"**.
   > 3.  Download the generated GLB file when ready.
+[Watch the demo video](https://github.com/user-attachments/assets/d0d53506-70cd-4bd3-a6ab-2f9b5b16f4d8)
 ### Pre-segmented Image Inference
 This script processes a directory of pre-segmented images.
 - **Input**: The input folder structure should be similar to `assets/masked_image_test`, containing segmented scene images.
+- **Visualization**: For scenes with ground truth data, you can use the `--gradio` flag to launch a Gradio interface that visualizes both the ground truth and the generated model.
 - **Usage**:
   ```sh
   python inference.py --gradio
   ```
 ## 📚 Dataset
+To train and evaluate SceneGen, we use the [3D-FUTURE](https://tianchi.aliyun.com/dataset/98063) dataset. Please refer to the [GitHub repository](https://github.com/Mengmouxu/SceneGen#dataset) for detailed preprocessing and data handling instructions.
 ## 🏋️‍♂️ Training
 With the processed 3D-FUTURE dataset and the pretrained `ss_flow_img_dit_L_16l8_fp16.safetensors` model checkpoint from [TRELLIS](https://huggingface.co/microsoft/TRELLIS-image-large) correctly placed in the `checkpoints/scenegen/ckpts` directory, you can train SceneGen using the following command:
 ```
 bash scripts/train.sh
 ```
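Before launching training, it may help to confirm that the TRELLIS checkpoint is where the paragraph above says it should be. The following is a minimal sketch, assuming the command is run from the repository root; `checkpoint_ready` is a hypothetical helper and not part of the SceneGen code base.

```python
from pathlib import Path

# Directory and file name are taken from the training instructions above.
CKPT_DIR = Path("checkpoints/scenegen/ckpts")
CKPT_NAME = "ss_flow_img_dit_L_16l8_fp16.safetensors"


def checkpoint_ready(ckpt_dir: Path = CKPT_DIR, name: str = CKPT_NAME) -> bool:
    """Return True if the checkpoint file exists and is non-empty."""
    path = Path(ckpt_dir) / name
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    if checkpoint_ready():
        print("Checkpoint found; ready to run: bash scripts/train.sh")
    else:
        print(f"Missing checkpoint: {CKPT_DIR / CKPT_NAME}")
```

The size check only guards against an empty or interrupted download; it does not validate the contents of the file.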
 ## 🧪 Evaluation
+To generate the 3D scenes on the 3D-FUTURE test set:
 ```
 bash scenegen_eval.sh
 ```
+To evaluate the trained model on the 3D-FUTURE test set:
 ```
 cd evalscene
 bash eval_scenegen.sh
 ```
 ## 📜 Citation
 If you use this code and data for your research or project, please cite:
+```bibtex
+@inproceedings{meng2026scenegen,
+  author    = {Meng, Yanxu and Wu, Haoning and Zhang, Ya and Xie, Weidi},
+  title     = {SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass},
+  booktitle   = {International Conference on 3D Vision 2026},
+  year      = {2026},
+}
 ```
 ## Acknowledgements
 Many thanks to the code bases from [TRELLIS](https://github.com/microsoft/TRELLIS), [DINOv2](https://github.com/facebookresearch/dinov2), and [VGGT](https://github.com/facebookresearch/vggt).
 ## Contact
+If you have any questions, please feel free to contact [[email protected]](mailto:[email protected]) and [[email protected]](mailto:[email protected]).