NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Abstract
NeoVerse is a scalable 4D world model that enables pose-free reconstruction and novel-trajectory video generation from monocular videos with state-of-the-art performance.
In this paper, we propose NeoVerse, a versatile 4D world model capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a scalability limitation common to current 4D world modeling methods. It is caused either by reliance on expensive, specialized multi-view 4D data or by cumbersome training pre-processing. In contrast, NeoVerse is built on a core philosophy of making the full pipeline scalable to diverse in-the-wild monocular videos. Specifically, NeoVerse features pose-free feed-forward 4D reconstruction, online monocular degradation pattern simulation, and other well-aligned techniques. These designs give NeoVerse versatility and the ability to generalize across domains. Meanwhile, NeoVerse achieves state-of-the-art performance on standard reconstruction and generation benchmarks. Our project page is available at https://neoverse-4d.github.io
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation (2025)
- FreeGen: Feed-Forward Reconstruction-Generation Co-Training for Free-Viewpoint Driving Scene Synthesis (2025)
- WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling (2025)
- Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment (2025)
- Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion (2025)
- CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model. (2025)
- GeoVideo: Introducing Geometric Regularization into Video Generation Model (2025)
arXiv lens breakdown of this paper: https://arxivlens.com/PaperView/Details/neoverse-enhancing-4d-world-model-with-in-the-wild-monocular-videos-2381-05f8fffd
- Executive Summary
- Detailed Breakdown
- Practical Applications