¹Simon Fraser University  ²Tsinghua University
We introduce multi-slice reasoning, a new notion for single-view 3D reconstruction which challenges the prevailing belief that multi-view synthesis is the most natural conduit between single-view and 3D. Our key observation is that object slicing is more advantageous than altering views for revealing occluded structures. Specifically, slicing can peel through any occluder without obstruction, and in the limit (infinitely many slices), it is guaranteed to unveil all hidden object parts. We realize our idea by developing Slice3D, a novel method for single-view 3D reconstruction which first predicts multi-slice images from a single RGB image and then integrates the slices into a 3D model using a coordinate-based transformer network for signed distance prediction. The slice images can be regressed or generated, both through a U-Net based network. For the former, we inject a learnable slice indicator code to designate each decoded image to a spatial slice location, while the slice generator is a denoising diffusion model operating on the entirety of slice images stacked on the input channels. Slice3D can produce a 3D mesh from a single-view input within only 20 seconds on an NVIDIA A40 GPU.
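The core observation above — that the union of all slices recovers every part of the object, occluded or not — can be illustrated with a minimal sketch. The snippet below is not the authors' code; it simply slabs a binary voxel grid along the viewing axis with numpy and checks that a part hidden behind an occluding wall still appears in some slice.

```python
import numpy as np

def make_slices(occupancy, num_slices, axis=0):
    """Split a binary occupancy grid into equal slabs along `axis`.

    Each slab "peels" past anything in front of it, so every voxel of
    the object lands in exactly one slice.
    """
    return np.array_split(occupancy, num_slices, axis=axis)

# Toy scene (hypothetical example): a solid wall occludes an interior part.
grid = np.zeros((8, 8, 8), dtype=bool)
grid[0] = True              # occluding front wall (faces the camera)
grid[4:6, 3:5, 3:5] = True  # part fully hidden behind the wall

slices = make_slices(grid, num_slices=4, axis=0)

# The hidden part shows up in the third slab, despite the occluder.
assert slices[2].any()

# Concatenating the slabs reconstructs the full grid: nothing stays hidden.
assert np.array_equal(np.concatenate(slices, axis=0), grid)
```

With finitely many slices a thin structure can still straddle a slab boundary, which is why the limit argument (infinitely many slices) is what guarantees full coverage.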
Our slice images can be regressed or generated with a high level of consistency, which is very challenging for multi-view methods to achieve.
Multi-slice vs. multi-view reconstructions amid ambiguities in the chair legs. Both One-2-3-45 (bottom) and Slice3D (top) can produce multiple results. Both of our results are plausible thanks to consistent slices, while One-2-3-45 suffers from multi-view inconsistencies.
@article{wang2023slice3d,
title={Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction},
author={Wang, Yizhi and Lira, Wallace and Wang, Wenqi and Mahdavi-Amiri, Ali and Zhang, Hao},
journal={arXiv preprint arXiv:2312.02221},
year={2023}
}