Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction

CVPR 2024


Yizhi Wang1, Wallace Lira1, Wenqi Wang2, Ali Mahdavi-Amiri1, Hao Zhang1

1Simon Fraser University    2Tsinghua University    

Abstract


Our single-view 3D reconstruction method, Slice3D, predicts multi-slice images to reveal occluded parts without changing the camera (in contrast to multi-view synthesis), and then lifts the slices into a 3D model.

We introduce multi-slice reasoning, a new notion for single-view 3D reconstruction which challenges the prevailing belief that multi-view synthesis is the most natural conduit between single-view and 3D. Our key observation is that object slicing is more advantageous than altering views to reveal occluded structures. Specifically, slicing can peel through any occluder without obstruction, and in the limit (infinitely many slices), it is guaranteed to unveil all hidden object parts. We realize our idea by developing Slice3D, a novel method for single-view 3D reconstruction which first predicts multi-slice images from a single RGB image and then integrates the slices into a 3D model using a coordinate-based transformer network for signed distance prediction. The slice images can be regressed or generated, both through a U-Net based network. For the former, we inject a learnable slice indicator code to assign each decoded image to a spatial slice location, while the slice generator is a denoising diffusion model operating on the entirety of slice images stacked on the input channels. Slice3D can produce a 3D mesh from a single-view input in only 20 seconds on an NVIDIA A40 GPU.
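The key observation above, that slicing reveals parts a fixed camera cannot see, can be illustrated with a toy example. The following is a minimal NumPy sketch and not the Slice3D pipeline: it assumes a known boolean occupancy grid viewed orthographically along axis 0, and it renders each slab as a map of first-hit depth indices rather than as a predicted RGB slice image. The helper names `render_front` and `slice_images` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def render_front(occ):
    """Orthographic view along axis 0: per pixel, the index of the
    first occupied voxel, or -1 where the ray hits nothing."""
    hit = occ.any(axis=0)
    first = occ.argmax(axis=0)  # first True along the viewing axis
    return np.where(hit, first, -1)

def slice_images(occ, num_slices):
    """Cut the volume into num_slices slabs along the viewing axis and
    render each slab on its own, keeping the camera fixed.
    Assumes num_slices <= occ.shape[0] so that no slab is empty."""
    depth = occ.shape[0]
    bounds = np.linspace(0, depth, num_slices + 1).astype(int)
    images = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        img = render_front(occ[lo:hi])
        # Convert slab-local depth back to a global depth index.
        images.append(np.where(img >= 0, img + lo, -1))
    return np.stack(images)  # shape: (num_slices, H, W)

# Toy scene: a wall at depth 1 covers every pixel and hides a single
# voxel at depth 6, pixel (2, 2).
occ = np.zeros((8, 4, 4), dtype=bool)
occ[1, :, :] = True
occ[6, 2, 2] = True

single_view = render_front(occ)
slices = slice_images(occ, num_slices=4)

hidden_visible_in_view = bool((single_view == 6).any())
hidden_visible_in_slices = bool((slices == 6).any())
```

In this toy scene the single view sees only the wall, while the last slab exposes the hidden voxel without moving the camera. The returned array stacks the slices along its first axis, which loosely mirrors the idea of stacking slice images on the channel dimension.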


Results on Objaverse dataset



Results on ShapeNet dataset



Results on Google Scanned Objects (GSO) dataset



Consistent Slices


Our slice images can be regressed or generated with a high level of consistency, which is very challenging for multi-view methods.


Multiple instance generation


Multi-slice vs. multi-view reconstructions amid ambiguities in the chair legs. Both One-2-3-45 (bottom) and Slice3D (top) can produce multiple results. Both of our results are plausible and come from consistent slices, while One-2-3-45 suffers from multi-view inconsistencies.


Citation


@article{wang2023slice3d,
    title={Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction},
    author={Wang, Yizhi and Lira, Wallace and Wang, Wenqi and Mahdavi-Amiri, Ali and Zhang, Hao},
    journal={arXiv preprint arXiv:2312.02221},
    year={2023}
}