Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction

CVPR 2024

Yizhi Wang¹, Wallace Lira¹, Wenqi Wang², Ali Mahdavi-Amiri¹, Hao Zhang¹
¹Simon Fraser University ²Tsinghua University

Paper | Video | Code | Live Demo (to be released)

Abstract

Our single-view 3D reconstruction method, Slice3D, predicts multi-slice images to reveal occluded parts without changing the camera (in contrast to multi-view synthesis), and then lifts the slices into a 3D model.

We introduce multi-slice reasoning, a new notion for single-view 3D reconstruction that challenges the prevailing belief that multi-view synthesis is the most natural conduit between a single view and 3D. Our key observation is that slicing an object is more advantageous than altering views for revealing occluded structures. Specifically, slicing can peel through any occluder without obstruction, and in the limit (infinitely many slices), it is guaranteed to unveil all hidden object parts. We realize this idea by developing Slice3D, a novel method for single-view 3D reconstruction which first predicts multi-slice images from a single RGB image and then integrates the slices into a 3D model using a coordinate-based transformer network for signed distance prediction. The slice images can be regressed or generated, both through a U-Net based network. For the former, we inject a learnable slice indicator code to assign each decoded image to a spatial slice location, while the slice generator is a denoising diffusion model operating on the entirety of slice images stacked on the input channels. Slice3D can produce a 3D mesh from a single-view input within only 20 seconds on an NVIDIA A40 GPU.
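The claim that slicing reveals more hidden structure as the number of slices grows can be illustrated with a toy voxel model. The sketch below is not the Slice3D pipeline: the function name `visible_voxels`, the voxel-set representation, and the "frontmost voxel per slab" rendering rule are all assumptions made for illustration. The camera is fixed and looks along +z; the depth range is cut into equal slabs, and within each slab only the frontmost voxel at each (x, y) pixel is counted as seen.

```python
def visible_voxels(voxels, depth, num_slices):
    """Return the voxels seen from a fixed camera looking along +z when the
    depth range [0, depth) is cut into num_slices slabs (hypothetical helper).

    Within each slab, only the frontmost voxel at each (x, y) is visible,
    mimicking one rendered slice image per slab.
    """
    seen = set()
    for k in range(num_slices):
        # Integer slab boundaries; together the slabs cover [0, depth) exactly.
        z_lo = k * depth // num_slices
        z_hi = (k + 1) * depth // num_slices
        front = {}  # (x, y) -> smallest z inside this slab
        for (x, y, z) in voxels:
            if z_lo <= z < z_hi:
                if (x, y) not in front or z < front[(x, y)]:
                    front[(x, y)] = z
        seen.update((x, y, z) for (x, y), z in front.items())
    return seen


# Toy object: a column of four voxels along z that occludes itself,
# plus one extra voxel beside it at depth 2.
voxels = {(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (1, 0, 2)}
depth = 4

for k in (1, 2, 4):
    print(k, "slices ->", len(visible_voxels(voxels, depth, k)), "of", len(voxels), "voxels seen")
```

With a single slab this reduces to an ordinary front view, so the voxels behind the front of the column stay hidden. Once each slab is one voxel thick, every occupied voxel is the frontmost in its slab, which is the discrete analogue of the "infinitely many slices" limit.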

Multiple instance generation

Multi-slice vs. multi-view reconstructions amid ambiguities in the chair legs. Both One-2-3-45 (bottom) and Slice3D (top) can produce multiple results. Both of our results are plausible and are reconstructed from consistent slices, while One-2-3-45 suffers from multi-view inconsistencies.