BACON: Band-limited coordinate networks for multiscale scene representation

David B. Lindell, Dave Van Veen, Jeong Joon Park, Gordon Wetzstein



A new network architecture with an analytical Fourier spectrum and band-limited outputs.




Abstract


Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction. These networks are trained to map continuous input coordinates to the value of a signal at each point. Still, current architectures are black boxes: their spectral characteristics cannot be easily analyzed, and their behavior at unsupervised points is difficult to predict. Moreover, these networks are typically trained to represent a signal at a single scale, and so naive downsampling or upsampling results in artifacts. We introduce band-limited coordinate networks (BACON), a network architecture with an analytical Fourier spectrum. BACON has predictable behavior at unsupervised points, can be designed based on the spectral characteristics of the represented signal, and can represent signals at multiple scales without explicit supervision. We demonstrate BACON for multiscale neural representation of images, radiance fields, and 3D scenes using signed distance functions and show that it outperforms conventional single-scale coordinate networks in terms of interpretability and quality.

BACON Framework



BACON Architecture. Our architecture builds on recently proposed Multiplicative Filter Networks, which use Hadamard products between sine nonlinearities and linear layers. Our work extends this architecture with an interpretable and adjustable Fourier spectrum, multiscale outputs, and an initialization scheme that prevents vanishingly small activations in deep networks.
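The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation: the names `init_network` and `forward`, the layer sizes, the per-layer frequency bounds, and the Gaussian weight initialization are all assumptions chosen for readability.

```python
import numpy as np

def init_network(rng, in_dim, hidden, out_dim, freq_bounds):
    """Build a toy multiplicative filter network with one output head per layer."""
    layers = []
    for i, bound in enumerate(freq_bounds):
        layer = {
            # Sine filter applied directly to the input coordinates.
            "omega": rng.uniform(-bound, bound, size=(in_dim, hidden)),
            "phi": rng.uniform(-np.pi, np.pi, size=hidden),
            # Linear head that reads out this layer's intermediate result.
            "W_out": rng.normal(size=(hidden, out_dim)) / np.sqrt(hidden),
            "b_out": np.zeros(out_dim),
        }
        if i > 0:
            # Hidden linear layer applied to the previous activations.
            layer["W"] = rng.normal(size=(hidden, hidden)) / np.sqrt(hidden)
            layer["b"] = np.zeros(hidden)
        layers.append(layer)
    return layers

def forward(layers, x):
    """Return one output per layer, in layer order."""
    outputs, z = [], None
    for layer in layers:
        g = np.sin(x @ layer["omega"] + layer["phi"])
        # First layer is a plain sine filter; later layers multiply
        # a new sine filter with a linear map of the previous activations.
        z = g if z is None else g * (z @ layer["W"] + layer["b"])
        outputs.append(z @ layer["W_out"] + layer["b_out"])
    return outputs

rng = np.random.default_rng(0)
layers = init_network(rng, in_dim=2, hidden=32, out_dim=3,
                      freq_bounds=[4.0, 8.0, 16.0])
coords = rng.uniform(-0.5, 0.5, size=(5, 2))
outputs = forward(layers, coords)
```

Here `outputs` holds one prediction per layer for the same five 2D coordinates, which is the sense in which a single network can expose several scales at once.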


BACON Frequency Bandwidth. The bandwidth of each output of the network can be described by analyzing the distribution of frequencies from the sine nonlinearities at each output layer. A product of sines contains only sums and differences of their frequencies, so bounded per-layer frequencies yield a bounded output frequency. We initialize these frequencies from a random uniform distribution, which allows us to place an upper bound on the maximum frequency represented by the network.
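The bound can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions: it uses a 1D input, integer frequencies (so the discrete Fourier transform has no leakage), and hypothetical per-layer limits of 4, 8, and 16 cycles per period. It measures the highest non-negligible frequency after each Hadamard product and compares it with the running sum of the per-layer limits.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256                       # samples over one period [0, 1)
x = np.arange(n) / n
bounds = [4, 8, 16]           # hypothetical per-layer frequency limits
hidden = 8

def sine_layer(bound):
    """Sines with integer frequencies drawn uniformly from [-bound, bound]."""
    k = rng.integers(-bound, bound + 1, size=hidden)
    phi = rng.uniform(-np.pi, np.pi, size=hidden)
    return np.sin(2 * np.pi * np.outer(x, k) + phi)

def highest_frequency(z, tol=1e-8):
    """Largest FFT bin whose magnitude is non-negligible in any hidden unit."""
    mag = np.abs(np.fft.rfft(z, axis=0)).max(axis=1)
    return int(np.nonzero(mag > tol * mag.max())[0].max())

z = sine_layer(bounds[0])
measured = [highest_frequency(z)]
for bound in bounds[1:]:
    W = rng.normal(size=(hidden, hidden))
    z = sine_layer(bound) * (z @ W)   # Hadamard product sums frequencies
    measured.append(highest_frequency(z))

limits = np.cumsum(bounds).tolist()   # [4, 12, 28]
```

Each entry of `measured` should not exceed the matching entry of `limits`, since every multiplication can raise the maximum frequency by at most that layer's limit.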


Semi-supervised Multiscale Decomposition. Since the network outputs are band-limited by construction, they can be supervised at a high-resolution output scale and the network automatically learns a multiscale decomposition.
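One way to read this is that every output is compared against the same full-resolution target, and the band limit of each output decides how much detail it can reproduce. The loss below is a sketch of that reading only; the function name `multiscale_loss` and the unweighted sum over outputs are assumptions, not details given in the text.

```python
import numpy as np

def multiscale_loss(outputs, target):
    """Sum of mean squared errors between every output and one shared target."""
    return float(sum(np.mean((y - target) ** 2) for y in outputs))

# Toy values: the first output is all zeros, the second matches the target.
target = np.array([[0.0], [1.0], [0.5]])
outputs = [np.zeros_like(target), target.copy()]
loss = multiscale_loss(outputs, target)
```

No low-pass filtered targets appear in this loss, which is what distinguishes it from explicit multiscale supervision.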


Extrapolation Behavior. The discrete frequencies used in the network result in a periodic signal representation. We fit the network to this seamless texture by supervising on coordinates within the red region. Then, querying the network outside the domain results in periodic extrapolation.
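The periodicity follows directly from the frequency grid, as the toy check below illustrates. It is a sketch under assumptions: a 1D input, a period of 1, frequencies restricted to integer multiples of 2π divided by the period, and arbitrary untrained weights. The name `net` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
period = 1.0
hidden, depth = 16, 3

def quantized_filter():
    """Frequencies on the grid 2*pi*k/period with integer k, plus random phases."""
    k = rng.integers(-4, 5, size=(1, hidden))
    omega = 2 * np.pi * k / period
    phi = rng.uniform(-np.pi, np.pi, size=hidden)
    return omega, phi

filters = [quantized_filter() for _ in range(depth)]
weights = [rng.normal(size=(hidden, hidden)) / np.sqrt(hidden)
           for _ in range(depth - 1)]
w_out = rng.normal(size=(hidden, 1))

def net(x):
    """Evaluate the toy multiplicative filter network at 1D coordinates x."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    omega, phi = filters[0]
    z = np.sin(x @ omega + phi)
    for (omega, phi), W in zip(filters[1:], weights):
        z = np.sin(x @ omega + phi) * (z @ W)
    return (z @ w_out).ravel()

x_inside = np.linspace(-0.5, 0.5, 50)      # stands in for the supervised region
inside = net(x_inside)
outside = net(x_inside + period)           # same points shifted by one period
```

The values in `inside` and `outside` agree up to floating-point error, so querying beyond the supervised domain simply repeats the fitted signal.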


Initialization Scheme. To facilitate training of deep networks, we introduce a new initialization scheme that maintains standard normal activations throughout the network (bottom). This alleviates a problem with the previously proposed initialization scheme (top) whose activations become vanishingly small at deeper layers.
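The effect can be reproduced with a small simulation. The bounds used here are assumptions rather than values stated above: the earlier scheme is taken to draw weights uniformly from ±sqrt(1/d) for hidden width d, and ±sqrt(6/d) is used as one bound consistent with keeping unit variance. The reasoning is that a sine with a uniformly random phase has variance 1/2 and a uniform weight on ±a has variance a²/3, so the linear-layer output keeps variance 1 when d · (a²/3) · (1/2) = 1. Frequencies, sizes, and the omitted biases are likewise illustrative.

```python
import numpy as np

def preactivation_std(bound_sq, depth=6, width=256, n_points=512, seed=0):
    """Std of the linear-layer outputs at each depth of a toy network.

    Weights are drawn uniformly from +/- sqrt(bound_sq / width).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-0.5, 0.5, size=(n_points, 1))

    def sine_filter():
        # Random frequencies and phases, one per hidden unit.
        omega = rng.uniform(-64.0, 64.0, size=(1, width))
        phi = rng.uniform(-np.pi, np.pi, size=(1, width))
        return np.sin(x @ omega + phi)

    bound = np.sqrt(bound_sq / width)
    z = sine_filter()
    stds = []
    for _ in range(depth):
        W = rng.uniform(-bound, bound, size=(width, width))
        h = z @ W                 # linear layer, bias omitted
        stds.append(float(h.std()))
        z = sine_filter() * h     # Hadamard product with a new sine filter
    return stds

small = preactivation_std(1.0)    # assumed earlier bound, sqrt(1/width)
scaled = preactivation_std(6.0)   # assumed rescaled bound, sqrt(6/width)
```

Under these assumptions the variance shrinks by roughly a factor of six per layer with the smaller bound, while the rescaled bound keeps the standard deviation close to one at every depth.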

1D Fitting Example


1D Fitting Result. Other representations (SIREN, Fourier Features) are not band-limited and have spurious high-frequency components when fitting a simple 1D signal (orange) at a sparse set of supervised points (pink). BACON correctly interpolates between the supervised points (bottom middle), and we can also apply a low-pass filter (bottom row) to fit low-frequency components.

Image Fitting


Image Fitting Result. We compare BACON to Fourier Features, SIREN, and the integrated positional encoding of Mip-NeRF for fitting an image at 256×256 resolution. Fourier Features and SIREN show aliasing when downsampled. Mip-NeRF is explicitly trained at multiple scales and learns anti-aliasing. All methods except BACON show artifacts when upsampling the network at 4× resolution. BACON is supervised at a single scale and learns band-limited outputs that closely match a low-pass filtered reference (see left column, Fourier spectra insets).

Neural Radiance Fields



Neural Radiance Fields Results. Comparison between NeRF, Mip-NeRF, and BACON. BACON outperforms NeRF for multiscale representation while using fewer parameters than Mip-NeRF to represent low-resolution outputs.

3D Shape Representation




3D Shape Representation Results. Results on fitting networks to signed-distance functions of the Thai Statue and Lucy scenes from the Stanford 3D Scanning Repository. Outputs are shown for Neural Geometric Level of Detail (NGLOD), Fourier Features, SIREN, and BACON. All methods perform similarly at the highest detail output, but BACON learns a smooth multiscale decomposition of the shape. Insets show the Fourier spectra of the extracted signed-distance functions, revealing the band-limited output of BACON.

Acknowledgments


This project was supported in part by a PECASE from the ARO and NSF award 1839974.

Citation



@article{lindell2021bacon,
author = {Lindell, David B. and Van Veen, Dave and Park, Jeong Joon and Wetzstein, Gordon},
title = {BACON: Band-limited coordinate networks for multiscale scene representation},
journal = {arXiv preprint arXiv:0000.00000},
year = {2021}
}