[Paper Review] LeRF
let’s start!
LERF
- LERF can be reconstructed from a hand-held phone capture within 45 minutes
-
then can render dense relevancy maps given textual queries interactively in real-time
- the immediate output of NeRFs is nothing but a colorful density field, devoid of meaning or context, which inhibits building interfaces for interacting with the resulting 3D scenes
- why natural language
- handle natural language input queries
- ability to incorporate semantics at multiple scales and relate to long-tail and abstract concepts
How?
- CLIP without finetuning
- construct a LERF by optimizing a language field jointly with NeRF
- which takes both position and physical scale as input
- and outputs a single CLIP vector
- supervised
- CLIP embeddings generated from image crops of training views -> multi-scale feature pyramid
Written on March 28, 2023