[Paper Review] LERF

Let’s start!

LERF

  • a LERF can be reconstructed from a hand-held phone capture within 45 minutes
  • once reconstructed, it can render dense relevancy maps for text queries interactively, in real time
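To make "relevancy map" concrete, here is a minimal sketch that scores every pixel of a rendered embedding image against a text query. It assumes plain cosine similarity as the score, which may differ from the exact relevancy measure used in the paper, and it uses toy 3-dimensional vectors in place of real CLIP embeddings. The names `cosine_similarity` and `relevancy_map` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def relevancy_map(pixel_embeddings, query_embedding):
    """Score each pixel's embedding against the query embedding.

    pixel_embeddings is an H x W nested list of vectors (assumed to be
    the language embeddings rendered for one view); the result is an
    H x W nested list of scores.
    """
    return [
        [cosine_similarity(embedding, query_embedding) for embedding in row]
        for row in pixel_embeddings
    ]


# Toy 2 x 2 "rendered" view; real embeddings would come from CLIP.
rendered = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[1.0, 1.0, 0.0], [0.0, 0.0, 2.0]],
]
query = [1.0, 0.0, 0.0]
scores = relevancy_map(rendered, query)
```

Pixels whose embedding points in the same direction as the query get the highest score, so the result can be shown as a heatmap over the rendered view.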

  • the immediate output of NeRFs is nothing but a colorful density field, devoid of meaning or context, which inhibits building interfaces for interacting with the resulting 3D scenes
  • why natural language?
    • users can query the scene with free-form text
    • language can capture semantics at multiple scales and relate to long-tail and abstract concepts

How?

  • uses off-the-shelf CLIP, without fine-tuning
  • constructs a LERF by optimizing a language field jointly with the NeRF
    • the language field takes both position and physical scale as input
    • and outputs a single CLIP vector
    • it is supervised by CLIP embeddings generated from image crops of the training views, which are organized into a multi-scale feature pyramid
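
The crop-based supervision above can be sketched as follows. This is only an illustration under stated assumptions: square crops, a stride of half the crop size, and a nearest-crop lookup are all choices made here for simplicity, not details taken from the paper. The names `crop_boxes`, `build_pyramid`, and `nearest_crop` are hypothetical, and the CLIP image encoder that would embed each crop is left out.

```python
def crop_boxes(width, height, crop_size, stride):
    """Square crops (x0, y0, x1, y1) tiled over a width x height image."""
    boxes = []
    for y0 in range(0, height - crop_size + 1, stride):
        for x0 in range(0, width - crop_size + 1, stride):
            boxes.append((x0, y0, x0 + crop_size, y0 + crop_size))
    return boxes


def build_pyramid(width, height, crop_sizes):
    """One pyramid level per crop size.

    Assumption: the stride is half the crop size, so neighbouring
    crops overlap.
    """
    return {
        size: crop_boxes(width, height, size, max(1, size // 2))
        for size in crop_sizes
    }


def nearest_crop(pyramid, x, y, crop_size):
    """Crop at the given level whose centre is closest to pixel (x, y)."""
    def squared_distance(box):
        centre_x = (box[0] + box[2]) / 2
        centre_y = (box[1] + box[3]) / 2
        return (centre_x - x) ** 2 + (centre_y - y) ** 2

    return min(pyramid[crop_size], key=squared_distance)


# Toy 8 x 8 training view with three pyramid levels.
pyramid = build_pyramid(8, 8, [2, 4, 8])
target_box = nearest_crop(pyramid, 6, 6, 4)
```

In a full pipeline each crop would be passed through the CLIP image encoder once, and the embedding of the crop picked for a pixel and scale would serve as the supervision target for the language field's output at that pixel. Here the crop size stands in for the scale input.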
Written on March 28, 2023