$ pip install unreflectanything
Paper · Source Code · Hugging Face Model Card · Colab · Contact

Abstract

Specular highlights distort appearance, obscure texture, and hinder geometric reasoning in both natural and surgical imagery. We present UnReflectAnything, an RGB-only framework that removes highlights from a single image by predicting a highlight map together with a reflection-free diffuse reconstruction. The model uses a frozen vision transformer encoder to extract multi-scale features, a lightweight head to localize specular regions, and a token-level inpainting module that restores corrupted feature patches before producing the final diffuse image. To overcome the lack of paired supervision, we introduce a Virtual Highlight Synthesis (VHS) pipeline that renders physically plausible specularities using monocular geometry, Fresnel-aware shading, and randomized lighting, enabling training on arbitrary RGB images with correct geometric structure. UnReflectAnything generalizes across natural and surgical domains, where non-Lambertian surfaces and non-uniform lighting create severe highlights, and achieves state-of-the-art results on several benchmarks.

Method Overview
VHS Process

Physically-Grounded Specular Synthesis

A monocular geometry-aware pipeline based on MoGe-2 that renders Fresnel-modulated specularities from stochastic point-light sources. This enables robust supervision on unaligned RGB imagery by synthesizing physically plausible training pairs without requiring diffuse ground truth.
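The core of this synthesis step, a Fresnel-modulated specular lobe rendered from a stochastic point light, can be sketched as below. This is a minimal illustration using a Blinn-Phong lobe with the Schlick approximation of the Fresnel term; the function names, the flat test geometry, and all parameter values (`f0`, `shininess`) are illustrative assumptions, not the paper's exact shading model, and in practice the normals and 3D points would come from a monocular geometry model such as MoGe-2.

```python
import numpy as np

def schlick_fresnel(cos_theta, f0=0.04):
    """Schlick approximation of the Fresnel reflectance term."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def render_virtual_highlight(normals, points, light_pos, cam_pos,
                             shininess=64.0, intensity=1.0):
    """Blinn-Phong specular lobe modulated by a Fresnel term.

    normals: (H, W, 3) unit surface normals
    points:  (H, W, 3) 3D surface points (e.g. from monocular geometry)
    Returns a (H, W) highlight map in [0, 1].
    """
    L = light_pos - points                      # per-pixel light direction
    L /= np.linalg.norm(L, axis=-1, keepdims=True)
    V = cam_pos - points                        # per-pixel view direction
    V /= np.linalg.norm(V, axis=-1, keepdims=True)
    H = L + V                                   # Blinn half vector
    H /= np.linalg.norm(H, axis=-1, keepdims=True)
    n_dot_h = np.clip((normals * H).sum(-1), 0.0, 1.0)
    v_dot_h = np.clip((V * H).sum(-1), 0.0, 1.0)
    spec = intensity * schlick_fresnel(v_dot_h) * n_dot_h ** shininess
    return np.clip(spec, 0.0, 1.0)

# Randomized point light over a flat patch facing the camera
rng = np.random.default_rng(0)
h, w = 8, 8
normals = np.zeros((h, w, 3)); normals[..., 2] = 1.0
xx, yy = np.meshgrid(np.linspace(-1, 1, w), np.linspace(-1, 1, h))
points = np.stack([xx, yy, np.zeros_like(xx)], axis=-1)
light = np.array([0.0, 0.0, 2.0]) + rng.normal(0.0, 0.1, 3)
highlight = render_virtual_highlight(normals, points, light,
                                     np.array([0.0, 0.0, 3.0]))
```

Compositing such a map onto an arbitrary RGB image yields a (corrupted, clean) training pair without ever needing a captured diffuse ground truth.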

Token Inpainter

Latent Token-Space Inpainting

A transformer-based architecture designed to reconstruct specularly-corrupted patch tokens directly within the DINOv3 latent space. By leveraging long-range dependencies, the model recovers original diffuse features while maintaining global semantic consistency.
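The mechanism can be illustrated with a single self-attention pass over the patch tokens: corrupted positions are swapped for a learned mask token, attention aggregates evidence from clean patches, and only the corrupted slots are overwritten. This NumPy sketch uses one head and random weights purely for illustration; the actual model is a trained multi-layer transformer operating on DINOv3 features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inpaint_tokens(tokens, corrupt_mask, Wq, Wk, Wv, mask_token):
    """One self-attention pass that rewrites specular-corrupted tokens.

    tokens: (N, D) patch features from a frozen encoder
    corrupt_mask: (N,) bool, True where the patch is highlight-corrupted
    """
    x = tokens.copy()
    x[corrupt_mask] = mask_token          # hide corrupted features behind a mask token
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    out = attn @ v                        # each token attends to all others
    x[corrupt_mask] = out[corrupt_mask]   # only corrupted positions are replaced
    return x

rng = np.random.default_rng(0)
n, d = 16, 32
tokens = rng.normal(size=(n, d))
mask = np.zeros(n, dtype=bool); mask[[3, 7]] = True
Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
restored = inpaint_tokens(tokens, mask, Wq, Wk, Wv, rng.normal(size=d))
```

Because clean tokens pass through untouched, global semantic structure is preserved while the corrupted patches are re-synthesized from long-range context.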

Model Architecture

Foundation Model Integration

Harnessing frozen DINOv3 Vision Transformer backbones to extract high-level semantic priors. This integration provides invariant features that ensure zero-shot generalization across diverse lighting conditions and non-Lambertian material properties.
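Freezing the backbone in PyTorch amounts to disabling gradients on its parameters and running it in evaluation mode, so only the lightweight heads receive updates. The tiny transformer below is a hypothetical stand-in; in practice the backbone would be a pretrained DINOv3 checkpoint loaded from a model hub.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained ViT backbone; in the real system
# this would be a DINOv3 checkpoint, not a randomly initialized encoder.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Freeze: no gradients flow into the backbone; only the heads train.
for p in backbone.parameters():
    p.requires_grad_(False)
backbone.eval()

with torch.no_grad():                  # feature extraction is inference-only
    patches = torch.randn(1, 196, 64)  # (batch, tokens, dim) patch embeddings
    features = backbone(patches)       # frozen semantic features for the heads
```

Keeping the encoder fixed is what lets the pretrained invariances carry over unchanged to unseen lighting and materials.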

Qualitative Results

Cross-Domain Robustness

Demonstrated efficacy across disparate visual domains, from unstructured natural scenes to endoscopic surgical environments. The framework suppresses highlights under complex light-matter interactions where conventional methods degrade.

Highlight Map

Unified Supervision Framework

A training strategy combining synthetic highlight rendering with multi-scale loss functions. By integrating token-level and image-level supervision, the model ensures precise highlight localization and seamless diffuse reconstruction.
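The combined objective can be sketched as a weighted sum of a token-level reconstruction term, an image-level diffuse term, and a highlight-localization term. The specific loss choices (MSE, L1, binary cross-entropy) and the weights below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def unified_loss(pred_tokens, gt_tokens, pred_img, gt_img,
                 pred_mask, gt_mask, w_tok=1.0, w_img=1.0, w_map=1.0):
    """Hypothetical weighted sum of the three supervision signals."""
    l_tok = np.mean((pred_tokens - gt_tokens) ** 2)   # latent token reconstruction (MSE)
    l_img = np.mean(np.abs(pred_img - gt_img))        # diffuse image reconstruction (L1)
    eps = 1e-7                                        # binary cross-entropy for the map
    p = np.clip(pred_mask, eps, 1.0 - eps)
    l_map = -np.mean(gt_mask * np.log(p) + (1.0 - gt_mask) * np.log(1.0 - p))
    return w_tok * l_tok + w_img * l_img + w_map * l_map

# A perfect prediction drives all three terms to (near) zero
rng = np.random.default_rng(0)
t = rng.normal(size=(4, 8))
img = rng.random((8, 8, 3))
m = (rng.random((8, 8)) > 0.5).astype(float)
perfect = unified_loss(t, t, img, img, m, m)
```

Supervising in both spaces ties the highlight map to where the latent inpainter must act, while the image-level term keeps the final reconstruction seamless.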

Citation

@misc{rota2025unreflectanything,
  title={UnReflectAnything: RGB-Only Highlight Removal by Rendering Synthetic Specular Supervision}, 
  author={Alberto Rota and Mert Kiray and Mert Asim Karaoglu and Patrick Ruhkamp and Elena De Momi and Nassir Navab and Benjamin Busam},
  year={2025},
  eprint={2512.09583},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.09583}, 
}