SIGGRAPH Conference Papers 2026

Relit-LiVE

RELIGHT VIDEO BY JOINTLY LEARNING ENVIRONMENT VIDEO

A physically consistent and temporally stable video relighting framework that jointly predicts relit videos and viewpoint-aligned per-frame warped environment maps, without requiring prior camera poses.

Weiqing Xiao*, Hong Li*, Xiuyu Yang*, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang

* Equal contribution. Corresponding authors.

Nanjing University
BAAI
Beihang University
Tsinghua University

Homepage Showcase

High-Resolution Image and Video Relighting

Supports 1024 x 1472 image relighting and 480 x 832 video relighting over 57-frame sequences, with the original input panel marked separately for quick identification.

Project Video

Project Overview

Why Relit-LiVE is different

Joint generation of relighting and environment video

Relit-LiVE combines an RGB-intrinsic fusion renderer with lighting prediction, so relit frames and per-frame warped environment maps are generated together in one process.

Designed around prior failure modes

Direct relighting often breaks spatial consistency or retains traces of the original illumination, while render-based alternatives typically depend on camera-pose priors and accurate intrinsic decomposition.

Architecture Comparison

Two prior routes versus the Relit-LiVE pipeline

Baseline A

Prompt-based and direct relighting

Extracted architecture crop showing the prompt-based and direct video relighting route

Baseline B

Environment-map and render-based relighting

Extracted architecture crop showing the environment-map and render-based relighting route

Relit-LiVE

RGB-intrinsic fusion with warped environment maps

Extracted architecture crop showing the Relit-LiVE architecture with RGB-intrinsic fusion renderer and per-frame warped environment map output

Technical Innovations

Three core ingredients behind the method

01

RGB-intrinsic fusion renderer

Raw reference images are explicitly fused into the rendering pathway so the model can recover global illumination cues and preserve delicate material appearance that intrinsic-only pipelines often lose.
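As a rough illustration of this idea (not the paper's actual implementation), the fusion can be pictured as channel-wise concatenation of the raw RGB stream with intrinsic buffers before rendering; the function name and buffer choice (albedo, normals) here are assumptions:

```python
import numpy as np

def fuse_rgb_intrinsics(rgb, albedo, normals):
    """Hypothetical fusion step: keep the raw RGB stream alongside
    intrinsic buffers so global-illumination and material cues that
    intrinsic-only inputs lose remain visible to the renderer.
    All tensors are (T, H, W, C) video stacks in [0, 1]."""
    assert rgb.shape[:3] == albedo.shape[:3] == normals.shape[:3]
    return np.concatenate([rgb, albedo, normals], axis=-1)

# Toy example: 4 frames at 8x8 resolution, 3 channels per buffer.
rgb = np.random.rand(4, 8, 8, 3)
albedo = np.random.rand(4, 8, 8, 3)
normals = np.random.rand(4, 8, 8, 3)
fused = fuse_rgb_intrinsics(rgb, albedo, normals)
print(fused.shape)  # (4, 8, 8, 9)
```

The point of the sketch is only that the raw frames are an explicit input to the rendering pathway, rather than being discarded after intrinsic decomposition.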

02

Per-frame warped environment map prediction

Relit frames and per-frame warped environment maps are generated in one diffusion process, enforcing strong geometry-light alignment without explicit camera pose supervision.
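One minimal way to picture joint generation, under the assumption (not stated in the source) that the two latent streams are stacked along the channel axis, is a single denoising update that touches both at once; `joint_denoise_step` and the toy Euler-style update are illustrative stand-ins:

```python
import numpy as np

def joint_denoise_step(z_relit, z_env, noise_pred_fn, t):
    """One joint update: relit-frame latents and environment-map
    latents are concatenated along channels and denoised by a single
    network call, so both outputs share one generative trajectory.
    Shapes: (T, H, W, C) each; noise_pred_fn is a stand-in network."""
    z = np.concatenate([z_relit, z_env], axis=-1)
    eps = noise_pred_fn(z, t)        # predicted noise for both streams
    z = z - 0.1 * eps                # toy Euler-style denoising step
    c = z_relit.shape[-1]
    return z[..., :c], z[..., c:]    # split back into the two streams

# Toy usage with a zero-noise stand-in network.
z_relit = np.random.rand(4, 8, 8, 4)
z_env = np.random.rand(4, 8, 8, 4)
out_relit, out_env = joint_denoise_step(
    z_relit, z_env, lambda z, t: np.zeros_like(z), t=0)
print(out_relit.shape, out_env.shape)  # (4, 8, 8, 4) (4, 8, 8, 4)
```

Because both streams pass through the same update, alignment between relit appearance and predicted lighting is encouraged by construction rather than by an extra pose-supervised loss.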

03

Robust multi-stage training

Latent interpolation synthesizes diverse multi-illumination supervision, while cycle-consistent self-supervised illumination learning improves temporal coherence on real videos.
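The latent-interpolation idea can be sketched in a few lines, assuming (hypothetically) a simple linear blend between latents of the same scene under two different illuminations; the function name and blending scheme are illustrative, not the paper's exact recipe:

```python
import numpy as np

def interpolate_illumination(z_a, z_b, alpha):
    """Blend latents of one scene under two lighting conditions to
    synthesize an intermediate illumination for supervision.
    alpha=0 reproduces lighting A; alpha=1 reproduces lighting B."""
    return (1.0 - alpha) * z_a + alpha * z_b

# Toy latents for two lighting conditions of the same 2-frame clip.
z_a = np.zeros((2, 4, 4, 4))
z_b = np.ones((2, 4, 4, 4))
mid = interpolate_illumination(z_a, z_b, 0.5)
print(mid.mean())  # 0.5
```

Sweeping `alpha` turns a pair of captured illuminations into a continuum of training targets, which is the sense in which interpolation "synthesizes diverse multi-illumination supervision."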

Temporal Consistency

Relighting that holds up across long sequences

Official figure from the paper showing comparison results for long video relighting sequences, including baseline methods, our relit outputs, and predicted environment maps.

Video Results

High-motion relighting sequences

These high-motion results highlight the joint generation of relit video and per-frame warped environment maps.


Visual Showcase

Browse results under different illumination targets

Browse separate image and video showcases across multiple target illuminations, with dedicated cases for still-image relighting and temporally coherent video relighting.

Image Showcase

Representative still-image relighting cases


Video Showcase

Representative video relighting cases


Applications

Beyond a single relighting task

Joint modeling of relit video and per-frame warped environment maps also supports downstream tasks that require coherent lighting control, including scene editing and video delighting.

Scene editing results showing original images and edited outputs under relit conditions

Scene Editing

Relit-LiVE supports consistent edits to scene content while preserving geometry, illumination interactions, and overall visual coherence.

Video delighting results comparing source video frames and delighting outputs

Video Delighting

The learned lighting representation can also remove baked-in illumination effects, producing cleaner appearance while maintaining temporal consistency across video frames.

Resources

Paper, supplement, and code