SIGGRAPH Conference Papers 2026

Relit-LiVE

RELIGHT VIDEO BY JOINTLY LEARNING ENVIRONMENT VIDEO

A physically consistent and temporally stable video relighting framework that jointly predicts relit videos and viewpoint-aligned per-frame warped environment maps, without requiring prior camera poses.

Weiqing Xiao*, Hong Li*, Xiuyu Yang*, Houyuan Chen, Wenyi Li, Tianqi Liu, Shaocong Xu, Chongjie Ye, Hao Zhao, Beibei Wang

* Equal contribution. Corresponding authors.

Nanjing University
BAAI
Beihang University
Tsinghua University

Homepage Showcase

High-Resolution Image and Video Relighting

Supports 1024 x 1472 image relighting and 480 x 832 video relighting over 57-frame sequences, with the original input panel marked separately for quick identification.

Project Video

Project Overview

Why Relit-LiVE is different

Joint generation of relighting and environment video

Relit-LiVE combines an RGB-intrinsic fusion renderer with lighting prediction, so relit frames and per-frame warped environment maps are generated together in one process.

Designed around prior failure modes

Direct relighting often breaks spatial consistency or retains traces of the original illumination, while render-based alternatives typically depend on camera-pose priors and accurate intrinsic decomposition.

Architecture Comparison

Two prior routes versus the Relit-LiVE pipeline

Baseline A

Prompt-based and direct relighting

Extracted architecture crop showing the prompt-based and direct video relighting route

Baseline B

Environment-map and render-based relighting

Extracted architecture crop showing the environment-map and render-based relighting route

Relit-LiVE

RGB-intrinsic fusion with warped environment maps

Extracted architecture crop showing the Relit-LiVE architecture with RGB-intrinsic fusion renderer and per-frame warped environment map output

Technical Innovations

Three core ingredients behind the method

01

RGB-intrinsic fusion renderer

Raw reference images are explicitly fused into the rendering pathway so the model can recover global illumination cues and preserve delicate material appearance that intrinsic-only pipelines often lose.
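As a rough illustration of this idea (not the paper's actual implementation), the fusion can be pictured as channel-wise concatenation of the raw RGB stream with intrinsic buffers before rendering; the function name and buffer choice (albedo, normals) here are assumptions:

```python
import numpy as np

def fuse_rgb_intrinsics(rgb, albedo, normals):
    """Hypothetical fusion step: keep the raw RGB stream alongside
    intrinsic buffers so global-illumination and material cues that
    intrinsic-only inputs lose remain visible to the renderer.
    All tensors are (T, H, W, C) video stacks in [0, 1]."""
    assert rgb.shape[:3] == albedo.shape[:3] == normals.shape[:3]
    return np.concatenate([rgb, albedo, normals], axis=-1)

# Toy example: 4 frames at 8x8 resolution, 3 channels per buffer.
rgb = np.random.rand(4, 8, 8, 3)
albedo = np.random.rand(4, 8, 8, 3)
normals = np.random.rand(4, 8, 8, 3)
fused = fuse_rgb_intrinsics(rgb, albedo, normals)
print(fused.shape)  # (4, 8, 8, 9)
```

The point of the sketch is only that the raw frames are an explicit input to the rendering pathway, rather than being discarded after intrinsic decomposition.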

02

Per-frame warped environment map prediction

Relit frames and per-frame warped environment maps are generated in one diffusion process, enforcing strong geometry-light alignment without explicit camera pose supervision.
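One minimal way to picture joint generation, under the assumption (not stated in the source) that the two latent streams are stacked along the channel axis, is a single denoising update that touches both at once; `joint_denoise_step` and the toy Euler-style update are illustrative stand-ins:

```python
import numpy as np

def joint_denoise_step(z_relit, z_env, noise_pred_fn, t):
    """One joint update: relit-frame latents and environment-map
    latents are concatenated along channels and denoised by a single
    network call, so both outputs share one generative trajectory.
    Shapes: (T, H, W, C) each; noise_pred_fn is a stand-in network."""
    z = np.concatenate([z_relit, z_env], axis=-1)
    eps = noise_pred_fn(z, t)        # predicted noise for both streams
    z = z - 0.1 * eps                # toy Euler-style denoising step
    c = z_relit.shape[-1]
    return z[..., :c], z[..., c:]    # split back into the two streams

# Toy usage with a zero-noise stand-in network.
z_relit = np.random.rand(4, 8, 8, 4)
z_env = np.random.rand(4, 8, 8, 4)
out_relit, out_env = joint_denoise_step(
    z_relit, z_env, lambda z, t: np.zeros_like(z), t=0)
print(out_relit.shape, out_env.shape)  # (4, 8, 8, 4) (4, 8, 8, 4)
```

Because both streams pass through the same update, alignment between relit appearance and predicted lighting is encouraged by construction rather than by an extra pose-supervised loss.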

03

Robust multi-stage training

Latent interpolation synthesizes diverse multi-illumination supervision, while cycle-consistent self-supervised illumination learning improves temporal coherence on real videos.
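The latent-interpolation idea can be sketched in a few lines, assuming (hypothetically) a simple linear blend between latents of the same scene under two different illuminations; the function name and blending scheme are illustrative, not the paper's exact recipe:

```python
import numpy as np

def interpolate_illumination(z_a, z_b, alpha):
    """Blend latents of one scene under two lighting conditions to
    synthesize an intermediate illumination for supervision.
    alpha=0 reproduces lighting A; alpha=1 reproduces lighting B."""
    return (1.0 - alpha) * z_a + alpha * z_b

# Toy latents for two lighting conditions of the same 2-frame clip.
z_a = np.zeros((2, 4, 4, 4))
z_b = np.ones((2, 4, 4, 4))
mid = interpolate_illumination(z_a, z_b, 0.5)
print(mid.mean())  # 0.5
```

Sweeping `alpha` turns a pair of captured illuminations into a continuum of training targets, which is the sense in which interpolation "synthesizes diverse multi-illumination supervision."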

Temporal Consistency

Relighting that holds up across long sequences

Official figure from the paper showing comparison results for long video relighting sequences, including baseline methods, our relit outputs, and predicted environment maps.

Video Results

High-motion relighting sequences

These high-motion results highlight the joint generation of relit video and per-frame warped environment maps.


Visual Showcase

Browse results under different illumination targets

Browse separate image and video showcases across multiple target illuminations, with dedicated cases for still-image relighting and temporally coherent video relighting.

Image Showcase

Representative still-image relighting cases


Video Showcase

Representative video relighting cases


Applications

Beyond a single relighting task

Joint modeling of relit video and per-frame warped environment maps also supports downstream tasks that require coherent lighting control, including scene editing and video delighting.

Scene editing results showing original images and edited outputs under relit conditions

Scene Editing

Relit-LiVE supports consistent edits to scene content while preserving geometry, illumination interactions, and overall visual coherence.

Video delighting results comparing source video frames and delighting outputs

Video Delighting

The learned lighting representation can also remove baked-in illumination effects, producing cleaner appearance while maintaining temporal consistency across video frames.

Resources

Paper, supplement, and code