CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting
Abstract
Gaussian Splatting (GS) has recently emerged as an efficient representation for rendering 3D scenes from 2D images and has been extended to images, videos, and dynamic 4D content. However, applying style transfer to GS-based representations, especially beyond simple color changes, remains challenging. In this work, we introduce CLIPGaussians, the first unified style transfer framework that supports text- and image-guided stylization across multiple modalities: 2D images, videos, 3D objects, and 4D scenes. Our method operates directly on Gaussian primitives and integrates into existing GS pipelines as a plug-in module, without requiring large generative models or retraining from scratch. The CLIPGaussians approach enables joint optimization of color and geometry in 3D and 4D settings and achieves temporal coherence in videos, while preserving model size. We demonstrate superior style fidelity and consistency across all tasks, validating CLIPGaussians as a universal and efficient solution for multimodal style transfer.
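As a rough illustration of the optimization the abstract describes, the sketch below minimizes a CLIP cosine distance between a rendered image and a text prompt, updating Gaussian attributes directly. The `render` stand-in, the prompt, and all tensor sizes are assumptions made for illustration, not the paper's implementation; in the actual pipeline a differentiable Gaussian Splatting rasterizer would supply the image and also carry gradients to geometry (positions, covariances), and guidance can come from an image instead of text.

```python
# Minimal sketch of CLIP-guided stylization of Gaussian attributes.
# Assumptions (not from the paper): OpenAI's `clip` package for guidance,
# toy per-Gaussian colors, and a hypothetical `render` stand-in where a
# real differentiable GS rasterizer would plug in.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float().eval()  # fp32 everywhere to avoid dtype mismatches
for p in model.parameters():
    p.requires_grad_(False)   # CLIP is frozen; only the scene is optimized

# Learnable per-Gaussian color (toy size; real scenes have millions).
colors = torch.rand(1000, 3, device=device, requires_grad=True)

def render(colors: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for a differentiable GS rasterizer: broadcast
    # the mean color into a 224x224 image so gradients flow from the CLIP
    # loss back to the Gaussian attributes.
    return colors.mean(dim=0).view(1, 3, 1, 1).expand(1, 3, 224, 224)

# Encode the style prompt once.
tokens = clip.tokenize(["a watercolor painting"]).to(device)
with torch.no_grad():
    target = model.encode_text(tokens)
    target = target / target.norm(dim=-1, keepdim=True)

optimizer = torch.optim.Adam([colors], lr=1e-2)
for step in range(200):
    image = render(colors)
    feat = model.encode_image(image)  # CLIP input normalization omitted for brevity
    feat = feat / feat.norm(dim=-1, keepdim=True)
    loss = 1.0 - (feat * target).sum(dim=-1).mean()  # cosine distance to prompt
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```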
Community
The following similar papers were recommended by the Semantic Scholar API:
- StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians (2025)
- Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles (2025)
- GT^2-GS: Geometry-aware Texture Transfer for Gaussian Splatting (2025)
- NeuSEditor: From Multi-View Images to Text-Guided Neural Surface Edits (2025)
- ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior (2025)
- Advancing high-fidelity 3D and Texture Generation with 2.5D latents (2025)
- Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model (2025)