Stable Diffusion

4309 readers

4 users here now

Discuss matters related to our favourite AI Art generation technology

Also see

Other communities

founded 1 year ago

MODERATORS

db0@lemmy.dbzer0.com

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance (github.com)

submitted 1 month ago by Even_Adder@lemmy.dbzer0.com to c/stable_diffusion@lemmy.dbzer0.com

0 comments fedilink hide all child comments

Abstract

Recent controllable generation approaches such as FreeControl and Diffusion Self-guidance bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules. However, these methods optimize the latent embedding for each type of score function with longer diffusion steps, making the generation process time-consuming and limiting their flexibility and use. This work presents Ctrl-X, a simple framework for T2I diffusion controlling structure and appearance without additional training or guidance. Ctrl-X designs feed-forward structure control to enable the structure alignment with a structure image and semantic-aware appearance transfer to facilitate the appearance transfer from a user-input image. Extensive qualitative and quantitative experiments illustrate the superior performance of Ctrl-X on various condition inputs and model checkpoints. In particular, Ctrl-X supports novel structure and appearance control with arbitrary condition images of any modality, exhibits superior image quality and appearance transfer compared to existing works, and provides instant plug-and-play functionality to any T2I and text-to-video (T2V) diffusion model. See our project page for an overview of the results: this https URL

Paper: https://arxiv.org/abs/2406.07540

Code: https://github.com/genforce/ctrl-x

Project Page: https://genforce.github.io/ctrl-x/

no comments (yet)

sorted by: hot top controversial new old

there doesn't seem to be anything here