ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation

Tel Aviv University, NVIDIA

TL;DR: We predict a ComfyUI workflow that matches a user's text-to-image prompt. Generating images with these prompt-specific flows improves quality.

Teaser.

The text-to-image user community has largely moved from monolithic models to complex workflows that combine fine-tuned base models, LoRAs and embeddings, super-resolution steps, prompt refiners, and more.

Building effective workflows requires significant expertise because of the large number of available components, their complex interdependence, and their dependence on the generation domain.

We introduce the novel task of prompt-adaptive workflow generation, where the goal is to learn how to automate this process and tailor an effective workflow to each user prompt. We propose two LLM baselines to tackle this task, and show that they offer a new path to improving image generation performance.

ComfyGen can produce high quality results and generalize to diverse domains. All images were created with SDXL-scale models (no FLUX!)

Abstract

The practical use of text-to-image generation has evolved from simple, monolithic models to complex workflows that combine multiple specialized components. While workflow-based approaches can lead to improved image quality, crafting effective workflows requires significant expertise, owing to the large number of available components, their complex inter-dependence, and their dependence on the generation prompt.

Here, we introduce the novel task of prompt-adaptive workflow generation, where the goal is to automatically tailor a workflow to each user prompt.

We propose two LLM-based approaches to tackle this task: a tuning-based method that learns from user-preference data, and a training-free method that uses the LLM to select existing flows. Both approaches lead to improved image quality when compared to monolithic models or generic, prompt-independent workflows. Our work shows that prompt-dependent flow prediction offers a new pathway to improving text-to-image generation quality, complementing existing research directions in the field.

How does it work?

We base our work around ComfyUI, an open-source tool for designing and executing text-to-image pipelines. These pipelines are represented as JSON, a natural format for an LLM to predict.
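For illustration, here is a minimal Python sketch of the kind of JSON graph ComfyUI executes. The node IDs, class names, and checkpoint filename are placeholders following ComfyUI's API conventions, not a flow from our dataset.

```python
import json

# A minimal, illustrative text-to-image graph in ComfyUI's API-style JSON:
# each key is a node ID, and each node names its type and wires its inputs
# to (node_id, output_index) pairs. Real flows in our set are far larger.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sdxl_base_1.0.safetensors"}},  # placeholder file name
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a watercolor fox in a snowy forest", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, low quality", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 42, "steps": 30, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "comfygen"}},
}

print(json.dumps(workflow, indent=2))  # exactly the kind of string an LLM can emit
```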

To teach the LLM which flows are a good match for a prompt, we collect a set of human-created ComfyUI workflows and augment them by randomly swapping parameters such as the base model, the LoRAs, the sampler, the number of steps, and the guidance scale.
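A rough sketch of this augmentation step is shown below, reusing the workflow dict from the previous snippet. The candidate pools and node class names are illustrative, not our exact lists.

```python
import copy
import random

# Illustrative pools of swappable components; the real augmentation draws from
# the models, LoRAs, and samplers that appear in the collected human-made flows.
BASE_MODELS = ["sdxl_base_1.0.safetensors", "juggernautXL_v9.safetensors"]
LORAS = ["add-detail-xl.safetensors", "watercolor_style_xl.safetensors"]
SAMPLERS = ["euler", "dpmpp_2m", "dpmpp_sde"]

def augment(workflow: dict, rng: random.Random) -> dict:
    """Return a copy of a ComfyUI workflow with randomly swapped parameters."""
    flow = copy.deepcopy(workflow)
    for node in flow.values():
        inputs = node.get("inputs", {})
        if node["class_type"] == "CheckpointLoaderSimple":
            inputs["ckpt_name"] = rng.choice(BASE_MODELS)
        elif node["class_type"] == "LoraLoader":
            inputs["lora_name"] = rng.choice(LORAS)
            inputs["strength_model"] = round(rng.uniform(0.4, 1.0), 2)
        elif node["class_type"] == "KSampler":
            inputs["sampler_name"] = rng.choice(SAMPLERS)
            inputs["steps"] = rng.choice([20, 30, 40, 50])
            inputs["cfg"] = rng.choice([4.0, 5.5, 7.0, 8.5])
    return flow

rng = random.Random(0)
augmented_flows = [augment(workflow, rng) for _ in range(10)]
```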

We further collect a set of 500 prompts and use them to generate images with each flow in our set. Then, we score these images using an ensemble of aesthetic and human preference predictors. This gives us a set of (prompt, flow, score) triplets.
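Schematically, the data collection loop looks like the following. The run_comfy callable and the individual scorers are hypothetical stand-ins, and the simple averaging here is only one way to combine an ensemble of aesthetic and human-preference predictors.

```python
def ensemble_score(image, prompt, scorers) -> float:
    """Combine several aesthetic / preference predictors (here: a plain average)."""
    return sum(score_fn(image, prompt) for score_fn in scorers) / len(scorers)

def collect_triplets(prompts, flows, run_comfy, scorers):
    """Build the (prompt, flow, score) triplets used to supervise the LLM."""
    triplets = []
    for prompt in prompts:                    # the 500 collected prompts
        for flow in flows:                    # every human-made or augmented flow
            image = run_comfy(flow, prompt)   # execute the workflow via ComfyUI
            score = ensemble_score(image, prompt, scorers)
            triplets.append({"prompt": prompt, "flow": flow, "score": score})
    return triplets
```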

We then explore two approaches. The first is an in-context approach, where we give the LLM a table of flows and their scores across prompt categories and ask it to pick the one that best matches a new prompt. The second is a fine-tuning approach, where we provide the LLM with the input prompt and a score, and ask it to predict the flow that achieved this score. At inference time, we simply provide the LLM with a prompt and a high target score, and ask it to predict a flow that matches them.
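As a rough sketch of how the fine-tuned variant can be queried at inference time: the prompt template, the target score value, and the llm_generate interface below are illustrative assumptions, not the exact setup used in the paper.

```python
import json

TARGET_SCORE = 0.95  # a high value on the ensemble's score scale (illustrative)

def predict_flow(user_prompt: str, llm_generate) -> dict:
    """Ask the fine-tuned LLM for a workflow that should reach a high score."""
    request = (
        f"Prompt: {user_prompt}\n"
        f"Target score: {TARGET_SCORE}\n"
        "Output the ComfyUI workflow JSON that achieves this score:"
    )
    raw = llm_generate(request)   # any text-completion interface
    return json.loads(raw)        # the predicted workflow, ready to execute

# Example usage (with run_comfy as in the previous sketch):
# flow = predict_flow("a watercolor fox in a snowy forest", my_llm)
# image = run_comfy(flow, "a watercolor fox in a snowy forest")
```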

Comparisons

We compared our model to two classes of baselines: monolithic models (SDXL, its most popular fine-tuned versions, and a DPO-optimized baseline) and fixed, prompt-independent flows. Our approach outperforms them all on both human-preference metrics and prompt-alignment benchmarks.

Comparisons on user-created prompts from CivitAI



User study results on user-created prompts from CivitAI



Comparisons on prompts from the GenEval benchmark



GenEval benchmark results

BibTeX

If you find our work useful, please cite our paper:

@misc{gal2024comfygenpromptadaptiveworkflowstexttoimage,
      title={ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation}, 
      author={Rinon Gal and Adi Haviv and Yuval Alaluf and Amit H. Bermano and Daniel Cohen-Or and Gal Chechik},
      year={2024},
      eprint={2410.01731},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.01731}, 
}