From dddc9ddda26bcf8b48cb1f150a5c4d8b70535d55 Mon Sep 17 00:00:00 2001 From: Soumik Rakshit <19soumik.rakshit96@gmail.com> Date: Thu, 3 Aug 2023 17:34:07 +0530 Subject: [PATCH] Prompt enhancement example for SDXL with Compel and DIffusers (#442) * add: sdxl-compel notebook * update: colab * clean up --------- Co-authored-by: Thomas Capelle --- colabs/README.md | 4 +- colabs/diffusers/sdxl-compel.ipynb | 293 +++++++++++++++++++++++++++++ 2 files changed, 295 insertions(+), 2 deletions(-) create mode 100644 colabs/diffusers/sdxl-compel.ipynb diff --git a/colabs/README.md b/colabs/README.md index a1634221..f0d55bba 100644 --- a/colabs/README.md +++ b/colabs/README.md @@ -19,12 +19,12 @@ | Ultralytics Inference | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/ultralytics-inference) | | Ray/Tune | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/raytune-colab) | | 🤗 Diffusers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/diffusers-uncond-colab) | -| 🤗 Diffusers Stable Diffusion XL 1.0 Text-to-Image | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/sdxl-colab) | +| 🧨 Diffusers Stable Diffusion XL 1.0 Text-to-Image | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/sdxl-colab) | +| Controlling and Enhancing Stable Diffusion Prompts using Compel and 🧨 Diffusers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/sdxl-compel-colab) | | 🧨 Dreambooth-Keras Train | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/db-keras-train) | | 🧨 Dreambooth-Keras Inference | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/db-keras-inference) | | Kaolin-Wisp | [![Open In
Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/vqad-colab) | - # 🏋🏽‍♂️ W&B Features | Notebook | Link | diff --git a/colabs/diffusers/sdxl-compel.ipynb b/colabs/diffusers/sdxl-compel.ipynb new file mode 100644 index 00000000..4920efd5 --- /dev/null +++ b/colabs/diffusers/sdxl-compel.ipynb @@ -0,0 +1,293 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\"Open\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Prompt Weighting and Blending for SDXL 1.0 using [Compel](https://github.com/damian0815/compel) and [🧨 Diffusers](https://huggingface.co/docs/diffusers)\n", + "\n", + "\n", + "This notebook demonstrates the following:\n", + "- Performing text-conditional image generation using [🧨 Diffusers](https://huggingface.co/docs/diffusers).\n", + "- Weighting prompt terms with [Compel](https://github.com/damian0815/compel) and conditioning SDXL's two text encoders on separate prompts.\n", + "- Managing image generation experiments using [Weights & Biases](http://wandb.ai/geekyrakshit).\n", + "- Logging the prompts and generated images to [Weights & Biases](http://wandb.ai/geekyrakshit) for visualization." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Installing the Dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install -qq diffusers[\"torch\"] transformers compel wandb" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "import wandb\n", + "from diffusers import DiffusionPipeline, EulerDiscreteScheduler\n", + "from compel import Compel, ReturnedEmbeddingsType" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Experiment Management using Weights & Biases\n", + "\n", + "Managing our image generation experiments is crucial for reproducibility. 
Hence, we sync all the configs of our experiments with our Weights & Biases run. This stores every setting of the experiment, from the prompts to the refinement technique and the scheduler configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "wandb.init(project=\"stable-diffusion-xl\", entity=\"geekyrakshit\", job_type=\"text-to-image-compel\", save_code=True)\n", + "\n", + "config = wandb.config\n", + "config.stable_diffusion_checkpoint = \"stabilityai/stable-diffusion-xl-base-1.0\"\n", + "config.refiner_checkpoint = \"stabilityai/stable-diffusion-xl-refiner-1.0\"\n", + "config.offload_to_cpu = False\n", + "config.compile_model = False\n", + "config.prompt_1 = \"a cat playing with a ball in the (forest)---------\"\n", + "config.prompt_2 = \"Realistic, highly detailed, cold and bright color grading, 8k.\"\n", + "config.negative_prompt_1 = \"low-quality\"\n", + "config.negative_prompt_2 = \"low-quality\"\n", + "config.seed = 42\n", + "config.use_ensemble_of_experts = False\n", + "config.num_inference_steps = 100\n", + "config.num_refinement_steps = 150\n", + "config.high_noise_fraction = 0.8 # Set explicitly only if config.use_ensemble_of_experts is True\n", + "config.scheduler_kwargs = {\n", + " \"beta_end\": 0.012,\n", + " \"beta_schedule\": \"scaled_linear\", # one of [\"linear\", \"scaled_linear\"]\n", + " \"beta_start\": 0.00085,\n", + " \"interpolation_type\": \"linear\", # one of [\"linear\", \"log_linear\"]\n", + " \"num_train_timesteps\": 1000,\n", + " \"prediction_type\": \"epsilon\", # one of [\"epsilon\", \"sample\", \"v_prediction\"]\n", + " \"steps_offset\": 1,\n", + " \"timestep_spacing\": \"leading\", # one of [\"linspace\", \"leading\"]\n", + " \"trained_betas\": None,\n", + " \"use_karras_sigmas\": False,\n", + "}\n", + "config.prompt_credits = \"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can make the experiment 
deterministic by seeding the random number generator with the seed specified in the experiment config." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if config.seed is not None:\n", + " generator = [torch.Generator(device=\"cuda\").manual_seed(config.seed)]\n", + "else:\n", + " generator = [torch.Generator(device=\"cuda\")]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Creating the Diffusion Pipeline\n", + "\n", + "For text-conditional image generation, we use the `diffusers` library to define the diffusion pipeline for the base SDXL model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipe = DiffusionPipeline.from_pretrained(\n", + " config.stable_diffusion_checkpoint,\n", + " torch_dtype=torch.float16,\n", + " variant=\"fp16\",\n", + " use_safetensors=True,\n", + " scheduler=EulerDiscreteScheduler(**config.scheduler_kwargs),\n", + ")\n", + "\n", + "if config.offload_to_cpu:\n", + " pipe.enable_model_cpu_offload()\n", + "else:\n", + " pipe.to(\"cuda\")\n", + "\n", + "if config.compile_model:\n", + " pipe.unet = torch.compile(pipe.unet, mode=\"reduce-overhead\", fullgraph=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if config.prompt_2 == \"\" and config.negative_prompt_2 == \"\":\n", + " base_compel = Compel(\n", + " tokenizer=[pipe.tokenizer, pipe.tokenizer_2],\n", + " text_encoder=[pipe.text_encoder, pipe.text_encoder_2],\n", + " returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,\n", + " requires_pooled=[False, True]\n", + " )\n", + "\n", + " base_positive_prompt_embeds, base_positive_prompt_pooled = base_compel(config.prompt_1)\n", + " base_negative_prompt_embeds, base_negative_prompt_pooled = base_compel(config.negative_prompt_1)\n", + " 
base_positive_prompt_embeds, base_negative_prompt_embeds = base_compel.pad_conditioning_tensors_to_same_length([\n", + " base_positive_prompt_embeds, base_negative_prompt_embeds\n", + " ])\n", + "else:\n", + " base_compel_1 = Compel(\n", + " tokenizer=pipe.tokenizer,\n", + " text_encoder=pipe.text_encoder,\n", + " returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,\n", + " requires_pooled=False,\n", + " )\n", + "\n", + " base_positive_prompt_embeds_1 = base_compel_1(config.prompt_1)\n", + " base_negative_prompt_embeds_1 = base_compel_1(config.negative_prompt_1)\n", + " \n", + " base_compel_2 = Compel(\n", + " tokenizer=pipe.tokenizer_2,\n", + " text_encoder=pipe.text_encoder_2,\n", + " returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,\n", + " requires_pooled=True,\n", + " )\n", + "\n", + " base_positive_prompt_embeds_2, base_positive_prompt_pooled = base_compel_2(config.prompt_2)\n", + " base_negative_prompt_embeds_2, base_negative_prompt_pooled = base_compel_2(config.negative_prompt_2)\n", + " \n", + " (\n", + " base_positive_prompt_embeds_2, base_negative_prompt_embeds_2\n", + " ) = base_compel_2.pad_conditioning_tensors_to_same_length([\n", + " base_positive_prompt_embeds_2, base_negative_prompt_embeds_2\n", + " ])\n", + " \n", + " base_positive_prompt_embeds = torch.cat((base_positive_prompt_embeds_1, base_positive_prompt_embeds_2), dim=-1)\n", + " base_negative_prompt_embeds = torch.cat((base_negative_prompt_embeds_1, base_negative_prompt_embeds_2), dim=-1)\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Text-to-Image Generation\n", + "\n", + "Now, we pass the prompt embeddings and the pooled prompt embeddings to the Stable Diffusion XL pipeline."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "image = pipe(\n", + " prompt_embeds=base_positive_prompt_embeds,\n", + " pooled_prompt_embeds=base_positive_prompt_pooled,\n", + " negative_prompt_embeds=base_negative_prompt_embeds,\n", + " negative_pooled_prompt_embeds=base_negative_prompt_pooled,\n", + " output_type=\"pil\",\n", + " num_inference_steps=config.num_inference_steps,\n", + " generator=generator,\n", + ").images[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Logging the Images to Weights & Biases\n", + "\n", + "Now, we log the generated image and its prompts to Weights & Biases. This enables us to:\n", + "\n", + "- Visualize our generations\n", + "- Compare the generated images across different prompts and configurations\n", + "- Ensure reproducibility of the experiments" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "table = wandb.Table(columns=[\n", + " \"Prompt-1\",\n", + " \"Prompt-2\",\n", + " \"Negative-Prompt-1\",\n", + " \"Negative-Prompt-2\",\n", + " \"Generated-Image\"\n", + "])\n", + "\n", + "image = wandb.Image(image)\n", + "\n", + "table.add_data(\n", + " config.prompt_1,\n", + " config.prompt_2,\n", + " config.negative_prompt_1,\n", + " config.negative_prompt_2,\n", + " image,\n", + ")\n", + "wandb.log({\n", + " \"Generated-Image\": image,\n", + " \"Text-to-Image\": table\n", + "})\n", + "wandb.finish()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here's how you can control your prompts using Compel and manage them using Weights & Biases 👇\n", + "\n", + "![](https://i.imgur.com/iUQH9XR.png)" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "include_colab_link": true, + "provenance": [], + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
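The notebook's first prompt, `a cat playing with a ball in the (forest)---------`, uses Compel's weighting syntax to down-weight the term `forest`. The snippet below is a minimal, dependency-free sketch of that arithmetic. It assumes Compel's documented convention:

- each trailing `+` multiplies a term's weight by 1.1;
- each trailing `-` multiplies it by 0.9;
- a number after a parenthesized phrase, as in `(phrase)1.3`, sets the weight explicitly.

`term_weight` is a hypothetical helper for illustration only. It is not Compel's parser, and it ignores blends, conjunctions, nested groups, and hyphenated words.

```python
import re

# Assumed Compel convention: each "+" scales a term's weight by 1.1, each "-" by 0.9.
PLUS_FACTOR = 1.1
MINUS_FACTOR = 0.9

# Either "(phrase)" with an optional +/- run or numeric weight,
# or a bare word with an optional +/- run.
_TERM_PATTERN = re.compile(
    r"\((?P<text>[^()]*)\)(?P<suffix>\++|-+|\d*\.?\d+)?"
    r"|(?P<word>[^\s()+-]+)(?P<wsuffix>\++|-+)?"
)


def term_weight(term: str) -> tuple[str, float]:
    """Hypothetical helper: return (text, weight) for one Compel-style term."""
    match = _TERM_PATTERN.fullmatch(term)
    if match is None:
        raise ValueError(f"unsupported term: {term!r}")
    text = match.group("text") if match.group("text") is not None else match.group("word")
    suffix = match.group("suffix") or match.group("wsuffix") or ""
    if suffix.startswith("+"):
        weight = PLUS_FACTOR ** len(suffix)   # e.g. "++" -> 1.1 ** 2
    elif suffix.startswith("-"):
        weight = MINUS_FACTOR ** len(suffix)  # e.g. "--" -> 0.9 ** 2
    elif suffix:
        weight = float(suffix)                # explicit weight, e.g. "(phrase)1.3"
    else:
        weight = 1.0                          # no suffix: default weight
    return text, weight


text, weight = term_weight("(forest)---------")
print(text, round(weight, 3))  # forest 0.387
```

Because the down-weighting compounds multiplicatively, a long run of `-` quickly pushes a term well below its default weight of 1.0. This is how the prompt keeps "forest" as background context rather than the main subject.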