Prompt enhancement example for SDXL with Compel and Diffusers (#442)

* add: sdxl-compel notebook
* update: colab
* clean up

---------

Co-authored-by: Thomas Capelle <tcapelle@pm.me>
wandb · Aug 3, 2023 · dddc9dd
1 parent 0b259fe
commit dddc9dd

Showing 2 changed files with 295 additions and 2 deletions.
diff --git a/colabs/README.md b/colabs/README.md
@@ -19,12 +19,12 @@
 | Ultralytics Inference | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/ultralytics-inference) |
 | Ray/Tune | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/raytune-colab) |
 | 🤗 Diffusers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/diffusers-uncond-colab) |
-| 🤗 Diffusers Stable Diffusion XL 1.0 Text-to-Image | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/sdxl-colab) |
+| 🧨 Diffusers Stable Diffusion XL 1.0 Text-to-Image | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/sdxl-colab) |
+| Controlling and Enhancing Stable Diffusion Prompts using Compel and 🧨 Diffusers | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/sdxl-compel-colab) |
 | 🧨 Dreambooth-Keras Train | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/db-keras-train) |
 | 🧨 Dreambooth-Keras Inference | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/db-keras-inference) |
 | Kaolin-Wisp | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://wandb.me/vqad-colab) |
 
-
 # 🏋🏽‍♂️ W&B Features
 
 | Notebook    | Link |

diff --git a/colabs/diffusers/sdxl-compel.ipynb b/colabs/diffusers/sdxl-compel.ipynb
@@ -0,0 +1,293 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a href=\"https://colab.research.google.com/github/wandb/examples/blob/master/colabs/diffusers/sdxl-compel.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Prompt Weighting and Blending for SDXL 1.0 using [Compel](https://github.com/damian0815/compel) and [🧨 Diffusers](https://huggingface.co/docs/diffusers)\n",
+    "\n",
+    "\n",
+    "This notebook demonstrates the following:\n",
+    "- Performing text-conditional image-generations using [🧨 Diffusers](https://huggingface.co/docs/diffusers).\n",
+    "- Using the Stable Diffusion XL Refiner pipeline to further refine the outputs of the base model.\n",
+    "- Managing image generation experiments using [Weights & Biases](http://wandb.ai/geekyrakshit).\n",
+    "- Logging the prompts and generated images to [Weights & Biases](http://wandb.ai/geekyrakshit) for visualization."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Installing the Dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install -qq diffusers[\"torch\"] transformers compel wandb"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import wandb\n",
+    "from diffusers import DiffusionPipeline, EulerDiscreteScheduler\n",
+    "from compel import Compel, ReturnedEmbeddingsType"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Experiment Management using Weights & Biases\n",
+    "\n",
+    "Managing our image generation experiments is crucial for reproducibility. Hence, we sync all the configs of our experiments with our Weights & Biases run, from the prompts to the refinement technique and the configuration of the scheduler."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "wandb.init(project=\"stable-diffusion-xl\", entity=\"geekyrakshit\", job_type=\"text-to-image-compel\", save_code=True)\n",
+    "\n",
+    "config = wandb.config\n",
+    "config.stable_diffusion_checkpoint = \"stabilityai/stable-diffusion-xl-base-1.0\"\n",
+    "config.refiner_checkpoint = \"stabilityai/stable-diffusion-xl-refiner-1.0\"\n",
+    "config.offload_to_cpu = False\n",
+    "config.compile_model = False\n",
+    "config.prompt_1 = \"a cat playing with a ball in the (forest)---------\"\n",
+    "config.prompt_2 = \"Realistic, highly detailed, cold and bright color grading, 8k.\"\n",
+    "config.negative_prompt_1 = \"low-quality\"\n",
+    "config.negative_prompt_2 = \"low-quality\"\n",
+    "config.seed = 42\n",
+    "config.use_ensemble_of_experts = False\n",
+    "config.num_inference_steps = 100\n",
+    "config.num_refinement_steps = 150\n",
+    "config.high_noise_fraction = 0.8 # Set explicitly only if config.use_ensemble_of_experts is True\n",
+    "config.scheduler_kwargs = {\n",
+    "    \"beta_end\": 0.012,\n",
+    "    \"beta_schedule\": \"scaled_linear\", # one of [\"linear\", \"scaled_linear\"]\n",
+    "    \"beta_start\": 0.00085,\n",
+    "    \"interpolation_type\": \"linear\", # one of [\"linear\", \"log_linear\"]\n",
+    "    \"num_train_timesteps\": 1000,\n",
+    "    \"prediction_type\": \"epsilon\", # one of [\"epsilon\", \"sample\", \"v_prediction\"]\n",
+    "    \"steps_offset\": 1,\n",
+    "    \"timestep_spacing\": \"leading\", # one of [\"linspace\", \"leading\"]\n",
+    "    \"trained_betas\": None,\n",
+    "    \"use_karras_sigmas\": False,\n",
+    "}\n",
+    "config.prompt_credits = \"\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can make the experiment deterministic based on the seed specified in the experiment configs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if config.seed is not None:\n",
+    "    generator = [torch.Generator(device=\"cuda\").manual_seed(config.seed)]\n",
+    "else:\n",
+    "    generator = [torch.Generator(device=\"cuda\")]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Creating the Diffusion Pipelines\n",
+    "\n",
+    "For performing text-conditional image generation, we use the `diffusers` library to define the diffusion pipelines corresponding to the base SDXL model and the SDXL refinement model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pipe = DiffusionPipeline.from_pretrained(\n",
+    "    config.stable_diffusion_checkpoint,\n",
+    "    torch_dtype=torch.float16,\n",
+    "    variant=\"fp16\",\n",
+    "    use_safetensors=True,\n",
+    "    scheduler=EulerDiscreteScheduler(**config.scheduler_kwargs),\n",
+    ")\n",
+    "\n",
+    "if config.offload_to_cpu:\n",
+    "    pipe.enable_model_cpu_offload()\n",
+    "else:\n",
+    "    pipe.to(\"cuda\")\n",
+    "\n",
+    "if config.compile_model:\n",
+    "    pipe.unet = torch.compile(pipe.unet, mode=\"reduce-overhead\", fullgraph=True)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if config.prompt_2 == \"\" and config.negative_prompt_2 == \"\":\n",
+    "    base_compel = Compel(\n",
+    "        tokenizer=[pipe.tokenizer, pipe.tokenizer_2],\n",
+    "        text_encoder=[pipe.text_encoder, pipe.text_encoder_2],\n",
+    "        returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,\n",
+    "        requires_pooled=[False, True]\n",
+    "    )\n",
+    "\n",
+    "    base_positive_prompt_embeds, base_positive_prompt_pooled = base_compel(config.prompt_1)\n",
+    "    base_negative_prompt_embeds, base_negative_prompt_pooled = base_compel(config.negative_prompt_1)\n",
+    "    base_positive_prompt_embeds, base_negative_prompt_embeds = base_compel.pad_conditioning_tensors_to_same_length([\n",
+    "        base_positive_prompt_embeds, base_negative_prompt_embeds\n",
+    "    ])\n",
+    "else:\n",
+    "    base_compel_1 = Compel(\n",
+    "        tokenizer=pipe.tokenizer,\n",
+    "        text_encoder=pipe.text_encoder,\n",
+    "        returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,\n",
+    "        requires_pooled=False,\n",
+    "    )\n",
+    "\n",
+    "    base_positive_prompt_embeds_1 = base_compel_1(config.prompt_1)\n",
+    "    base_negative_prompt_embeds_1 = base_compel_1(config.negative_prompt_1)\n",
+    "    \n",
+    "    base_compel_2 = Compel(\n",
+    "        tokenizer=pipe.tokenizer_2,\n",
+    "        text_encoder=pipe.text_encoder_2,\n",
+    "        returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,\n",
+    "        requires_pooled=True,\n",
+    "    )\n",
+    "\n",
+    "    base_positive_prompt_embeds_2, base_positive_prompt_pooled = base_compel_2(config.prompt_2)\n",
+    "    base_negative_prompt_embeds_2, base_negative_prompt_pooled = base_compel_2(config.negative_prompt_2)\n",
+    "    \n",
+    "    (\n",
+    "        base_positive_prompt_embeds_2, base_negative_prompt_embeds_2\n",
+    "    ) = base_compel_2.pad_conditioning_tensors_to_same_length([\n",
+    "        base_positive_prompt_embeds_2, base_negative_prompt_embeds_2\n",
+    "    ])\n",
+    "    \n",
+    "    base_positive_prompt_embeds = torch.cat((base_positive_prompt_embeds_1, base_positive_prompt_embeds_2), dim=-1)\n",
+    "    base_negative_prompt_embeds = torch.cat((base_negative_prompt_embeds_1, base_negative_prompt_embeds_2), dim=-1)\n",
+    "    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Text-to-Image Generation\n",
+    "\n",
+    "Now, we pass the prompt embeddings and the pooled prompt embeddings to the Stable Diffusion XL pipeline."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "image = pipe(\n",
+    "    prompt_embeds=base_positive_prompt_embeds,\n",
+    "    pooled_prompt_embeds=base_positive_prompt_pooled,\n",
+    "    negative_prompt_embeds=base_negative_prompt_embeds,\n",
+    "    negative_pooled_prompt_embeds=base_negative_prompt_pooled,\n",
+    "    output_type=\"pil\",\n",
+    "    num_inference_steps=config.num_inference_steps,\n",
+    "    generator=generator,\n",
+    ").images[0]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Logging the Images to Weights & Biases\n",
+    "\n",
+    "Now, we log the images to Weights & Biases. This enables us to:\n",
+    "\n",
+    "- Visualize our generations\n",
+    "- Examine the generated images across different experiments\n",
+    "- Ensure reproducibility of the experiments"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "table = wandb.Table(columns=[\n",
+    "    \"Prompt-1\",\n",
+    "    \"Prompt-2\",\n",
+    "    \"Negative-Prompt-1\",\n",
+    "    \"Negative-Prompt-2\",\n",
+    "    \"Generated-Image\"\n",
+    "])\n",
+    "\n",
+    "image = wandb.Image(image)\n",
+    "\n",
+    "table.add_data(\n",
+    "    config.prompt_1,\n",
+    "    config.prompt_2,\n",
+    "    config.negative_prompt_1,\n",
+    "    config.negative_prompt_2,\n",
+    "    image,\n",
+    ")\n",
+    "wandb.log({\n",
+    "    \"Generated-Image\": image,\n",
+    "    \"Text-to-Image\": table\n",
+    "})\n",
+    "wandb.finish()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here's how you can control your prompts using Compel and manage them using Weights & Biases 👇\n",
+    "\n",
+    "![](https://i.imgur.com/iUQH9XR.png)"
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "include_colab_link": true,
+   "provenance": [],
+   "toc_visible": true
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
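
Note on the prompt syntax used in `config.prompt_1`: the trailing run of `-` characters in `(forest)---------` is Compel's down-weighting shorthand. As far as the Compel syntax documentation describes it, each `+` multiplies a term's weight by 1.1 and each `-` multiplies it by 0.9. The sketch below only reproduces that arithmetic; `suffix_weight` is a hypothetical helper written for illustration, not part of Compel, and it handles just a single term with a trailing `+`/`-` run.

```python
# Multipliers as described in the Compel syntax docs (assumption: unchanged
# in the installed version): each '+' scales by 1.1, each '-' by 0.9.
UP_FACTOR = 1.1
DOWN_FACTOR = 0.9


def suffix_weight(term: str) -> tuple[str, float]:
    """Split a term such as '(forest)---' into (text, effective weight).

    Hypothetical illustration helper; it does not parse full Compel prompts.
    """
    stripped = term.strip()
    text = stripped.rstrip("+-")          # text without the trailing +/- run
    suffix = stripped[len(text):]         # the trailing +/- run itself
    if "+" in suffix and "-" in suffix:
        raise ValueError("mixed '+' and '-' suffixes are not handled here")

    factor = UP_FACTOR if suffix.startswith("+") else DOWN_FACTOR
    weight = factor ** len(suffix)        # an empty suffix gives weight 1.0

    text = text.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]                 # drop the grouping parentheses
    return text, weight


if __name__ == "__main__":
    for term in ["(forest)---------", "ball++", "cat"]:
        text, weight = suffix_weight(term)
        print(f"{text!r}: weight ~ {weight:.3f}")
```

Under these assumptions, the nine minus signs in the notebook's prompt scale the attention on "forest" by 0.9 to the ninth power, roughly 0.39, which is why the forest is strongly de-emphasized in the generated image.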