Lawful Diffusion

Website | Email | Docs (soon)

This implementation provides a comprehensive framework for integrating a Diffusion Transformer generative model with a retrieval-based attribution system using PyTorch, Hugging Face's Diffusers, CLIP, and InternViT. By encoding and indexing the training dataset's images, the system can both attribute generated images and verify external images against the training data.

This approach promotes transparency and accountability in generative models, addressing concerns related to copyright and artist attribution. It serves as a foundation that can be further refined and expanded based on specific requirements and datasets.


Goals

Building a sophisticated generative model architecture that integrates a Diffusion Transformer model with a retrieval-based attribution system involves several components. This system will not only generate images based on text prompts but also provide attributions to the artists or data sources that most closely align with the generated content. Additionally, it will offer verification capabilities for external images against the training dataset.


A training pipeline that allows a generative model like FLUX or AuraFlow to output the nearest artist reference text based on CLIP + ViT embeddings and autoencoder (VAE) embeddings involves the following key steps:

  1. Data Preparation: Organize the dataset with images and associated artist labels.
  2. Embedding Extraction:
    • CLIP Embeddings: Encode images using the CLIP model.
    • InternViT Embeddings: Encode images using the InternViT model.
    • Autoencoder (VAE) Embeddings: Extract latent representations from the VAE part of the Diffusion Transformer model.
  3. Combining Embeddings: Merge the CLIP, InternViT, and VAE embeddings into a comprehensive representation (see the sketch after this list).
  4. Label Encoding: Encode artist labels for training.
  5. Model Training: Train a classifier (e.g., a neural network) to predict artist labels from the combined embeddings.
  6. Integration with Generation Pipeline: Enhance the image generation process to output artist references alongside generated images.
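
Below is a minimal sketch of steps 2–3 (embedding extraction and combining). The model IDs (openai/clip-vit-large-patch14, OpenGVLab/InternViT-6B-448px-V1-5, the FLUX.1-schnell VAE), the CLS-token pooling, and the spatial averaging of the VAE latent are assumptions for illustration, not the exact configuration used in this repository:

```python
# Sketch only: model IDs, dtypes, and pooling choices below are assumptions.
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, CLIPImageProcessor, CLIPModel, CLIPProcessor
from diffusers import AutoencoderKL

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen CLIP image encoder
clip_model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

# Frozen InternViT image encoder (this model family requires trust_remote_code)
vit_model = AutoModel.from_pretrained(
    "OpenGVLab/InternViT-6B-448px-V1-5", torch_dtype=torch.bfloat16, trust_remote_code=True
).to(device).eval()
vit_processor = CLIPImageProcessor.from_pretrained("OpenGVLab/InternViT-6B-448px-V1-5")

# VAE taken from the (finetuned) Diffusion Transformer checkpoint
vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="vae"
).to(device).eval()

# 1024x1024, [-1, 1] pixel input for the VAE
vae_transform = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])

@torch.no_grad()
def combined_embedding(image: Image.Image) -> torch.Tensor:
    image = image.convert("RGB")

    # CLIP pooled image features
    clip_inputs = clip_processor(images=image, return_tensors="pt").to(device)
    clip_emb = clip_model.get_image_features(**clip_inputs)        # (1, 768) for this checkpoint

    # InternViT features, pooled via the CLS token (an assumption for this checkpoint)
    vit_pixels = vit_processor(images=image, return_tensors="pt").pixel_values
    vit_emb = vit_model(vit_pixels.to(device, torch.bfloat16)).last_hidden_state[:, 0].float()

    # VAE latent, averaged over the spatial dimensions
    pixels = vae_transform(image).unsqueeze(0).to(device)
    latents = vae.encode(pixels).latent_dist.mean                  # (1, C, H/8, W/8)
    vae_emb = latents.mean(dim=(2, 3))                             # (1, C)

    # Step 3: concatenate everything into one comprehensive representation
    return torch.cat([clip_emb, vit_emb, vae_emb], dim=-1)
```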

Architecture Diagram



How it works

  • We finetune the generative model with new image datasets
  • We extract VAE embeddings from the finetuned model for the image datasets
  • We extract ViT and CLIP embeddings from the pretrained models (CLIP and InternViT) for the image datasets
  • We collect the embeddings from both sources and store them in a dataset
  • We train the classifier on the collected embeddings (a minimal sketch follows this list)
  • The classifier has multiple output heads, so it can easily scale to many artist labels in the future.
  • The generative model and the classifier are trained separately (they are two different models, but they work in the same inference pipeline)
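
Below is a minimal sketch of the separate multi-head classifier, assuming the concatenated (CLIP + InternViT + VAE) embedding from the pipeline above. The hidden sizes, the input dimension, and the grouping of artists per head are illustrative assumptions, not this repository's exact configuration:

```python
import torch
import torch.nn as nn

class ArtistAttributionClassifier(nn.Module):
    """Classifier over combined (CLIP + InternViT + VAE) embeddings with multiple
    output heads, so new blocks of artist labels can be added later by appending
    heads instead of resizing one huge output layer. Dimensions are illustrative."""

    def __init__(self, embed_dim: int, artists_per_head: list[int], hidden_dim: int = 2048):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
        )
        # One linear head per block of artist labels
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, n) for n in artists_per_head])

    def forward(self, combined_embedding: torch.Tensor) -> list[torch.Tensor]:
        features = self.backbone(combined_embedding)
        return [head(features) for head in self.heads]

# Illustrative numbers only: 768 (CLIP) + 3200 (InternViT) + 16 (pooled VAE latent)
# = 3984-dim input, and two heads covering 10,000 artists each.
model = ArtistAttributionClassifier(embed_dim=3984, artists_per_head=[10_000, 10_000])
logits_per_head = model(torch.randn(4, 3984))
```

Because the classifier is independent of the generative model, the same inference pipeline can embed each generated image and run it through the classifier to emit the nearest artist references alongside the output image.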

Notes:

  • The ViT and CLIP parameters are frozen while finetuning on the image datasets, but the VAE checkpoint is updated.
  • Every image is resized to 1024×1024 for input uniformity into the classifier (see the sketch after these notes)
  • Total parameter count (assuming the FLUX model, 1024×1024 input images, and 1,000,000 artist labels): ~688,595,244 (≈688M parameters, excluding the pretrained encoders and the FLUX model)
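
A minimal sketch of the 1024×1024 input-uniformity transform and a generic parameter-count helper; the interpolation mode and normalization values are assumptions, and the helper simply counts whatever classifier variant it is given rather than reproducing the exact figure above:

```python
import torch
from torchvision import transforms

# 1024x1024 input uniformity for the classifier pipeline
# (interpolation mode and normalization are assumptions)
to_uniform_input = transforms.Compose([
    transforms.Resize((1024, 1024), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),                                    # scale to [0, 1]
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),   # scale to [-1, 1]
])

def count_trainable_parameters(model: torch.nn.Module) -> int:
    """Count trainable parameters, e.g. to check a figure like the ~688M quoted above."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```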

Publication


Future updates

  • Experimental implementation with WikiArt Public dataset
  • Integration with Glaze image cloaking algorithm for further protection of artists' artwork.
  • Add an AuraFlow variant, because FLUX-schnell is a distilled model and FLUX-dev does not come with a commercial license. (completed)
  • FLUX.1-schnell implementation rather than Stable Diffusion (completed)
  • Use a bigger ViT model such as InternViT-6B (completed)

Important Notes

We are looking for research sponsors/investors. Please email Akbar2habibullah@gmail.com if you are interested in sponsoring this project.