A program that generates memes based on an input image
In this project, I use two different approaches to generate meme captions:
- Sentiment-based Caption Generation
User input -> sentiment detection -> prompt LLaMA3-8B-Instruct with the detected sentiment and in-context learning examples (a rough sketch of this pipeline follows below this list).
- Adapters imitating meme styles
User input -> sentiment+context detection -> prompt Gemma-2B with custom-trained adapters to generate specific genres/formats of memes.
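As a rough, illustrative sketch of the first approach (not the project's actual code), the snippet below treats the user input as a short text description of the image, runs an off-the-shelf sentiment classifier from transformers, and assembles a prompt with made-up in-context examples; the real prompt template, examples, and the call to LLaMA3-8B-Instruct live in the notebooks.

```python
# Illustrative sketch of the sentiment-based pipeline (assumed details, not project code).
from transformers import pipeline

def build_caption_prompt(user_text: str) -> str:
    # Any off-the-shelf sentiment model works for this sketch;
    # the project may use a different detector.
    sentiment = pipeline("sentiment-analysis")(user_text)[0]["label"]

    # Hypothetical in-context examples; the real ones come from the memes_900k data.
    examples = (
        "Image: office on Monday | Sentiment: NEGATIVE | Caption: Mondays, am I right?\n"
        "Image: cat sitting in a box | Sentiment: POSITIVE | Caption: If I fits, I sits.\n"
    )
    # The resulting prompt would then be sent to LLaMA3-8B-Instruct (first approach)
    # or to Gemma-2B with the trained adapters (second approach).
    return (
        "You write short, funny meme captions.\n"
        f"{examples}"
        f"Image: {user_text} | Sentiment: {sentiment} | Caption:"
    )

print(build_caption_prompt("my code finally compiled after three hours"))
```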
Before you proceed with the code, I recommend having a look at the project report to get a higher-level overview of the whole pipeline.
The structure of this repo is as follows:
meme_caption_generator/
├── notebooks/
├── fonts/ #fonts for meme captions
├── memes_900k_files/ #files used for training
├── my_fun_results/ #some of my results
├── test_images/ #some templates you might want to use
├── utils/
├── requirements.txt
├── run_your_meme.py #a script to get your meme
├── walk_through_notebook.ipynb #a notebook that walks you through the main implementation
└── README.md
- For testing the first approach, please refer to this Colab notebook. First make sure you have access to the Drive folder, then run the Meme_caption_generator.ipynb notebook. It will require:
- access to mount your Google Drive
- your HuggingFace API token
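For reference, the two prerequisite cells in the Colab notebook boil down to something like the following (the token string is a placeholder for your own):

```python
# Run inside the Colab notebook: mount Google Drive and authenticate with HuggingFace.
from google.colab import drive
from huggingface_hub import login

drive.mount("/content/drive")   # grants the notebook access to the shared Drive folder
login(token="hf_YOUR_TOKEN")    # paste your personal HuggingFace API token
```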
- For testing the second approach:
- Make sure you have:
- access to the Gemma-2B model (the weights are openly available, but you need to accept the license on HuggingFace)
- a HuggingFace API token
- Clone this repository to your Downloads folder:
git clone https://github.com/nursaltyn/meme_caption_generator
You may clone it into a different folder, but then you might have to adjust some paths in the notebooks. This inconvenience will be fixed in the future.
- Please make sure you have installed the requirements to avoid compatibility problems:
# Windows users
pip install -r requirements.txt
# Mac users
pip install -r requirements_mac.txt
- While you are in the "meme_caption_generator" folder, run:
# Windows users
python run_your_meme.py --img_path YOUR_FULL_IMAGE_PATH --hf_token hf_YOUR_TOKEN
# Mac users
python3 run_your_meme.py --img_path YOUR_FULL_IMAGE_PATH --hf_token hf_YOUR_TOKEN --device mps
- If you are not sure which image to use, have a look at the "test_images" folder for inspiration; some templates are available there.
- By default, the resulting meme will appear in the "result_memes/gemma" folder. If you want to save memes to a different folder, pass the path via the optional "output_dir" argument:
python run_your_meme.py --img_path YOUR_FULL_IMAGE_PATH --hf_token hf_YOUR_TOKEN --output_dir YOUR_OUTPUT_PATH
You can also explore the web interface on Streamlit. However, since the Gemma model under the hood requires a personal API token (even though the weights are openly available), we were not able to host the application publicly. You can, however, clone the Space to your local machine and run it with Streamlit.
https://huggingface.co/spaces/NursNurs/Meme-caption-generator/tree/main
To run the app locally, clone the Space and install the dependencies:
git lfs install
git clone https://huggingface.co/spaces/NursNurs/Meme-caption-generator
cd Meme-caption-generator
pip install streamlit
# it's better to create a new environment to avoid incompatibilities
pip install -r requirements.txt
streamlit run app.py
This is how the interface should look for you:
Warning: we faced issues with the libraries in requirements.txt when switching between different devices. What worked for us was installing requirements.txt first and then manually fixing the libraries that caused problems; we apologize for the inconvenience.
- Since we use the Inference API for some HuggingFace models, there are two potential errors:
- The model isn't loaded yet (usually only the first time you prompt it; the error disappears afterwards)
- You've reached the query limit (300 queries/hour).
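If you call the Inference API from your own scripts, a small retry loop covers both cases; the sketch below uses the generic HTTP endpoint with a placeholder model name and token, and is not part of this repository.

```python
# Sketch: retry while a HuggingFace Inference API model is still loading (HTTP 503).
import time
import requests

API_URL = "https://api-inference.huggingface.co/models/SOME_MODEL"  # placeholder model id
HEADERS = {"Authorization": "Bearer hf_YOUR_TOKEN"}                 # placeholder token

def query(payload: dict, retries: int = 5) -> dict:
    for _ in range(retries):
        response = requests.post(API_URL, headers=HEADERS, json=payload)
        if response.status_code == 503:  # model not loaded yet
            time.sleep(response.json().get("estimated_time", 10))
            continue
        if response.status_code == 429:  # hourly query limit reached
            raise RuntimeError("Rate limit reached; wait and try again later.")
        response.raise_for_status()
        return response.json()
    raise RuntimeError("Model did not finish loading in time.")
```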
- In requirements.txt, some libraries are problematic to install and some are not actually used later on. We are working on fixing that and sincerely apologize for possible inconveniences.
In the fonts folder, you can add the fonts you want to use for meme generation. The default font we use is Anton.
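For context, this is roughly how a caption ends up on the image with a font from that folder using Pillow; the exact font file name and text placement here are assumptions for illustration, not the project's rendering code.

```python
# Sketch: overlay a caption on an image using a font from the fonts/ folder.
from PIL import Image, ImageDraw, ImageFont

def add_caption(img_path: str, caption: str, out_path: str) -> None:
    img = Image.open(img_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    # Assumed file name; use whichever TTF you placed in fonts/.
    font = ImageFont.truetype("fonts/Anton-Regular.ttf", size=48)
    # White text with a black outline, centered near the top of the image.
    draw.text((img.width // 2, 20), caption, font=font, fill="white",
              anchor="ma", stroke_width=2, stroke_fill="black")
    img.save(out_path)

add_caption("test_images/your_template.jpg", "WHEN THE MEME FINALLY GENERATES", "my_meme.jpg")
```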
For training the models, we used the memes_900k dataset, which is available here.
- memes_900k dataset: Borovik, I., Khabibullin, B., Kniazev, V., Pichugin, Z., & Olaleke, O. (2020). DeepHumor: Image-Based Meme Generation Using Deep Learning. DOI: 10.13140/RG.2.2.14598.14400.
- https://huggingface.co/blog/gemma-peft (Gemma tuning)
- https://medium.com/@samvardhan777/fine-tune-gemma-using-qlora-%EF%B8%8F-6b2f2e76dc55 (Gemma tuning)
- https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing (running Llama efficiently with Unsloth library)