A simple semantic-search application: given a text query and a set of images, it passes them through a Visual Language Model to generate text and image embeddings, then returns the most similar image via k-nearest-neighbour search over the image embeddings.
Cool things about this project:
- Switchable Visual Language Model encoders via hf transformers. Currently supporting:
  - All CLIP versions. Tested: "openai/clip-vit-base-patch32"
  - All BLIP versions. Tested: "Salesforce/blip2-opt-2.7b"
  - Quantized CLIP versions for resource-constrained systems via clip.cpp (work in progress)
- Fast vector search on pre-computed embeddings with FAISS (see the sketch below)
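Under the hood, retrieval amounts to embedding everything with the chosen encoder and running k-NN over the image embeddings. Here is a minimal sketch using a CLIP encoder and a flat FAISS index; the function names and image paths are illustrative, not this repo's actual API:

```python
import faiss
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# One of the tested checkpoints from the list above.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_images(paths):
    # Encode images into L2-normalised CLIP embeddings.
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

def embed_text(query):
    # Encode the text query into the same embedding space.
    inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

# Illustrative image paths; in practice these come from the user's image set.
paths = ["cat.jpg", "dog.jpg", "car.jpg"]

# Flat inner-product index over the pre-computed image embeddings.
index = faiss.IndexFlatIP(model.config.projection_dim)
index.add(embed_images(paths))

# k-nearest-neighbour search: the top hit is the most similar image.
scores, ids = index.search(embed_text("a photo of a dog"), 1)
print(paths[ids[0][0]], scores[0][0])
```

Note that with L2-normalised embeddings, inner-product search is equivalent to cosine similarity, which is why `IndexFlatIP` is used here.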
You will need a conda environment to install the dependencies. Miniconda is not packaged for apt; install it via the official installer:

```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
Create a new conda environment and install the dependencies:

```bash
conda env create --name vlss --file=environment.yml
conda activate vlss
```
Run the demo:

```bash
streamlit run demo.py
```
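For a feel of how the pieces fit together, a Streamlit front end along the lines of demo.py might look roughly like this; the `encoder` module and its helper names are hypothetical, borrowed from the sketch above, not the project's actual code:

```python
import faiss
import streamlit as st

# Hypothetical module exposing the embed_images/embed_text helpers sketched above.
from encoder import embed_images, embed_text

st.title("Visual Language Semantic Search")

# Collect a query string and a set of candidate images from the user.
query = st.text_input("Describe the image you are looking for")
uploads = st.file_uploader(
    "Candidate images", type=["jpg", "png"], accept_multiple_files=True
)

if query and uploads:
    # Embed the uploaded images and index them for k-NN search.
    vectors = embed_images(uploads)
    index = faiss.IndexFlatIP(vectors.shape[1])
    index.add(vectors)

    # Embed the query and retrieve the single nearest image.
    scores, ids = index.search(embed_text(query), 1)
    best = uploads[ids[0][0]]
    st.image(best, caption=f"Best match (score {scores[0][0]:.3f})")
```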
This project was inspired by my older project on visual place recognition. It wouldn't have been possible without the following open-source libraries: