An advanced multimodal model hosted as a Discord bot that combines Mistral as the language model and LLaVA (Large Language and Vision Assistant) as the vision model. Together they form a potent pairing: LLaVA couples a vision encoder with Vicuna for comprehensive visual and language understanding. This configuration gives Mikael remarkable chat capabilities, mirroring the versatility of the multimodal GPT-4.
Mikael is a chat-only Discord bot and does not require any permissions:
https://discord.com/api/oauth2/authorize?client_id=1202687794213036112&permissions=0&scope=bot
The goal is to combine Mistral's LLM and the LLaVA multimodal model into a Discord bot that users can chat with, similar to ChatGPT.
Currently, Mistral's 7.3B parameter LLM can:
- Outperform Llama 2 13B on all benchmarks
- Outperform Llama 1 34B on many benchmarks
- Approach CodeLlama 7B performance on code, while remaining good at English tasks
- Use Grouped-query attention (GQA) for faster inference
- Use Sliding Window Attention (SWA) to handle longer sequences at smaller cost
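As a rough illustration of sliding-window attention (a sketch of the general idea, not Mistral's actual implementation), each token attends only to itself and the previous few tokens, which caps the cost of long sequences:

```python
# Illustrative sliding-window attention mask (not Mistral's code).
# Token i may attend to token j only when i - window < j <= i,
# i.e. causal attention restricted to the last `window` positions.

def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return a seq_len x seq_len mask; True where attention is allowed."""
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=6, window=3)
# Each row i allows at most `window` positions, ending at position i.
```

Because each row is bounded by the window size, attention cost grows linearly with sequence length instead of quadratically.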
Mikael can be self-hosted by following these steps:
For Linux:
$ curl https://ollama.ai/install.sh | sh
For macOS:
https://ollama.ai/download/Ollama-darwin.zip
$ ollama pull mistral
$ ollama pull llava
Mistral requires 4.7 GB of disk space, while LLaVA requires 4.1 GB.
$ pip install discord.py ollama
$ nvim /path/to/bash-or-zsh
$ export MIKAEL_TOKEN="TOKEN HERE"
$ source /path/to/bash-or-zsh
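At startup the bot can read the token exported above from the environment. A minimal sketch (the helper name is hypothetical, not from Mikael's source):

```python
# Sketch: read the bot token placed in the environment by the export above.
import os

def get_token(var: str = "MIKAEL_TOKEN") -> str:
    """Fetch the Discord token, failing loudly if the export was missed."""
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(
            f"{var} is not set; add the export line to your shell profile"
        )
    return token
```

Failing with a clear message here is friendlier than letting discord.py reject an empty token later.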
Mikael should run correctly; open an issue if it does not.
- Incorporate Dolphin-Mixtral-8x7b as Mikael's main LLM.
- Mikael temporarily downloads any image it is sent so the image can be passed to the LLaVA model; once the task completes, your images are immediately deleted from the server. (L39-L59)
- Mikael stores your chats only in random access memory (RAM) (L63-L66).
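The two behaviors above could be implemented along these lines (an assumed sketch, not Mikael's actual source): images live only in a temporary file that is always deleted, and chat history lives only in process memory:

```python
# Sketch of the privacy behaviors described above (assumed implementation).
import os
import tempfile
from collections import defaultdict, deque

# Chat history per channel, kept only in RAM; it is lost on restart.
HISTORY: dict[int, deque] = defaultdict(lambda: deque(maxlen=20))

def remember(channel_id: int, role: str, content: str) -> None:
    """Append a message to a channel's in-memory history."""
    HISTORY[channel_id].append({"role": role, "content": content})

def with_temp_image(data: bytes, process):
    """Write image bytes to a temp file, run process(path), always delete."""
    tmp = tempfile.NamedTemporaryFile(suffix=".png", delete=False)
    try:
        tmp.write(data)
        tmp.close()
        return process(tmp.name)
    finally:
        os.unlink(tmp.name)  # image removed even if processing fails
```

The `finally` block guarantees the downloaded image is deleted whether LLaVA succeeds or raises.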
https://discord.gg/JX4XgrQSeV