By Habibullah Akbar.
Key features:
- Seamless integration with a vision encoder, along with selective RoPE for each image and text embedding sequence.
- Internal iteration, enabling deeper abstraction while keeping the same parameter count.
- GeGLU activation function, inspired by the Gemma 2 models (see the sketch after this list).
- Custom KV-caching, ensuring each internal iteration has an independent KV cache.
- BPE tokenizer based on KBBI (Kamus Besar Bahasa Indonesia).
- Grouped Query Attention.
- PyTorch Lightning implementation.
- DeepSpeed ZeRO-3 integration, automatically offloading memory overflow to CPU and NVMe.
- Example fine-tuning scripts with LoRA adapters, with and without quantization.
- BitNet implementation.
- Flash Attention implementation.
- Speech encoder.
- 2D and 3D RoPE.
- Diffusion Transformer for image detokenization.
- Influential-token extraction from attention heatmaps.
- Jupyter notebook examples for both training and fine-tuning.
- Dual license: open source for individuals, paid for commercial use.
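As a quick illustration, here is a minimal PyTorch sketch of the GeGLU feed-forward block mentioned above. This is not the repository's actual code; the module and layer names (`GeGLU`, `gate_proj`, `up_proj`, `down_proj`) and the exact GELU variant are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLU(nn.Module):
    """GeGLU feed-forward block: a GELU-gated linear unit, as used in Gemma 2."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)  # gating branch
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)    # value branch
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)  # back to model dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GeGLU(x) = (GELU(x W_gate) * (x W_up)) W_down
        return self.down_proj(F.gelu(self.gate_proj(x)) * self.up_proj(x))
```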
This is an iterable Transformer model that can rethink its internal cognitive process, guided by an internal confidence score, akin to a slow-thinking mechanism. Here is a simple explanation of how it works:
- An adjustable parameter controls the internal looping; its default value is 1.
- If the loss value is high, extra iterations are triggered, with the maximum number of iterations set to 10.
- An independent layer is trained to output a confidence score, supervised by the loss value from the main training process.
- At inference time, the model outputs both the next token and a confidence score, and the confidence score determines how many iterations the current step needs (see the sketch below).
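Below is a minimal, hypothetical PyTorch sketch of this confidence-guided internal loop, not the repository's actual implementation: the module name `ConfidenceGuidedLoop`, the `confidence_head` layer, and the 0.5 stopping threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConfidenceGuidedLoop(nn.Module):
    """Sketch of the internal-iteration mechanism: the same block is re-applied
    up to `max_iterations` times, and a separately trained confidence head
    decides when to stop. All names and thresholds here are illustrative."""

    def __init__(self, block: nn.Module, dim: int, max_iterations: int = 10):
        super().__init__()
        self.block = block                        # shared transformer block (weights reused)
        self.max_iterations = max_iterations
        self.confidence_head = nn.Linear(dim, 1)  # independently trained confidence layer

    def forward(self, hidden: torch.Tensor, num_iterations: int = 1):
        confidence = hidden.new_ones(hidden.size(0), 1)
        for step in range(self.max_iterations):
            # Re-applying the same parameters deepens the computation without
            # increasing the parameter count; a full implementation would also
            # keep an independent KV cache per internal iteration.
            hidden = self.block(hidden)
            confidence = torch.sigmoid(self.confidence_head(hidden[:, -1]))
            # Stop once the requested iterations are done and the model is
            # confident enough about the current prediction.
            if step + 1 >= num_iterations and confidence.mean().item() > 0.5:
                break
        return hidden, confidence

# Usage with a stand-in block (for illustration only):
block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
loop = ConfidenceGuidedLoop(block, dim=64)
hidden, confidence = loop(torch.randn(2, 16, 64), num_iterations=2)
```

During training, the confidence head would be supervised by the main loss, so a low confidence score at inference time signals that another internal iteration is worthwhile.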
YouTube progress documentation playlist:
- First short brief (27 July 2024): https://youtu.be/NjK1BJyhrlI
Soon:
- Short-term memory injection.
- SageAttention implementation.
- Speech generation integration.
- Discrete Latent Representation.
- Grokfast.
- Mamba2 block (?).
- Kolmogorov-Arnold Network (KAN).
- Mixture of Experts block.
- Fast object detection integration, possibly YOLO or RT-DETR.
- OCR model integration.
- MInference.
- Pre-trained model integration, possibly Gemma 2, since it uses the same activation function.
- Citation to all of the papers used as references or inspirations.
LICENSE UPDATE: This software is dual-licensed under the terms of the GNU Affero General Public License (AGPL) and a commercial license. For commercial use, please contact Habibullah Akbar at akbar2habibullah@gmail.com to obtain a commercial license. Commercial use is defined as any use of the software for financial gain, including, but not limited to, selling, licensing, or distributing the software as part of a product or service.