- Raw dataset (PDF): RUU Cipta Kerja
- Generated QA pairs: SFT data prep notebook
- Base model: "sarahlintang/mistral-indo-7b"
- Fine-tuned model: https://huggingface.co/Willy030125/CiptakerLM-v1
- RAG model: "llama3.1"
- Embedding model: "nomic-embed-text"
- RAG library: LangChain with Unstructured PDF Loader
- Notebook: here
- Prerequisites: Node.js v20.17.0
To run the frontend:
cd FE
npm install
npm start
To serve the frontend's build directory with a static server:
cd FE
npm install -g serve
serve -s build
To run the backend:
cd BE
python app.py
If the app is hosted on a different machine, you may need a public IP address or a tunnel. Read more about tunneling here:
- The notebooks are ready to run on Google Colab. To run one locally, paste its pip freeze output into requirements.txt and install the dependencies from that file. Conda is recommended; for the complete guide, see:
- Setting up a Conda environment in less than 5 minutes
- Cosine similarity: 0.40
- Perplexity: 1.0561115741729736
- ROUGE: 0.7134693037488239
- BLEU: 0.6164010763168335
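
For readers unfamiliar with the metrics above, here is a minimal sketch of how cosine similarity and perplexity are commonly defined. It is not the evaluation code from the notebooks: the function names `cosine_similarity` and `perplexity` and the sample inputs are hypothetical, and the notebooks may compute these values differently (for example, with library implementations). ROUGE and BLEU are left out because the variants used are not stated here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # Made-up inputs for illustration only; not taken from the evaluation.
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
    print(perplexity([math.log(0.5), math.log(0.5)]))
```

Under these definitions, a perplexity close to 1 means the model assigns high probability to nearly every reference token, and a cosine similarity of 1 means the two embedding vectors point in the same direction.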