- The "I didn't come here to be a stenographer" ASR AI, powered by pretrained models from OpenAI
- Windows
- Python 3.11
- Venv (Should be included with the Python install, otherwise:
pip install venv
) - Cuda-enabled graphics card (Only necessary if you want it to be.. not slow.)
- Cuda toolkit
If there has been an update, there's a possibility that some dependencies changed, if you're unsure you can run conda env update
to update any dependency changes that has been made.
There are a couple of steps, but it should be a fairly quick job. This guide assumes you are at the root of the repository when running commands.
winget install Python.Python.3.11
winget install Nvidia.CUDA
python -m venv .venv
.venv/Scripts/activate
python -m pip install -r requirements.txt
We suggest using the terminal inside Visual Studio Code, but if you have a preferred way of command-line usage - go right ahead.
Take your audio file, (only .wav files are supported as of right now), and place it somewhere you can easily copy or remember the file path.
To transcribe the audio, use python main.py path/to/file.wav
. This runs Crokket in direct invokation mode, skipping the interactive part and performing slightly better in terms of speed. (Be sure to use paths that look/something/like/this.wav
and not\like\this.wav
, there's no support for backslash paths yet)
If you would prefer to use the graphical interface, just run python main.py
and press enter on the message about using the GUI.
The transcript with the same name as your audio file can be found under the data folder in this directory.
NOTE: If it is the first time you launch the program, it will download around 3 GB (assuming you're using the default medium
model).
A somewhat modern consumer computer with a 30-series or newer graphics card is expected to transcribe in the ranges of 4-6 hours of audio per hour of runtime. The output is of course not perfect but it should save you some time.
ALWAYS CHECK YOUR OUTPUT AND MAKE THE NECESSARY CORRECTIONS
Best of luck.