The aim of this script is to quickly plot text highlights. It was initially written to compare text snippets extracted by different LLMs, and to compare them with those highlighted by manual coders.
The script reads an arbitrary number of .txt files and visualises them on a single screen.
See below for setup instructions and how to use the script.
This script remains a work in progress. Please get in touch if you have ideas or suggestions on how to make it more useful.
So far, the only dependency is matplotlib. Install it with:
pip install -r requirements.txt
You will need to install your favourite (monospace) font, which must be available in .ttf format. I recommend the 'liberation-mono.ttf' font.
Download as a zip file, unzip, and place a single .ttf file (e.g., LiberationMono-Regular.ttf) into the app/ directory. If you prefer a different font and/or type, make sure to amend the entry in the app/config.py file. In that case, search for and amend the following line:
@dataclass
class Font:
    name: str = field(default='app/LiberationMono-Regular.ttf')
    ...
...
Text files must be stored in a project sub-folder within the main projects/ folder. Consequently, you must create a projects/ folder manually where all project sub-folders will be located:
mkdir projects/
Then create a sub-folder for each project, e.g.,
cd projects/
mkdir test_project/
This is where the script will look for .txt files to visualise. Depending on the number of files (and their length!), the output can be messy, as there is currently no way to control the amount of text shown in a single plot. As a general rule of thumb, I recommend combining no more than 5 files in a single plot.
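The folder layout and the five-file rule of thumb can be checked before plotting. The following is a minimal sketch, not the script's own code: the names list_text_files, too_many_files and MAX_RECOMMENDED_FILES are hypothetical, and it assumes that only files ending in .txt directly inside the project sub-folder are picked up.

```python
from pathlib import Path

# Rule of thumb from the text above; an assumption, not a limit enforced by the script.
MAX_RECOMMENDED_FILES = 5


def list_text_files(project_name: str, root: str = "projects") -> list:
    """Return the .txt files of a project sub-folder, sorted by name."""
    project_dir = Path(root) / project_name
    if not project_dir.is_dir():
        raise FileNotFoundError(f"No such project folder: {project_dir}")
    return sorted(project_dir.glob("*.txt"))


def too_many_files(files: list) -> bool:
    """Return True if the plot is likely to become crowded."""
    return len(files) > MAX_RECOMMENDED_FILES
```

For example, list_text_files("test_project") would return the .txt files in projects/test_project/, and too_many_files(...) could be used to print a warning before plotting.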
Execute the script using the following command (assuming an active virtual environment):
python start.py PROJECT_NAME
Stating a PROJECT_NAME is mandatory; it must match one of the sub-folders in the main projects/ directory.
Use python start.py -h to list optional arguments.
Currently, the following arguments are supported:
- [-a] Anonymise text: You have the option to overwrite each character with a character of your choice. By default, this option is deactivated and text is not anonymised.
- [-s] Show plot: Show output in a separate window, in addition to storing it as a .png file
- [-m] Mix colours: Mix colours of multiple highlights (experimental!)
- [-v] Verbose mode
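To make the options above concrete, here is a minimal argparse sketch of a command line with the same flags. It is an illustration only and may differ from what start.py actually does: build_parser, anonymise and the dest names are hypothetical. It also assumes that -a takes the replacement character as its value and that whitespace is left untouched so the text layout survives.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="start.py", description="Plot text highlights.")
    parser.add_argument("project_name", metavar="PROJECT_NAME",
                        help="name of a sub-folder in projects/")
    # Assumption: -a expects the replacement character; None means no anonymisation.
    parser.add_argument("-a", dest="anonymise", metavar="CHAR", default=None,
                        help="overwrite each character with CHAR")
    parser.add_argument("-s", dest="show", action="store_true",
                        help="show the plot in a separate window")
    parser.add_argument("-m", dest="mix", action="store_true",
                        help="mix colours of multiple highlights (experimental)")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="verbose mode")
    return parser


def anonymise(text: str, char: str) -> str:
    """Replace every non-whitespace character with `char` (assumed behaviour)."""
    return "".join(c if c.isspace() else char for c in text)
```

With this sketch, build_parser().parse_args(["test_project", "-a", "x", "-s"]) yields a namespace with project_name set to "test_project", anonymise set to "x" and show set to True.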
I am still experimenting with the best way to highlight text snippets and how to plot them. (Look at app/config.py for the most recent configuration.)
Current syntax: At the moment, I am using the following syntax to highlight text (indicated here by ...):
- Positive: <<p>>...<</p>>
- Negative: <<n>>...<</n>>
- Generic: <<g>>...<</g>>
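The tag syntax above can be parsed with a single regular expression. The sketch below is one possible approach, not the script's own parser: HIGHLIGHT, LABELS and extract_highlights are hypothetical names, and it assumes highlights are neither nested nor overlapping. It strips the tags and returns the character offsets of each highlighted snippet in the cleaned text.

```python
import re

# Matches <<p>>...<</p>>, <<n>>...<</n>> and <<g>>...<</g>>. The backreference \1
# requires the closing tag to carry the same letter as the opening tag.
HIGHLIGHT = re.compile(r"<<([png])>>(.*?)<</\1>>", re.DOTALL)

LABELS = {"p": "positive", "n": "negative", "g": "generic"}


def extract_highlights(text: str):
    """Return (plain_text, spans); each span is (start, end, label) in plain_text."""
    plain_parts = []
    spans = []
    cursor = 0  # position in the marked-up input
    offset = 0  # length of the plain text emitted so far
    for match in HIGHLIGHT.finditer(text):
        before = text[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        snippet = match.group(2)
        spans.append((offset, offset + len(snippet), LABELS[match.group(1)]))
        plain_parts.append(snippet)
        offset += len(snippet)
        cursor = match.end()
    plain_parts.append(text[cursor:])
    return "".join(plain_parts), spans
```

Given "I <<p>>love<</p>> it", the function returns the plain text "I love it" together with one span covering "love" and labelled "positive".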
Colours: Marked text is highlighted using the following colours:
- Generic: (243, 235, 8, 155) (yellow)
- Positive: (8, 243, 15, 155) (green)
- Negative: (243, 8, 64, 155) (red)
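The colour tuples above read like 8-bit RGBA values (red, green, blue, alpha, each from 0 to 255); that reading is an assumption. matplotlib expects RGBA tuples as floats between 0 and 1, so a conversion along the following lines may be needed. The mix function shows one simple way overlapping highlights could be combined (a channel-wise average, rounded down); it is not necessarily the rule used by the experimental -m option. All names here are hypothetical.

```python
# Highlight colours as listed above, assumed to be 8-bit RGBA tuples.
COLOURS = {
    "generic": (243, 235, 8, 155),
    "positive": (8, 243, 15, 155),
    "negative": (243, 8, 64, 155),
}


def to_unit_rgba(rgba):
    """Scale an 8-bit RGBA tuple to floats in the range 0 to 1."""
    return tuple(channel / 255 for channel in rgba)


def mix(*colours):
    """Average several 8-bit RGBA tuples channel by channel, rounding down."""
    return tuple(sum(channel) // len(colours) for channel in zip(*colours))
```

For instance, mix(COLOURS["positive"], COLOURS["negative"]) gives (125, 125, 39, 155), a muted olive tone at the same transparency.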