SafeText

A tool that sanitizes text so that documents from anonymous sources can be distributed safely, by removing zero-width characters and homoglyphs.

Individuals who leak an email or other text file risk being identified through fingerprinting. Fingerprinting typically works because the original distributor of the document has embedded some form of canary in it. For example, in 2008 Elon Musk responded to leaks by sending an email with slightly different wording to each employee. The employees noticed the tactic, and it failed. A simpler tactic that is also used is to introduce nearly invisible changes to the text. SafeText is designed to identify and remove these changes. Specifically, it removes homoglyphs, zero-width characters, and other subtle characters. It also attempts to flag region-specific spellings that could give away an individual's location.
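To make the idea concrete, here is a minimal sketch of how zero-width characters and homoglyphs might be detected in a line of text. It is not SafeText's actual code: the `ZERO_WIDTH` and `HOMOGLYPHS` tables and the `sanitize_line` function are hypothetical names chosen for illustration, and the tables are deliberately tiny.

```python
import unicodedata

# Hypothetical, deliberately small tables for illustration only.
# A real tool would need much larger ones.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}
HOMOGLYPHS = {
    "\u041d": "H",  # CYRILLIC CAPITAL LETTER EN
    "\u0392": "B",  # GREEK CAPITAL LETTER BETA
    "\u03f9": "C",  # GREEK CAPITAL LUNATE SIGMA SYMBOL
}


def sanitize_line(line, line_no):
    """Return (clean_line, findings) for a single line of text.

    Each finding is a tuple (line_no, unicode_name, action), where
    action is "removed" or "replaced".
    """
    cleaned = []
    findings = []
    for ch in line:
        if ch in ZERO_WIDTH:
            # Drop invisible characters entirely.
            findings.append((line_no, unicodedata.name(ch, "UNKNOWN"), "removed"))
            continue
        if ch in HOMOGLYPHS:
            # Swap the look-alike for its plain ASCII counterpart.
            findings.append((line_no, unicodedata.name(ch, "UNKNOWN"), "replaced"))
            cleaned.append(HOMOGLYPHS[ch])
            continue
        cleaned.append(ch)
    return "".join(cleaned), findings
```

Calling `sanitize_line("\u041dey, let\u200b's hang out!", 1)` returns the cleaned string together with two findings, one for the Cyrillic letter and one for the zero-width space. Using explicit lookup tables keeps the behavior predictable, at the cost of missing any character that is not listed.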

Usage

To use SafeText, call:

python safetext.py inputfile

Example output is:

λ python safetext.py TestFile.txt
[*] Cleaning TestFile.txt to TestFile.txt.safe ...
[!] FOUND HOMOGLYPHIC CHARACTER CYRILLIC_large_H ON LINE 1
The message said: "(Н)ey, let's hang out!"
[!] FOUND a SPACE ON LINE # 2
Lorem*Ipsum*Dolor*Sit
[!] WARNING - Use of spelling (colour) that identifies country on line 3
[!] FOUND HOMOGLYPHIC CHARACTER GREEK_B ON LINE 5
[!] FOUND HOMOGLYPHIC CHARACTER GREEK_C ON LINE 5
Subject: (Β)udget (Ϲ)uts
[*] Output file closed

Note: In the actual output, the relevant characters are underlined rather than enclosed in parentheses. SafeText writes the cleaned text to inputfile.safe (for example, TestFile.txt.safe).
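The spelling warning in the example output could be produced by a check along the following lines. This is only a sketch under stated assumptions: the `REGIONAL_SPELLINGS` table and the `find_regional_spellings` function are hypothetical, and SafeText's own word lists and matching logic may differ.

```python
import re

# Hypothetical sample of region-specific spellings, mapped to the
# region each one suggests. Illustration only.
REGIONAL_SPELLINGS = {
    "colour": "UK", "color": "US",
    "organise": "UK", "organize": "US",
    "centre": "UK", "center": "US",
}


def find_regional_spellings(text):
    """Return a list of (line_no, word, region) for each flagged word."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # Split the line into alphabetic words and look each one up.
        for word in re.findall(r"[A-Za-z]+", line):
            region = REGIONAL_SPELLINGS.get(word.lower())
            if region:
                hits.append((line_no, word, region))
    return hits
```

A check like this can only warn, not fix: which spelling is "safe" depends on the audience, so the decision to rewrite a flagged word is best left to the person preparing the document.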
