In the context of my application, I developed a simple profanity checker. I compared it to yours and mine runs 10,000X faster.
Note: This does not account for spelling variations of swear words (sex, s3x, etc.). However, the Trie could incorporate these variations when it is built. I do not do this in my context because I only check well-written book titles and descriptions.
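As a rough sketch of how such variations could be generated before the Trie is built: the `SUBSTITUTIONS` table and the `expand_variants` helper below are hypothetical and not part of the implementation in this issue. Only digit substitutions are used, on the assumption that punctuation is stripped from the text before matching.

```python
from itertools import product

# Hypothetical substitution table (an assumption for illustration only).
# Digits are used because they survive a punctuation-stripping step.
SUBSTITUTIONS = {
    "a": ["a", "4"],
    "e": ["e", "3"],
    "i": ["i", "1"],
    "o": ["o", "0"],
    "s": ["s", "5"],
}


def expand_variants(word):
    """Return every spelling of `word` obtainable from SUBSTITUTIONS."""
    # One list of candidate characters per position in the word.
    choices = [SUBSTITUTIONS.get(ch, [ch]) for ch in word.lower()]
    # The Cartesian product enumerates all combinations of candidates.
    return ["".join(combo) for combo in product(*choices)]


print(expand_variants("sex"))  # ['sex', 's3x', '5ex', '53x']
```

The expanded list could then be passed to the Trie builder in place of the raw word list. The number of variants grows exponentially with the number of substitutable characters, so this probably only makes sense for short words or a small table.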
Note 2: I only implemented a contains-based check. However, the same approach could also support a censor method.
Here is the implementation:
import re


class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end_of_word = False


class ProfanityChecker:
    def __init__(self):
        # profanity.txt is expected to contain one word per line.
        with open("profanity.txt", "r") as f:
            self.root = self.build_trie(f.read().splitlines())

    # Building the trie: O(m * k),
    # where m is the number of words in the dictionary and k is the average length of words.
    def build_trie(self, dictionary):
        root = TrieNode()
        for word in dictionary:
            node = root
            for char in word:
                if char not in node.children:
                    node.children[char] = TrieNode()
                node = node.children[char]
            node.is_end_of_word = True
        return root

    # Searching each word in the text: O(n * k), where n is the number of words in the text.
    def contains_profanity(self, text):
        def search(remaining_text, node=self.root):
            for i, char in enumerate(remaining_text):
                if char in node.children:
                    node = node.children[char]
                    # A match only counts if the profane word ends at a word boundary.
                    if node.is_end_of_word and (i == len(remaining_text) - 1 or remaining_text[i + 1] in (' ', '-')):
                        return True
                else:
                    break  # Stop searching if the character is not in the trie
            return False

        # Replace punctuation with spaces before tokenizing, so that "ass-essment"
        # splits into "ass" and "essment". Deleting the punctuation instead would
        # merge it into "assessment" and the match would be missed.
        text = re.sub(r'[^\w\s]', ' ', text)
        words = text.split()
        for word in words:
            # Each token is lowercased before it is looked up.
            if search(word.lower()):
                return True
        return False


if __name__ == '__main__':
    pc = ProfanityChecker()
    print(pc.contains_profanity("my assessment"))  # Output: False
    print(pc.contains_profanity("my ass essment"))  # Output: True
    print(pc.contains_profanity("my ass-essment"))  # Output: True
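For the censor method mentioned in Note 2, a minimal sketch might look like the following. It is not part of the implementation above: the names `END`, `build_trie`, `is_profane`, and `censor` are hypothetical, and a nested-dict trie is used in place of `TrieNode` only to keep the example short and self-contained. It assumes the same whole-word matching as the checker.

```python
import re

END = None  # Sentinel key marking the end of a word; never equal to a character.


def build_trie(words):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in words:
        node = root
        for ch in word.lower():
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def is_profane(token, root):
    """Return True only if the whole token is a word stored in the trie."""
    node = root
    for ch in token.lower():
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def censor(text, root, mask="*"):
    """Replace each profane word with mask characters of the same length."""
    def replace(match):
        token = match.group(0)
        return mask * len(token) if is_profane(token, root) else token

    # \w+ splits on whitespace and punctuation, so "ass-essment" yields two tokens.
    return re.sub(r"\w+", replace, text)


trie = build_trie(["ass"])
print(censor("my assessment", trie))   # my assessment
print(censor("my ass-essment", trie))  # my ***-essment
print(censor("My ASS essment", trie))  # My *** essment
```

Because the replacement is done per token, the surrounding punctuation and the original casing of the untouched words are preserved.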