Semantic Similarity in Texts

Overview

In distributed work environments, several people often work on the same topic or dataset, so their insights and inputs must later be collated from many sources. Manually finding and removing duplicate points in such a corpus of sentences is time-consuming and error-prone. To streamline this process, I developed a program that uses Cohere embeddings to automatically identify duplicate points in a collection of sentences so they can be eliminated.

This program calculates the semantic similarity (similarity in meaning) between every pair of sentences and reports it as a percentage. A score of 100% means two sentences are judged identical in meaning; every sentence scores 100% against itself. By flagging pairs with high scores, the tool surfaces likely duplicate points, cutting down manual review and making it easier to aggregate content from multiple contributors.
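This README does not state how the percentage is computed. A common choice, and only an assumption here, is the cosine similarity of the two sentence embeddings scaled by 100. The function name similarity_percentage is hypothetical and is not taken from the project's code.

```python
import math
from typing import Sequence


def similarity_percentage(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors, scaled to a percentage.

    Assumption: the project scores sentence pairs this way; the README
    only says that a percentage is reported.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compare a zero vector")
    return 100.0 * dot / (norm_u * norm_v)


# Identical vectors score 100; orthogonal vectors score 0.
print(round(similarity_percentage([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]), 2))  # 100.0
print(round(similarity_percentage([1.0, 0.0], [0.0, 1.0]), 2))  # 0.0
```

Because cosine similarity ranges from -1 to 1, this scale technically runs from -100 to 100, although a sentence compared with itself always lands at 100.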

Features

Automatic identification of duplicate points among a set of sentences.
Utilizes Cohere embeddings to measure semantic similarity.
Outputs a similarity percentage to quantify the degree of similarity.
Enhances productivity by reducing the need for manual duplicate removal.
Easy integration into your existing workflow.

Getting Started and Usage

Visit the live demo.
Upload your Excel file containing sentences in a column named 'Text'.
Let the program calculate semantic similarity and generate a similarity matrix.
Review the similarity percentages to identify and address duplicate points.
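The usage steps above can be sketched as follows, assuming the sentences have already been read from the 'Text' column (for example with pandas read_excel) and that the caller supplies an embedding function. build_similarity_matrix, embed_fn and toy_embed are hypothetical names. With the Cohere Python SDK, embed_fn could wrap a co.embed(texts=..., model=..., input_type=...) call and return its embeddings, but the model and options used by the live demo are not documented here, and the cosine metric is again an assumption.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _cosine_percent(u: Vector, v: Vector) -> float:
    # Assumed metric: cosine similarity scaled by 100.
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0:
        raise ValueError("cannot compare a zero vector")
    return 100.0 * dot / norms


def build_similarity_matrix(
    sentences: List[str],
    embed_fn: Callable[[List[str]], List[Vector]],
) -> List[List[float]]:
    """Embed every sentence once, then score every pair (hypothetical helper)."""
    vectors = embed_fn(sentences)  # one embedding per sentence, same order
    if len(vectors) != len(sentences):
        raise ValueError("embed_fn must return one vector per sentence")
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 100.0  # a sentence is identical to itself
        for j in range(i + 1, n):
            score = round(_cosine_percent(vectors[i], vectors[j]), 2)
            matrix[i][j] = matrix[j][i] = score  # cosine is symmetric
    return matrix


def toy_embed(texts: List[str]) -> List[Vector]:
    # Stand-in for a real embedding model: counts of 'a' and 'b' per text.
    # Every input must contain an 'a' or a 'b', or its vector is all zeros.
    return [[float(t.count("a")), float(t.count("b"))] for t in texts]


print(build_similarity_matrix(["aa", "a", "b"], toy_embed))
# [[100.0, 100.0, 0.0], [100.0, 100.0, 0.0], [0.0, 0.0, 100.0]]
```

Embedding all sentences in one call and filling only the upper triangle keeps the work to one embedding per sentence and n(n-1)/2 comparisons.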

Example

Input Sentences:

Sentence 1: "The quick brown fox jumps over the lazy dog."
Sentence 2: "A fast brown fox jumps over a lazy canine."
Sentence 3: "An agile fox leaps over the inactive dog."

Output Similarity Matrix (values in %):

	Sentence 1	Sentence 2	Sentence 3
Sentence 1	100.00	82.53	78.90
Sentence 2	82.53	100.00	84.22
Sentence 3	78.90	84.22	100.00
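Reviewing the matrix for duplicates amounts to scanning its upper triangle for pairs at or above a cut-off. The sketch below applies a hypothetical 80% threshold to the example values above; the README does not specify a threshold, so flag_duplicates and its default are illustrative assumptions to tune for your own data.

```python
from typing import List, Tuple

# Similarity matrix from the example above (values in %).
EXAMPLE = [
    [100.00, 82.53, 78.90],
    [82.53, 100.00, 84.22],
    [78.90, 84.22, 100.00],
]


def flag_duplicates(
    matrix: List[List[float]], threshold: float = 80.0
) -> List[Tuple[int, int, float]]:
    """Return (sentence_a, sentence_b, score) pairs at or above the threshold.

    Sentence numbers are 1-based to match the table. The default threshold
    is an illustrative assumption, not a value used by the project.
    """
    pairs = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: skip self-pairs and mirrors
            if matrix[i][j] >= threshold:
                pairs.append((i + 1, j + 1, matrix[i][j]))
    return pairs


print(flag_duplicates(EXAMPLE))  # [(1, 2, 82.53), (2, 3, 84.22)]
```

With an 80% cut-off, pairs 1 and 2 and pairs 2 and 3 would be flagged for review, while sentences 1 and 3 (78.90%) would not.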

Contribution

Contributions are welcome! If you find any issues or have suggestions, feel free to submit a pull request or create an issue.

License

This project is licensed under the MIT License.

Disclaimer: This project is for educational purposes and not intended for production use.

Repository: shubham13596/Semantic-similarity-in-texts