In this project we compute the semantic text similarity of 5 sentence pairs using 5 different sentence-transformer models, and then generate easily interpretable visual explanations with LIME, an Explainable AI (XAI) technique. The goal of this project is to understand whether LIME is a useful XAI tool for explaining semantic similarity results produced by any model, and whether it can also help identify ambiguity errors.
The 5 sentence pairs cover 5 different categories, chosen to contrast semantic and syntactic similarity.
Sentence 1 | Sentence 2 | Semantic Similarity | Syntactic Similarity |
---|---|---|---|
I love to watch a lot of movies | I hate to watch movies | No | Yes |
I love all animals | Dog is my favorite animal | Yes | No |
I love birds | I love peacocks | Yes | Yes |
Rini is my childhood friend | I recently met Rini | No | No |
I lost my watch | I will watch out for you | No | Yes |
The 5 models were selected based on their performance, speed, and size for the semantic similarity task.
Transformer | Performance (avg. score) | Speed (sentences/sec) | Model Size |
---|---|---|---|
all-MiniLM-L6-v2 | 58.80 | 14200 | 80 MB |
paraphrase-MiniLM-L6-v2 | 52.56 | 14200 | 80 MB |
paraphrase-albert-small-v2 | 52.25 | 5000 | 43 MB |
all-mpnet-base-v2 | 63.30 | 2800 | 420 MB |
all-MiniLM-L12-v2 | 59.76 | 7500 | 120 MB |
Reference: https://www.sbert.net/docs/pretrained_models.html
The experiments showed that LIME is not an ideal choice for obtaining semantic text similarity explanations, due to the poor quality of its output. However, LIME could successfully identify and highlight ambiguous words, which can eventually help identify ambiguity errors. LIME gave the best output for the paraphrase-MiniLM-L6-v2 model.
Below is the output for one of the example sentence pairs, produced by the best-performing sentence-transformer.
A detailed explanation of the models and the research can be found in the notebooks.