
Created Date: 12 Feb 2019

word-embeddings

Embedding is the process of mapping a word or a piece of text to a continuous vector space of real numbers, usually of low dimension.

In this repository, we use Gensim's Word2Vec, fastText, and GloVe.

Gensim

Gensim is an open-source library for unsupervised topic modeling and natural language processing, using modern statistical machine learning. It is implemented in Python and Cython and was developed by RARE Technologies Ltd.
Download pretrained models from: https://github.com/RaRe-Technologies/gensim-data
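
Below is a minimal sketch of training Word2Vec with Gensim on a toy corpus. It assumes gensim 4.x, where the dimensionality parameter is named `vector_size` (older versions use `size`); the toy sentences are illustrative only. The commented lines show how a pretrained model from the gensim-data repository above could be loaded instead.

```python
# Minimal Word2Vec sketch with Gensim (assumes gensim >= 4.0).
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    ["king", "queen", "royal", "palace"],
    ["man", "woman", "child", "family"],
    ["king", "man", "queen", "woman"],
]

# Train a small skip-gram model (sg=1); min_count=1 keeps every token.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

print(model.wv["king"])               # the 50-dimensional vector for "king"
print(model.wv.most_similar("king"))  # nearest neighbours by cosine similarity

# Alternatively, load a pretrained model from the gensim-data repo above:
# import gensim.downloader as api
# wv = api.load("word2vec-google-news-300")
```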

GloVe

GloVe, coined from Global Vectors, is a model for distributed word representation. It is an unsupervised learning algorithm for obtaining vector representations of words, achieved by mapping words into a meaningful space where the distance between words is related to semantic similarity.
It was developed as an open-source project at Stanford.
Download pretrained models from: https://github.com/stanfordnlp/GloVe
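
The Stanford release ships plain-text files with one word and its vector per line, so loading the pretrained vectors takes only a few lines of parsing, as sketched below. The file name `glove.6B.100d.txt` is an assumption; use whichever file you downloaded.

```python
# Minimal sketch of loading pretrained GloVe vectors from the plain-text format.
import numpy as np

def load_glove(path):
    """Parse a GloVe text file into a {word: vector} dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

glove = load_glove("glove.6B.100d.txt")  # file name is an assumption
print(glove["king"].shape)               # (100,) for the 100-dimensional file
```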

fastText

fastText is a library for learning word embeddings and text classification, created by Facebook's AI Research lab. It supports both unsupervised and supervised learning for obtaining vector representations of words.
Download pretrained models from: https://github.com/facebookresearch/fastText/blob/master/docs/pretrained-vectors.md
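
A minimal sketch of loading the pretrained fastText vectors with Gensim: the `.vec` files from the page above are in word2vec text format, so `KeyedVectors` can read them directly. The file name `wiki.en.vec` matches that page, but the local path is an assumption.

```python
# Minimal sketch of loading a pretrained fastText .vec file with Gensim.
from gensim.models import KeyedVectors

# .vec files are word2vec text format (header line, then one word per line).
wv = KeyedVectors.load_word2vec_format("wiki.en.vec", binary=False)

print(wv.most_similar("computer"))

# Note: the .vec file holds only whole-word vectors. For fastText's subword
# handling of out-of-vocabulary words, load the .bin file instead:
# from gensim.models.fasttext import load_facebook_vectors
# wv = load_facebook_vectors("wiki.en.bin")
```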

TO DO

  1. Text similarity using embeddings (see the sketch after this list)
  2. Text classification using embeddings
  3. Embeddings Visualization
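
As a rough preview of item 1 (not yet implemented in this repository), one common approach is to average the word vectors of each text and compare the averages with cosine similarity. The sketch below assumes the `model` trained in the Gensim example above.

```python
# Minimal sketch of text similarity via averaged word embeddings.
# Assumes `model` is the Word2Vec model from the Gensim example above.
import numpy as np

def text_vector(tokens, wv):
    """Average the vectors of the tokens that are in the vocabulary."""
    vectors = [wv[t] for t in tokens if t in wv]
    return np.mean(vectors, axis=0) if vectors else np.zeros(wv.vector_size)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = text_vector(["king", "palace"], model.wv)
v2 = text_vector(["queen", "royal"], model.wv)
print(cosine(v1, v2))  # closer to 1.0 means more similar
```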