This is an incredibly useful library already -- just what I've been looking for. I wanted to suggest a slightly crazy feature request: allowing kernels to be applied to co-occurrence counts within a window.
I'll try to explain what I mean, since I don't think I have the right terminology for any of this. In the actual GloVe implementation, rather than adding 1 to the count for each co-occurrence, they add a floating point value of 1/d, where d is the distance (in terms of the number of intervening words/tokens) between the co-occurring words. The word2vec implementation uses variable window sizes which, in the limit, amounts to a linear decay of weight around the central word.
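To make that concrete, here's a rough sketch (plain Python with invented names -- not anything this library actually exposes) of how GloVe-style 1/d weighting accumulates fractional counts within a symmetric window:

```python
from collections import defaultdict

def glove_style_cooccurrences(tokens, window=5):
    """Each (center, context) pair at distance d contributes
    1/d to the count instead of 1, as in the GloVe reference code."""
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        # Only look right, then record both directions, so each
        # unordered pair is weighted exactly once per occurrence.
        for d in range(1, window + 1):
            if i + d < len(tokens):
                counts[(center, tokens[i + d])] += 1.0 / d
                counts[(tokens[i + d], center)] += 1.0 / d
    return counts
```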
It would be very interesting to allow an arbitrary kernel shape to be used instead of just 1/x or triangular kernels. In particular, a non-symmetric kernel might be very interesting (e.g. words before a given word could be given more count weight than words following it).
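Concretely (again just a sketch with a made-up signature), that could mean accepting the kernel as a function of the signed offset from the center word, so asymmetry falls out for free:

```python
from collections import defaultdict

def kernel_cooccurrences(tokens, window, kernel):
    """Weight the context word at signed offset d from the
    center by an arbitrary kernel(d) instead of a fixed scheme."""
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        for d in range(-window, window + 1):
            j = i + d
            if d != 0 and 0 <= j < len(tokens):
                counts[(center, tokens[j])] += kernel(d)
    return counts

# Example: preceding words (d < 0) get twice the weight of
# following words, with 1/|d| decay in both directions.
counts = kernel_cooccurrences(
    "the quick brown fox jumps".split(), window=2,
    kernel=lambda d: (2.0 if d < 0 else 1.0) / abs(d),
)
```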
I believe this may be hard to do within the current approach -- I'm not sure it plays nicely with sklearn's CountVectorizer in particular. For that reason I'm not expecting this to be implemented any time soon; I just wanted to record the idea while I'm thinking of it, so you can look into the possibility if you ever get the time.