better positions for extracting skip gram feature? #6

JieyuZ2 · 2019-07-17T06:46:40Z

Hi Jiaming,

In the skip-gram feature extraction code (https://github.com/mickeystroller/HiExpan/blob/master/src/featureExtraction/extractSkipGramFeature.py), the candidate skip-gram positions are set to [(-1, 1), (-2, 1), (-3, 1), (-1, 3), (-2, 2), (-1, 2)] (line 30). However, I found that when the center word is the first word of a sentence, the position effectively becomes (0, 1) instead of (-1, 1), since there is no word before the center word. So maybe we should add positions like (0, 1) and (0, 2). Otherwise, some entities will have the "a _ problem" feature but not the "_ problem" feature, which may hurt if "_ problem" becomes an important feature later. Thanks!
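To make the mismatch concrete, here is a minimal sketch. It is a simplified re-implementation written for illustration, not the repository's actual code: the function name `skipgrams`, the `_` placeholder, and the clamping of the left window at the sentence start are all assumptions on my part.

```python
# Hypothetical, simplified sketch for illustration; not the code in
# extractSkipGramFeature.py. Names and the "_" placeholder are assumptions.

# The six (left, right) offsets quoted from the issue.
BASE_POSITIONS = [(-1, 1), (-2, 1), (-3, 1), (-1, 3), (-2, 2), (-1, 2)]
# The additions proposed in this issue: no left context at all.
EXTRA_POSITIONS = [(0, 1), (0, 2)]


def skipgrams(tokens, start, end, positions):
    """Return the set of skip-gram strings around the span tokens[start:end+1].

    Assumes the left window is clamped at the sentence start, so an offset
    of -1 yields an empty left context when start == 0.
    """
    feats = set()
    for left, right in positions:
        left_ctx = tokens[max(start + left, 0):start]
        right_ctx = tokens[end + 1:end + 1 + right]
        feats.add(" ".join(left_ctx + ["_"] + right_ctx))
    return feats


# Entity in the middle of a sentence: only "a _ problem" is produced.
mid = skipgrams(["a", "hard", "problem"], 1, 1, BASE_POSITIONS)

# Entity at the start of a sentence: (-1, 1) degenerates to (0, 1),
# so "_ problem" is produced instead.
first = skipgrams(["hard", "problem", "exists"], 0, 0, BASE_POSITIONS)

# With the proposed extra positions, the mid-sentence entity also gets
# "_ problem", so both entities share that feature.
mid_fixed = skipgrams(
    ["a", "hard", "problem"], 1, 1, BASE_POSITIONS + EXTRA_POSITIONS
)
```

Under these assumptions, `mid` and `first` share no feature, while `mid_fixed` and `first` share "_ problem".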

Best,
Jieyu

mickeysjm · 2019-07-17T07:13:11Z

Thanks for this comment. I initially chose these six skip-gram positions to align with the existing literature. You can definitely change to other positions, and I think your proposed change is very reasonable. You can do a comparative analysis, and I look forward to seeing some empirical results. Thanks.
