Releases: inpho/vsm
v0.2-alpha
Key changes in version v0.2
- The LDA sequential and multiprocessing models now use a Cythonized version of the collapsed Gibbs sampling loop by default, which makes training much faster.
- The various methods used to quantify distances between numerical representations of semantic features of data (words, documents, topics) now default to metric functions. In particular, distances between probability distributions are computed as the Jensen-Shannon distance; other sorts of vectors (e.g., from LSA or from BEAGLE) are compared using angular distance. `vsm.spatial` also includes a wrapper for any distance or similarity function found in `scipy.spatial.distance`.
- Most of the plotting and clustering functionality has been migrated to an extension, `vsm.extension.clustering`, as there are many possibilities in this direction and the core of `vsm` should limit itself to providing a stable source of data for these.
- Likewise, the corpus building tools have been migrated to an extension, `vsm.extension.corpusbuilders`. There are many ways to build a corpus, and corpus data and metadata arrive in many different forms. The core of `vsm` should limit itself to providing a stable target data structure for the corpus preparation stage of the workflow.
- Importing the various classes that `vsm` provides is now much simpler. In the style of `numpy`, `import vsm` or `from vsm import *` should pull in most of the commonly used classes and functions.
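The Cythonized loop itself is not shown in these notes. As background, here is a minimal pure-Python sketch of a standard collapsed Gibbs sampler for LDA with symmetric priors `alpha` and `beta`. The function `gibbs_lda` and all of its parameters are hypothetical and this is not vsm's implementation. The per-token inner loop (remove the token from the counts, sample a new topic from the collapsed conditional, add it back) is the kind of hot loop that typically benefits from compilation with Cython.

```python
import random


def gibbs_lda(docs, n_topics, n_words, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampler for LDA (illustrative sketch, not vsm's code).

    docs: list of documents, each a list of integer word ids in [0, n_words).
    Returns (doc-topic counts, topic-word counts, per-token topic assignments).
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * n_topics for _ in docs]
    nkw = [[0] * n_words for _ in range(n_topics)]
    nk = [0] * n_topics

    # Random initial topic for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized collapsed conditional p(z_i = t | other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw, z


docs = [[0, 1, 2, 0], [3, 4, 3, 4], [0, 2, 1]]
ndk, nkw, z = gibbs_lda(docs, n_topics=2, n_words=5, n_iter=20, seed=1)
```

Because the counts are updated in place rather than resampling the topic and word distributions, each token update costs only O(number of topics), which is why the speed of this loop dominates training time.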
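To illustrate the two default metrics, here is a small standard-library-only sketch; `js_distance` and `angular_distance` are hypothetical names, not vsm functions. The JS distance is the square root of the Jensen-Shannon divergence (computed here in base 2, so it lies in [0, 1]), and the angular distance is the angle between two vectors, normalized here to [0, 1] by dividing by π (one common convention, assumed for this sketch). Both satisfy the triangle inequality, which plain JS divergence and cosine distance (1 minus cosine similarity) do not.

```python
import math


def _kl(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute nothing.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance (hypothetical helper): sqrt of the base-2 JS divergence."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding error


def angular_distance(u, v):
    """Angle between u and v divided by pi, so the result lies in [0, 1] (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.acos(cos) / math.pi
```

SciPy offers equivalents: `scipy.spatial.distance.jensenshannon(p, q, base=2)` returns the JS distance (its default logarithm base is e), and `scipy.spatial.distance.cdist` accepts either a metric name or a Python callable, which is the general pattern a wrapper over `scipy.spatial.distance` can build on.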
v0.1-develop
The key differences between this branch and v0.1 are the following:
- The LDA viewer `sim_*` functions take the similarity or distance function as a parameter. The module includes an implementation of the Jensen-Shannon divergence, which is the default for LDA.
- The distance matrix methods return a `Manifold` object, which facilitates clustering. As a result, this branch also requires scikit-learn (`sklearn`) and matplotlib in order to import the viewer classes.
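The design of passing the distance function as a parameter can be sketched as follows. This is a hypothetical illustration, not the viewer's actual code: `sim_topics` and `js_divergence` are assumed names, topics are assumed to be plain lists of probabilities, and smaller scores are assumed to mean "more similar".

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (the default metric in this sketch)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def sim_topics(topics, query, dist_fn=js_divergence):
    """Rank all topics by distance to topics[query] (hypothetical helper).

    dist_fn is any callable taking two distributions and returning a float.
    Returns (topic index, distance) pairs, closest first.
    """
    target = topics[query]
    scores = [(i, dist_fn(target, t)) for i, t in enumerate(topics)]
    return sorted(scores, key=lambda pair: pair[1])


topics = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
ranked = sim_topics(topics, 0)  # uses the JS divergence default


def l1(p, q):
    # Any other callable can be swapped in, e.g. an L1 distance.
    return sum(abs(a - b) for a, b in zip(p, q))


ranked_l1 = sim_topics(topics, 2, dist_fn=l1)
```

Keeping the metric as a parameter means the default can change without touching callers. Note that the JS divergence is not itself a metric; its square root, the JS distance, is.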