
Releases: georgian-io/Multimodal-Toolkit

Multimodal Toolkit v0.4.0

24 Sep 19:15
c39bd85

Features:

  • CategoricalFeatures now provides fit(), transform(), and fit_transform() methods.
  • Created a new NumericalFeatures object with the same methods, so both feature types are used the same way.
  • Decoupled CategoricalFeatures from the dataset. The object is now independent of the dataset and performs transformations based on the information learned in the fit() step, so it can be used on its own for inference.
  • Resolve #76 by saving the numerical and categorical transformers for use at inference time.
  • Updated NaN handling for both categorical and numerical features. Users can specify whether NaNs should be handled and what they should be replaced with. NaNs in numerical features can be replaced by the median, the mean, or a custom value, while NaNs in categorical features can only be replaced by a custom value. Also resolves #69.
  • Resolve #66 by adding a handle_unknown argument to the config for the OneHotEncoders.
  • Add a new inference.py script to showcase how to use the saved feature transformers.
  • Update the default values of several variables, such as categorical_cols and label_list, to lists instead of None.
  • Class weights have been removed from the dataset and preprocessing sections: they were not usable there and resulted in errors when set. They are now a parameter in TabularConfig and are used by the model in the forward() call.
  • Update tests & main.py to support the new features.
  • Update the test configuration to reduce the maximum token length. This speeds up testing and also prevents models with shorter maximum sequence lengths from raising an error for an unsupported sequence length.
  • Argument classes are now part of the library, so there is no need to redefine them each time.
  • Add a .gitignore file
  • Change license to Apache 2.0
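The fit/transform workflow, the NaN replacement options and the unknown-category option described above can be illustrated with a small, stdlib-only sketch. It is not the toolkit's implementation: SketchNumericalImputer, SketchCategoricalEncoder and their strategy, fill_value and na_value parameters are hypothetical names chosen for this example. It assumes one-hot encoding in which an unseen category becomes an all-zeros row, which matches the behavior of scikit-learn's OneHotEncoder with handle_unknown="ignore".

```python
import math
import statistics
from typing import List, Optional, Sequence


def _is_nan(value) -> bool:
    # Treat None and float NaN as missing values.
    return value is None or (isinstance(value, float) and math.isnan(value))


class SketchNumericalImputer:
    """Hypothetical numerical transformer that learns a fill value in fit()."""

    def __init__(self, strategy: str = "median", fill_value: float = 0.0):
        self.strategy = strategy      # "median", "mean" or "custom"
        self.fill_value = fill_value  # only used when strategy == "custom"
        self.fill_: Optional[float] = None

    def fit(self, values: Sequence) -> "SketchNumericalImputer":
        observed = [v for v in values if not _is_nan(v)]
        if self.strategy == "median":
            self.fill_ = statistics.median(observed)
        elif self.strategy == "mean":
            self.fill_ = statistics.mean(observed)
        elif self.strategy == "custom":
            self.fill_ = self.fill_value
        else:
            raise ValueError(f"unknown strategy: {self.strategy}")
        return self

    def transform(self, values: Sequence) -> List[float]:
        if self.fill_ is None:
            raise RuntimeError("call fit() before transform()")
        return [self.fill_ if _is_nan(v) else v for v in values]

    def fit_transform(self, values: Sequence) -> List[float]:
        return self.fit(values).transform(values)


class SketchCategoricalEncoder:
    """Hypothetical one-hot encoder that learns its categories in fit()."""

    def __init__(self, na_value: str = "missing", handle_unknown: str = "ignore"):
        self.na_value = na_value              # custom replacement for NaN/None
        self.handle_unknown = handle_unknown  # "ignore" or "error"
        self.categories_: Optional[List[str]] = None

    def _fill(self, values: Sequence) -> List[str]:
        return [self.na_value if _is_nan(v) else v for v in values]

    def fit(self, values: Sequence) -> "SketchCategoricalEncoder":
        self.categories_ = sorted(set(self._fill(values)))
        return self

    def transform(self, values: Sequence) -> List[List[int]]:
        if self.categories_ is None:
            raise RuntimeError("call fit() before transform()")
        index = {c: i for i, c in enumerate(self.categories_)}
        rows = []
        for v in self._fill(values):
            row = [0] * len(self.categories_)
            if v in index:
                row[index[v]] = 1
            elif self.handle_unknown == "error":
                raise ValueError(f"unknown category: {v!r}")
            # With "ignore", an unseen category stays an all-zeros row.
            rows.append(row)
        return rows

    def fit_transform(self, values: Sequence) -> List[List[int]]:
        return self.fit(values).transform(values)


# Fit on training data only; the learned state (fill_, categories_) is kept on
# the objects, which is what allows them to be saved and reused at inference.
num = SketchNumericalImputer(strategy="median").fit([1.0, float("nan"), 3.0, 10.0])
cat = SketchCategoricalEncoder(na_value="missing").fit(["red", None, "blue"])

print(num.transform([float("nan"), 5.0]))      # NaN -> training median
print(cat.transform(["blue", "green", None]))  # "green" is unseen -> all zeros
```

Keeping the learned statistics on the transformer rather than on the dataset is what lets the object be used on its own at inference time: the values fitted on training data are applied to new rows instead of being recomputed from them.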
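Class weights are typically applied inside the loss computed in the forward pass. As a hedged sketch, assuming the weights are used the way PyTorch's torch.nn.CrossEntropyLoss uses its weight argument with the default mean reduction (the toolkit's own forward() is not reproduced here), the weighted loss can be written in plain Python. The function name weighted_cross_entropy and the example values are hypothetical.

```python
import math
from typing import Sequence


def weighted_cross_entropy(
    logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    class_weights: Sequence[float],
) -> float:
    """Weighted mean cross-entropy, following the formula used by
    torch.nn.CrossEntropyLoss(weight=..., reduction="mean")."""
    total_loss = 0.0
    total_weight = 0.0
    for row, label in zip(logits, labels):
        # Numerically stable log-sum-exp: subtract the max logit first.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        nll = log_norm - row[label]  # -log softmax(row)[label]
        w = class_weights[label]     # weight of the sample's true class
        total_loss += w * nll
        total_weight += w
    # The mean is taken over the summed weights, not the number of samples.
    return total_loss / total_weight


# Two-class example where class 1 is up-weighted (hypothetical values).
logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
labels = [0, 1, 1]
print(weighted_cross_entropy(logits, labels, class_weights=[1.0, 3.0]))
```

Each sample's loss is scaled by the weight of its true class and the total is divided by the sum of those weights; with all weights equal, the result reduces to the ordinary mean cross-entropy.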

Fixes:

  • Add a note to the example notebook to address #71
  • #61 (thanks @DougTrajano!)
  • #62 (thanks @DougTrajano!)
  • Add importlib-metadata to setup.py to fix a dependency error that occurred without it.
  • Reset the index before preprocessing: categorical preprocessing resets the index, which in turn caused issues when merging the categorical features with the numerical and text features.
  • Fix: OneHotEncoder no longer uses a deprecated parameter.
  • Fix: Categorical features are now correctly processed as numpy arrays after transformation.
  • Miscellaneous bug fixes

Housekeeping:

  • Deps: Update requirements to resolve dependabot alerts.
  • Deps: Update setup.py to use latest versions of transformers, pandas, scikit-learn, scipy and accelerate.
  • Refactor: Rename the notebooks folder to examples.
  • Refactor: Update all function calls to explicitly name parameters to avoid confusion.
  • Style: Reformat entire library with black.
  • Docs: Update repository maintainers
  • Docs: Add type hints & docstrings to data module.
  • Docs: Update Sphinx and regenerate documentation.
  • Chore: Update library to version 0.4.0.

Multimodal Toolkit v0.3.1

14 Nov 17:00
b4cfaa3
  • Longformer support (Thanks @jtfields!)
  • Documentation update
  • Update dependencies and add accelerate as a dependency
  • Revert trainer arguments to the default Hugging Face arguments in scenarios where we don't need to define them
  • Update example notebooks; add sample inference code
  • We're at 0.3.1 since I made a mistake when uploading files to PyPI.
  • Resolve #33, #34, #46, #51, #53, #54

Realign To Transformers 4.26.1

10 Mar 15:42
c341715
  • Updated code to realign with transformers 4.26.1
  • Add tests to ensure each model in the library works.
  • Fix issues when running XLM models.
  • Consolidate versioning to a single source of truth
  • Add debug_dataset_size as an additional training argument.
  • Update setup.py with new maintainer info and specific versioning for libraries.
  • Supports Python 3.7+
  • Resolves #3, #7, #9, #14, #19, #27, #28, #29, #32

First Alpha Release

08 Sep 19:22
v_0.1.4-alpha

Add more transformers to the toolkit

First Alpha Release

22 Oct 19:30
Pre-release
v_0.1.3-alpha

Change BertLayerNorm to torch.nn.LayerNorm