
SB3-Contrib v1.7.0: Bug fixes for PPO LSTM and quality of life improvements

@araffin released this on 10 Jan 21:41 (commit 7bf9cf3)

Warning
Shared layers in the MLP policy (mlp_extractor) are now deprecated for PPO, A2C and TRPO.
This feature will be removed in SB3 v1.8.0. From then on, net_arch=[64, 64] will create
separate actor and critic networks with the same architecture, consistent with the off-policy algorithms.
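
The planned interpretation of net_arch can be sketched in plain Python. This is a minimal illustration, assuming the dict form dict(pi=[...], vf=[...]) describes the actor ("pi") and critic ("vf") layer sizes; expand_net_arch is a hypothetical helper written for this example and is not part of SB3.

```python
from typing import Dict, List, Union

NetArch = Union[List[int], Dict[str, List[int]]]


def expand_net_arch(net_arch: NetArch) -> Dict[str, List[int]]:
    """Hypothetical helper: how net_arch reads once shared layers are removed.

    A plain list of layer sizes gives the actor ("pi") and the critic ("vf")
    two separate networks with the same architecture; a dict is copied as-is.
    """
    if isinstance(net_arch, dict):
        return {"pi": list(net_arch.get("pi", [])), "vf": list(net_arch.get("vf", []))}
    # Copy the list twice so the two networks never alias the same object
    return {"pi": list(net_arch), "vf": list(net_arch)}


# net_arch=[64, 64] -> two independent 64-64 networks, no shared trunk
print(expand_net_arch([64, 64]))  # {'pi': [64, 64], 'vf': [64, 64]}
```

To opt in to separate networks explicitly, one would pass policy_kwargs=dict(net_arch=dict(pi=[64, 64], vf=[64, 64])) to PPO, A2C or TRPO; check that your installed SB3 version accepts the dict form.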

Note
TRPO models saved with SB3 < 1.7.0 will show a warning about
missing keys in the state dict when loaded with SB3 >= 1.7.0.
To suppress the warning, simply save the model again.
You can find more information in issue #1233.
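
Re-saving an old TRPO model could look like the sketch below. The function name resave_trpo_model and the path argument are hypothetical; the sketch only uses TRPO.load and model.save, and imports sb3_contrib lazily so the definition itself does not require the package.

```python
def resave_trpo_model(path: str) -> None:
    """Hypothetical helper: load a TRPO model saved with SB3 < 1.7.0 and save it again.

    Saving with SB3 >= 1.7.0 writes the current state dict, so later loads
    should no longer warn about missing keys.
    """
    # Lazy import: sb3_contrib is only needed when the function is called
    from sb3_contrib import TRPO

    model = TRPO.load(path)  # may print the missing-keys warning once
    model.save(path)  # overwrite the old file with the up-to-date format
```

Usage, with a hypothetical file name: resave_trpo_model("trpo_pendulum.zip").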

Breaking Changes:

  • Removed the deprecated create_eval_env, eval_env, eval_log_path, n_eval_episodes and eval_freq parameters;
    use an EvalCallback instead
  • Removed deprecated sde_net_arch parameter
  • Upgraded to Stable-Baselines3 >= 1.7.0
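
Periodic evaluation that used to go through the removed parameters can be moved to an EvalCallback, roughly as in this sketch. It assumes gym (not gymnasium) as used by the 1.7.x series, TRPO as the algorithm, and Pendulum-v1 as the environment; the function name and log path are hypothetical choices for the example.

```python
def train_with_eval_callback(env_id: str = "Pendulum-v1", total_timesteps: int = 10_000):
    """Hypothetical example: EvalCallback instead of eval_env/eval_freq arguments."""
    # Lazy imports: the libraries are only needed when the function runs
    import gym
    from sb3_contrib import TRPO
    from stable_baselines3.common.callbacks import EvalCallback
    from stable_baselines3.common.monitor import Monitor

    # Separate environment used only for evaluation, wrapped to record episode stats
    eval_env = Monitor(gym.make(env_id))
    eval_callback = EvalCallback(
        eval_env,
        n_eval_episodes=5,  # formerly the n_eval_episodes argument
        eval_freq=1_000,  # evaluate every 1_000 callback calls (= steps with one env)
        log_path="./eval_logs/",  # formerly eval_log_path
        deterministic=True,
    )
    model = TRPO("MlpPolicy", env_id, verbose=0)
    model.learn(total_timesteps=total_timesteps, callback=eval_callback)
    return model
```

Keeping evaluation in a callback means the algorithm constructor no longer needs to know about an evaluation environment, and the same callback can be reused across algorithms.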

New Features:

  • Introduced mypy type checking
  • Added support for Python 3.10
  • Added with_bias parameter to ARSPolicy
  • Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua)
  • Features extractors now properly support unnormalized image-like observations (3D tensor)
    when passing normalize_images=False
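
Two of these options can be passed through policy_kwargs, as in the sketch below. It assumes that the ActorCriticPolicy used by TRPO accepts share_features_extractor in SB3 >= 1.7.0 and that ARS's "MlpPolicy" maps to ARSPolicy, which takes with_bias; the function name is hypothetical. For image observations you already scaled yourself, normalize_images=False would go in policy_kwargs the same way.

```python
def build_models(env_id: str = "CartPole-v1"):
    """Hypothetical example of the new policy options in SB3-Contrib v1.7.0."""
    # Lazy import: sb3_contrib is only needed when the function runs
    from sb3_contrib import ARS, TRPO

    # Actor and critic each get their own features extractor
    trpo = TRPO("MlpPolicy", env_id, policy_kwargs=dict(share_features_extractor=False))
    # ARS policy whose linear layers have no bias term
    ars = ARS("MlpPolicy", env_id, policy_kwargs=dict(with_bias=False))
    return trpo, ars
```

Not sharing the features extractor doubles its parameters but lets the actor and critic learn different representations.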

Bug Fixes:

  • Fixed a bug in RecurrentPPO where the LSTM states were incorrectly reshaped for n_lstm_layers > 1 (thanks @kolbytn)
  • Fixed "RuntimeError: rnn: hx is not contiguous" raised when predicting terminal values with RecurrentPPO and n_lstm_layers > 1
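
The configuration affected by these fixes is a RecurrentPPO policy with more than one LSTM layer. The sketch below trains briefly and then runs the usual prediction loop, passing the LSTM states and episode-start flags back into predict; it assumes the 1.7.x gym-based VecEnv API (reset returns only observations, step returns four values), and the function name is hypothetical.

```python
def run_recurrent_ppo_two_layers(env_id: str = "CartPole-v1", n_steps: int = 100):
    """Hypothetical example: RecurrentPPO with a 2-layer LSTM policy."""
    # Lazy imports: the libraries are only needed when the function runs
    import numpy as np
    from sb3_contrib import RecurrentPPO

    model = RecurrentPPO("MlpLstmPolicy", env_id, policy_kwargs=dict(n_lstm_layers=2), verbose=0)
    model.learn(total_timesteps=1_000)

    vec_env = model.get_env()
    obs = vec_env.reset()
    lstm_states = None  # no hidden state before the first step
    # True marks the first step of an episode, so the LSTM state is reset there
    episode_starts = np.ones((vec_env.num_envs,), dtype=bool)
    for _ in range(n_steps):
        action, lstm_states = model.predict(
            obs, state=lstm_states, episode_start=episode_starts, deterministic=True
        )
        obs, rewards, dones, infos = vec_env.step(action)
        episode_starts = dones
    return model
```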

Deprecations:

  • You should now explicitly pass a features_extractor parameter when calling extract_features()
  • Deprecated shared layers in MlpExtractor (@AlexPasqua)

Others:

  • Fixed flake8 config
  • Fixed sb3_contrib/common/utils.py type hint
  • Fixed sb3_contrib/common/recurrent/type_aliases.py type hint
  • Fixed sb3_contrib/ars/policies.py type hint
  • Exposed modules in __init__.py with __all__ attribute (@ZikangXiong)
  • Removed ignores on Flake8 F401 (@ZikangXiong)
  • Upgraded the GitHub CI actions setup-python to v4 and checkout to v3
  • Tensors are now constructed directly on the target device
  • Standardized the use of from gym import spaces