
Welcome to the Intrepid: Interactive Representation Discovery wiki!

This wiki contains detailed instructions on how to use this repository. We begin with a quick summary. If you have any questions, please raise an issue and tag it as [Question].

What is Intrepid?

Intrepid is a repository containing a collection of decision-making algorithms (which include bandits and reinforcement learning as special cases). A decision-making algorithm helps an agent decide which actions to take.

A core focus of Intrepid is on decision making that requires learning a latent state/representation of the world. E.g., consider an agent navigating an image-based environment. The observation here is the image generated by the agent's camera, while a good latent state could be the agent's position in the world along with the positions of any dynamic obstacles. A minimal sketch of this distinction follows.
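
To make the observation-versus-latent-state distinction concrete, here is a purely illustrative sketch; the `Encoder` class below is hypothetical and is not Intrepid's API (the repository's learned encoders live in ./src/model/encoders).

```python
# Illustrative only: a hypothetical encoder mapping a raw camera image to a
# compact latent state; Intrepid's learned encoders are in ./src/model/encoders.
import numpy as np

class Encoder:
    """Maps a high-dimensional observation to a low-dimensional latent state."""

    def encode(self, observation: np.ndarray) -> np.ndarray:
        # A learned encoder would extract, e.g., the agent's position and any
        # dynamic obstacles; mean-pooling the image is just a stand-in here.
        return observation.mean(axis=(0, 1))

camera_image = np.random.rand(64, 64, 3)       # raw observation (the image)
latent_state = Encoder().encode(camera_image)  # compact summary of the world
```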

Core components of Intrepid

Intrepid consists of the following components:

  1. Core learning algorithms. These are mostly located in ./src/learning/core_learner, with algorithm-specific utility functionality in ./src/learning/learning_utils. E.g., the Homer algorithm is implemented in ./src/learning/core_learner/homer.py. See the algorithm page for a full list and description of these algorithms. The learning utils include, for example, a generic learner class in ./src/learning/learning_utils/generic_learner.py and routines that perform independence tests (a hypothetical sketch of the learner pattern appears after this list).

  2. Useful Decision-Making Tools: a variety of packages that are routinely used across algorithms. These include:

    • methods for generating episodes (./src/learning/core_learner/policy_roll_in)
    • methods for policy search given either offline data or a set of exploration policies (./src/learning/core_learner/policy_search)
    • a variety of self-supervised learning objectives for learning latent states (./src/learning/core_learner/state_abstraction). These include autoencoders, inverse dynamics, temporal contrastive learning, and multi-step inverse dynamics (see the loss sketch after this list). For legacy reasons, inverse dynamics is at times referred to as inverse kinematics in the code.
  3. A large collection of models that includes various encoders, inverse dynamics models, and generic classifiers (./src/model).

    • A list of policies that map an observation (or a history, including time) to an action can be found in ./src/model/policy
    • A list of encoders that map an observation to a latent state representation (either discrete or continuous) can be found in ./src/model/encoders
    • A list of decoders that map a latent state representation to an observation can be found in ./src/model/decoders
    • A list of classifiers can be found in ./src/model/classifiers
    • A list of forward dynamics models that map a given observation and action to the next observation can be found in ./src/model/forward_model (an illustrative sketch appears after this list)
    • A list of inverse dynamics models that map a given ordered pair of observations to the action that takes the agent from the former observation to the latter can be found in ./src/model/inverse_dynamics
  4. A set of environments and environment wrappers for popular existing domains (to be installed separately).

    • We include some challenging exploration problems with relatively simple observation spaces (./src/environments/rl_acid_env) for quick proof-of-concept studies, where the focus is not on realistic observational noise but on exploration and planning.

    • We also include several grid-world instances built on top of the Minigrid environment (./src/environments/minigrid). You will have to install minigrid via the requirements file or on your own; a minimal usage sketch appears below.
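
As referenced in item 1, the following is a minimal sketch of the generic-learner pattern. The class and method names here are assumptions for illustration and may not match the actual interface in ./src/learning/learning_utils/generic_learner.py.

```python
# Hypothetical sketch of the generic-learner pattern; names are illustrative
# and may not match ./src/learning/learning_utils/generic_learner.py.
class GenericLearner:
    """Collect experience from an environment, then update the agent."""

    def __init__(self, config: dict):
        self.config = config

    def collect_episode(self, env):
        raise NotImplementedError  # e.g., roll in with the current policy

    def update(self, transitions):
        raise NotImplementedError  # e.g., fit the encoder/policy on the data

    def train(self, env, num_episodes: int):
        # The common training loop shared by concrete algorithms such as Homer.
        for _ in range(num_episodes):
            transitions = self.collect_episode(env)
            self.update(transitions)
```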
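
The self-supervised objectives in item 2 can be made concrete with a small PyTorch sketch of the multi-step inverse dynamics objective: predict the first action a_t from the observation pair (x_t, x_{t+k}). This is a generic rendition under assumed shapes and names, not the implementation in ./src/learning/core_learner/state_abstraction.

```python
# Generic multi-step inverse dynamics sketch (assumed shapes and names):
# predict the first action a_t from observations x_t and x_{t+k}.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiStepInverseDynamics(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int, num_actions: int):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, latent_dim)        # phi(x)
        self.head = nn.Linear(2 * latent_dim, num_actions)   # action logits

    def forward(self, obs_t, obs_tk):
        z_t, z_tk = self.encoder(obs_t), self.encoder(obs_tk)
        return self.head(torch.cat([z_t, z_tk], dim=-1))

model = MultiStepInverseDynamics(obs_dim=64, latent_dim=16, num_actions=4)
logits = model(torch.randn(32, 64), torch.randn(32, 64))    # batch of pairs
loss = F.cross_entropy(logits, torch.randint(0, 4, (32,)))  # true first actions
loss.backward()  # gradients train the encoder to expose controllable state
```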
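
Likewise, a forward dynamics model from item 3 maps an (observation, action) pair to a prediction of the next observation. The sketch below uses assumed shapes and names rather than the signatures in ./src/model/forward_model.

```python
# Illustrative forward dynamics model (assumed shapes and names).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ForwardModel(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.num_actions = num_actions
        self.net = nn.Sequential(
            nn.Linear(obs_dim + num_actions, 128),
            nn.ReLU(),
            nn.Linear(128, obs_dim),  # predicted next observation
        )

    def forward(self, obs, action):
        # Condition on the action via a one-hot encoding.
        one_hot = F.one_hot(action, self.num_actions).float()
        return self.net(torch.cat([obs, one_hot], dim=-1))

model = ForwardModel(obs_dim=64, num_actions=4)
next_obs_pred = model(torch.randn(8, 64), torch.randint(0, 4, (8,)))
```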
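
Finally, for item 4, here is a quick sketch of driving a Minigrid environment directly through the standard Gymnasium interface (assuming `pip install minigrid gymnasium`); Intrepid's wrappers in ./src/environments/minigrid build on top of this interface.

```python
# Minimal Minigrid rollout via the standard Gymnasium API.
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers the MiniGrid-* environments)

env = gym.make("MiniGrid-Empty-5x5-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()  # random exploration
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```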
