
Commit

nuke replay buffer to memorial
jet-sony committed Sep 10, 2024
1 parent 29b3ecd commit 3a5f96b
Showing 15 changed files with 3 additions and 1,561 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/linux-test.yml
```diff
@@ -21,4 +21,4 @@ jobs:
 pip install -e .
 pip install torch
 pip install pytest
-pytest -v -rw test/test_replay_buffer.py
+# pytest -v -rw test/test_*.py
```
51 changes: 0 additions & 51 deletions README.md
@@ -240,57 +240,6 @@ The Neural Blocks module also has functions that can generate single modules, re

<br>

### `from wingman.replay_buffer import FlatReplayBuffer as ReplayBuffer`

This is a replay buffer designed around Torch's `DataLoader` class for reinforcement learning projects.
It lets you bootstrap the `DataLoader`'s excellent shuffling and pre-batching capabilities.
In addition, all the data is stored as numpy arrays in a contiguous block of memory, allowing very fast retrieval.
ReplayBuffer also doesn't put any limits on tuple length per transition; some people prefer to store $\{S, A, R, S'\}$, some prefer to store $\{S, A, R, S', A'\}$ - ReplayBuffer doesn't care!
The tuple can be as long or as short as you want, as long as every tuple fed in has the same length and each element keeps the same shape.
There is no need to predefine the shapes of the inputs you want to put in the ReplayBuffer; it automatically infers them and computes memory usage when the first tuple is stored.
The basic usage of the ReplayBuffer is as follows:

```python
import torch

from wingman import gpuize
from wingman.replay_buffer import FlatReplayBuffer as ReplayBuffer

# we define the replay buffer to be able to store 1000 tuples of information
memory = ReplayBuffer(mem_size=1000)

# `env` and `policy` are placeholders for your own environment and policy
# get the first observation from the environment
next_obs = env.reset()

# iterate until the environment is complete
while env.done is False:
    # rollover the observation
    obs = next_obs

    # get an action from the policy
    act = policy(obs)

    # sample a new transition
    next_obs, rew, done, next_lbl = env.step(act)

    # store stuff in the replay buffer
    memory.push((obs, act, rew, next_obs, done))

# perform training using the buffer
dataloader = torch.utils.data.DataLoader(
    memory, batch_size=32, shuffle=True, drop_last=False
)

# easily treat the replay buffer as an iterable that we can iterate through
for batch_num, stuff in enumerate(dataloader):
    observations = gpuize(stuff[0], "cuda:0")
    actions = gpuize(stuff[1], "cuda:0")
    rewards = gpuize(stuff[2], "cuda:0")
    next_states = gpuize(stuff[3], "cuda:0")
    dones = gpuize(stuff[4], "cuda:0")
```
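
Because the tuple length is unconstrained, the same API also covers, say, an $\{S, A, R, S', A'\}$ layout. The sketch below is illustrative only: it assumes nothing beyond the `ReplayBuffer(mem_size=...)`, `push`, and `DataLoader` usage shown above, and the dummy numpy data and shapes are made up.

```python
import numpy as np
import torch

from wingman.replay_buffer import FlatReplayBuffer as ReplayBuffer

# a buffer holding hypothetical 5-element {S, A, R, S', A'} transitions
memory = ReplayBuffer(mem_size=1000)

for _ in range(100):
    obs = np.random.randn(8)        # S: observation vector (shape is illustrative)
    act = np.random.randn(2)        # A: action taken
    rew = float(np.random.randn())  # R: scalar reward
    next_obs = np.random.randn(8)   # S': next observation
    next_act = np.random.randn(2)   # A': next action

    # shapes and memory usage are inferred from the first push;
    # every later tuple must keep the same length and element shapes
    memory.push((obs, act, rew, next_obs, next_act))

# sampling works exactly as in the example above
dataloader = torch.utils.data.DataLoader(memory, batch_size=32, shuffle=True)
for batch in dataloader:
    obs_b, act_b, rew_b, next_obs_b, next_act_b = batch
```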


<br>

### `from wingman import gpuize, cpuize`
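
A rough usage sketch, assuming the `gpuize(array, device)` call pattern from the replay buffer example above; the `cpuize(tensor)` call pattern is an assumption:

```python
import numpy as np

from wingman import gpuize, cpuize

arr = np.zeros((32, 4))
tensor = gpuize(arr, "cuda:0")  # numpy array -> torch tensor on the given device
arr_back = cpuize(tensor)       # assumed inverse: torch tensor -> numpy array on cpu
```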

2 changes: 1 addition & 1 deletion pyproject.toml
```diff
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "jj_wingman"
-version = "0.19.3"
+version = "0.20.0"
 authors = [
 { name="Jet", email="taijunjet@hotmail.com" },
 ]
```
178 changes: 0 additions & 178 deletions test/test_replay_buffer.py

This file was deleted.

52 changes: 0 additions & 52 deletions test/test_replay_buffer_utils.py

This file was deleted.


0 comments on commit 3a5f96b
