embodichain.agents.rl.buffer
Overview
The buffer package provides rollout and replay buffer structures used by
RL algorithms.
Submodules
Rollout Buffer Classes
Classes:
RolloutBuffer: Single-rollout buffer backed by a preallocated TensorDict.
- class embodichain.agents.rl.buffer.standard_buffer.RolloutBuffer
Bases: object
Single-rollout buffer backed by a preallocated TensorDict.
The shared rollout uses a uniform [num_envs, time + 1] layout. For transition-only fields such as action, reward, and done, the final time index is reused as padding so the collector, environment, and algorithms can share a single TensorDict batch shape.
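The layout described above can be illustrated with a small dependency-free sketch. Nested Python lists stand in for tensors here, and the field names (`obs`, `action`, `reward`, `done`) and the idea that observations fill all `time + 1` slots are assumptions made for illustration, not the library's actual schema.

```python
# Dependency-free sketch: nested lists stand in for tensors.
num_envs, rollout_len = 2, 3


def alloc(fill=0.0):
    """Allocate one field with the shared [num_envs, time + 1] shape."""
    return [[fill] * (rollout_len + 1) for _ in range(num_envs)]


# Every field is preallocated once with the same batch shape (assumed names).
rollout = {"obs": alloc(), "action": alloc(), "reward": alloc(), "done": alloc(False)}

for env in range(num_envs):
    for t in range(rollout_len):
        rollout["obs"][env][t] = float(t)      # observation seen at step t
        rollout["action"][env][t] = 1.0        # transition-only field
        rollout["reward"][env][t] = 0.5        # transition-only field
        rollout["done"][env][t] = t == rollout_len - 1
    # Assumption: only the observation has a meaningful value at the final
    # index; transition-only fields leave that slot untouched as padding.
    rollout["obs"][env][rollout_len] = float(rollout_len)

shapes = {key: (len(field), len(field[0])) for key, field in rollout.items()}
```

All four fields end up with shape `(num_envs, rollout_len + 1)`, which is what lets a single batch shape describe the whole rollout.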
Methods:
- __init__(num_envs, rollout_len, obs_dim, ...)
- add(rollout): Mark the shared rollout as ready for consumption.
- get([flatten]): Return the stored rollout and clear the buffer.
- is_full(): Return whether a rollout is waiting to be consumed.
Return the shared rollout TensorDict for collector write-in.
Attributes:
- property buffer: TensorDict
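The method summaries suggest a simple hand-off between a collector and an algorithm: write into the shared storage, mark it ready, then consume it. The toy class below is a hypothetical stand-in that mimics that flow; `ToyRolloutBuffer` and its internals are invented for illustration and are not the real `RolloutBuffer` implementation. In particular, treating "clear the buffer" as resetting a ready flag is an assumption.

```python
class ToyRolloutBuffer:
    """Hypothetical stand-in mimicking the documented add/get/is_full flow."""

    def __init__(self, storage):
        self._storage = storage  # preallocated once and reused
        self._ready = False

    @property
    def buffer(self):
        """Expose the shared storage so a collector can write into it."""
        return self._storage

    def add(self, rollout):
        # Assumption: the collector wrote into the shared storage in place,
        # so adding only marks it as ready for consumption.
        if rollout is not self._storage:
            raise ValueError("expected the shared rollout storage")
        self._ready = True

    def is_full(self):
        return self._ready

    def get(self):
        if not self._ready:
            raise RuntimeError("no rollout is waiting to be consumed")
        self._ready = False  # assumed meaning of clearing the buffer
        return self._storage


buf = ToyRolloutBuffer({"reward": [[0.0, 0.0]]})
shared = buf.buffer
shared["reward"][0][0] = 1.0   # collector writes in place
buf.add(shared)
full_before = buf.is_full()
rollout = buf.get()
full_after = buf.is_full()
```

Because the storage is reused, a consumer in this sketch should finish with the returned rollout before the collector starts writing the next one.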
Buffer Utilities
Functions:
iterate_minibatches(rollout, batch_size, device): Yield shuffled minibatches from a flattened rollout.
transition_view(rollout[, flatten]): Build a transition-aligned TensorDict from a rollout.
- embodichain.agents.rl.buffer.utils.iterate_minibatches(rollout, batch_size, device)
Yield shuffled minibatches from a flattened rollout.
- Return type:
Iterator[TensorDict]
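As a rough illustration of shuffled minibatching, the sketch below draws one random permutation of the sample indices and yields consecutive chunks of it, indexing every field with the same indices. It is not the library's implementation: `iterate_minibatches_sketch` is a hypothetical name, plain lists stand in for tensors, the `device` argument is omitted because lists have no device, and keeping a smaller final batch is an assumption.

```python
import random
from typing import Dict, Iterator, List

Batch = Dict[str, List[float]]


def iterate_minibatches_sketch(flat: Batch, batch_size: int, seed: int = 0) -> Iterator[Batch]:
    """Yield shuffled minibatches from a flattened rollout (dict of equal-length lists)."""
    n = len(next(iter(flat.values())))
    order = list(range(n))
    random.Random(seed).shuffle(order)  # one permutation per pass over the data
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        # Use the same indices for every field so samples stay aligned.
        yield {key: [values[i] for i in idx] for key, values in flat.items()}


flat = {
    "reward": [0.0, 1.0, 2.0, 3.0, 4.0],
    "action": [10.0, 11.0, 12.0, 13.0, 14.0],
}
batches = list(iterate_minibatches_sketch(flat, batch_size=2))
```

Each sample appears exactly once per pass, and the `reward` and `action` entries of a given sample always land in the same minibatch position.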
- embodichain.agents.rl.buffer.utils.transition_view(rollout, flatten=False)
Build a transition-aligned TensorDict from a rollout.
The shared rollout uses a uniform [num_envs, time + 1] layout. For transition-only fields such as action, reward, and done, the final slot is reserved as padding so that all rollout fields share the same batch shape. This helper drops that padded slot and exposes the valid transition slices as a TensorDict with batch shape [num_envs, time].
- Parameters:
rollout (TensorDict) – Rollout TensorDict with root batch shape [num_envs, time + 1].
flatten (bool) – If True, return a flattened [num_envs * time] view.
- Return type:
TensorDict
- Returns:
TensorDict containing transition-aligned fields.
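The slicing behaviour described above can be sketched without any tensor library. In this hypothetical `transition_view_sketch`, nested lists stand in for tensors, every field is sliced the same way, and flattening is done in env-major order; these are assumptions, and unlike a real tensor view the list slices here are copies.

```python
from typing import Dict, List


def transition_view_sketch(rollout: Dict[str, List[list]], flatten: bool = False):
    """Drop the padded final time slot; optionally merge the env and time axes."""
    # [num_envs, time + 1] -> [num_envs, time]
    view = {key: [row[:-1] for row in field] for key, field in rollout.items()}
    if flatten:
        # [num_envs, time] -> [num_envs * time], env-major order (assumed)
        view = {key: [x for row in field for x in row] for key, field in view.items()}
    return view


rollout = {
    "reward": [[1.0, 2.0, 3.0, 0.0], [4.0, 5.0, 6.0, 0.0]],      # last slot is padding
    "done": [[False, False, True, False], [False, False, True, False]],
}
view = transition_view_sketch(rollout)
flat = transition_view_sketch(rollout, flatten=True)
```

With `num_envs = 2` and `time = 3`, `view` holds two rows of three transitions per field, and `flat` holds six transitions per field.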