embodichain.agents.rl.buffer

Overview

The buffer package provides rollout and replay buffer structures used by RL algorithms.

Submodules

`standard_buffer`
`utils`

Rollout Buffer Classes

Classes:

`RolloutBuffer`	Single-rollout buffer backed by a preallocated TensorDict.

class embodichain.agents.rl.buffer.standard_buffer.RolloutBuffer

Bases: object

Single-rollout buffer backed by a preallocated TensorDict.

The shared rollout uses a uniform [num_envs, time + 1] layout. For transition-only fields such as action, reward, and done, the final time index is reused as padding so the collector, environment, and algorithms can share a single TensorDict batch shape.
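The layout described above can be pictured with a small sketch. It is not the library's allocation code: a plain dict of torch tensors stands in for the TensorDict, the key names ("obs", "action", "reward", "done") and the sizes are assumptions chosen for illustration, and it assumes observations are the field that uses the extra slot.

```python
import torch

num_envs, rollout_len, obs_dim, action_dim = 4, 8, 3, 2

# Every field is allocated with the same leading shape [num_envs, rollout_len + 1],
# so all fields can live under one shared batch shape.
rollout = {
    "obs": torch.zeros(num_envs, rollout_len + 1, obs_dim),
    "action": torch.zeros(num_envs, rollout_len + 1, action_dim),
    "reward": torch.zeros(num_envs, rollout_len + 1),
    "done": torch.zeros(num_envs, rollout_len + 1, dtype=torch.bool),
}

# Assumed here: observations may use all rollout_len + 1 slots, while
# transition-only fields hold rollout_len valid entries plus one padded slot.
leading_shapes = {name: tuple(t.shape[:2]) for name, t in rollout.items()}

# Slicing off the last time index recovers the valid transitions.
valid_actions = rollout["action"][:, :-1]
```

The padding costs one unused slot per environment for each transition-only field; in exchange, every field can be indexed with the same [env, time] coordinates.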

Methods:

`__init__`(num_envs, rollout_len, obs_dim, ...)
`add`(rollout)	Mark the shared rollout as ready for consumption.
`get`([flatten])	Return the stored rollout and clear the buffer.
`is_full`()	Return whether a rollout is waiting to be consumed.
`start_rollout`()	Return the shared rollout TensorDict for collector write-in.

Attributes:

buffer

__init__(num_envs, rollout_len, obs_dim, action_dim, device)

add(rollout)

Mark the shared rollout as ready for consumption.

Return type: None

property buffer: TensorDict

get(flatten=True)

Return the stored rollout and clear the buffer.

When flatten is True, the rollout is first converted to a transition view that drops the padded final slot from transition-only fields.

Return type: TensorDict

is_full()

Return whether a rollout is waiting to be consumed.

Return type: bool

start_rollout()

Return the shared rollout TensorDict for collector write-in.

Return type: TensorDict
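Taken together, the four methods suggest a handshake between a collector and a learner: borrow the shared storage, write into it, mark it ready, then consume it. The sketch below illustrates that flow with a hypothetical ToyRolloutBuffer that only mirrors the documented method names. It is not the real RolloutBuffer: a dict of tensors stands in for the TensorDict, the key names are assumed, and "clearing" is modeled as resetting a ready flag, which may differ from the actual implementation.

```python
import torch


class ToyRolloutBuffer:
    """Hypothetical stand-in that mirrors the documented method names."""

    def __init__(self, num_envs, rollout_len, obs_dim, action_dim, device):
        shape = (num_envs, rollout_len + 1)
        self._rollout = {
            "obs": torch.zeros(*shape, obs_dim, device=device),
            "action": torch.zeros(*shape, action_dim, device=device),
            "reward": torch.zeros(*shape, device=device),
            "done": torch.zeros(*shape, dtype=torch.bool, device=device),
        }
        self._ready = False

    def start_rollout(self):
        # Hand the preallocated storage to the collector for in-place writes.
        return self._rollout

    def add(self, rollout):
        # Nothing is copied: the shared rollout is only flagged as ready.
        self._ready = True

    def is_full(self):
        return self._ready

    def get(self, flatten=True):
        # Assumption: clearing resets the flag and keeps the storage for reuse.
        self._ready = False
        if not flatten:
            return self._rollout
        # Drop the final slot, then merge [num_envs, time] into one dimension.
        return {k: v[:, :-1].flatten(0, 1) for k, v in self._rollout.items()}


buffer = ToyRolloutBuffer(num_envs=4, rollout_len=8, obs_dim=3, action_dim=2, device="cpu")
rollout = buffer.start_rollout()
rollout["reward"][:, :-1] = 1.0  # the collector writes transitions in place
buffer.add(rollout)
ready_before = buffer.is_full()
batch = buffer.get(flatten=True)
ready_after = buffer.is_full()
```

Because the collector writes directly into the storage returned by start_rollout(), add() has no data to copy in this sketch and only needs to flip the ready state.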

Buffer Utilities

Functions:

`iterate_minibatches`(rollout, batch_size, device)	Yield shuffled minibatches from a flattened rollout.
`transition_view`(rollout[, flatten])	Build a transition-aligned TensorDict from a rollout.

embodichain.agents.rl.buffer.utils.iterate_minibatches(rollout, batch_size, device)

Yield shuffled minibatches from a flattened rollout.

Return type: Iterator[TensorDict]
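One common way to produce shuffled minibatches is to draw a single random permutation of the flattened indices and slice it into chunks. The sketch below shows that technique under stated assumptions: iterate_minibatches_sketch is a hypothetical name, a dict of tensors stands in for the TensorDict, and keeping the final, smaller batch is a choice made here, not documented behavior of the real function.

```python
import torch


def iterate_minibatches_sketch(rollout, batch_size, device):
    """Yield shuffled minibatches from a dict of tensors sharing one leading dim."""
    total = next(iter(rollout.values())).shape[0]
    # One permutation per pass, so each transition is visited exactly once.
    order = torch.randperm(total)
    for start in range(0, total, batch_size):
        idx = order[start:start + batch_size]
        # The same indices are applied to every field to keep them aligned.
        yield {name: t[idx].to(device) for name, t in rollout.items()}


flat = {"obs": torch.arange(10.0).unsqueeze(-1), "reward": torch.arange(10.0)}
batches = list(iterate_minibatches_sketch(flat, batch_size=4, device="cpu"))
sizes = [b["reward"].shape[0] for b in batches]
seen = sorted(torch.cat([b["reward"] for b in batches]).tolist())
```

With 10 transitions and batch_size=4, this sketch yields batches of 4, 4, and 2 rows, and every transition appears in exactly one batch.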

embodichain.agents.rl.buffer.utils.transition_view(rollout, flatten=False)

Build a transition-aligned TensorDict from a rollout.

The shared rollout uses a uniform [num_envs, time + 1] layout. For transition-only fields such as action, reward, and done, the final slot is reserved as padding so that all rollout fields share the same batch shape. This helper drops that padded slot and exposes the valid transition slices as a TensorDict with batch shape [num_envs, time].

Parameters:

rollout (TensorDict) – Rollout TensorDict with root batch shape [num_envs, time + 1].
flatten (bool) – If True, return a flattened [num_envs * time] view.

Return type:

TensorDict

Returns:

TensorDict containing transition-aligned fields.
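The shape changes described above can be traced with a short sketch. transition_view_sketch is a hypothetical helper, not the library function: it works on a dict of tensors, and it slices every field the same way, which is an assumption (the real helper may treat observation fields differently).

```python
import torch


def transition_view_sketch(rollout, flatten=False):
    """Drop the final time slot of every field; optionally merge env and time."""
    view = {name: t[:, :-1] for name, t in rollout.items()}
    if flatten:
        # [num_envs, time, ...] -> [num_envs * time, ...]
        view = {name: t.reshape(-1, *t.shape[2:]) for name, t in view.items()}
    return view


num_envs, time = 2, 3
rollout = {
    "obs": torch.randn(num_envs, time + 1, 5),
    "reward": torch.arange(num_envs * (time + 1), dtype=torch.float32).reshape(
        num_envs, time + 1
    ),
}
aligned = transition_view_sketch(rollout)            # leading shape [2, 3]
flat = transition_view_sketch(rollout, flatten=True)  # leading shape [6]
```

In this example the reward rows [0, 1, 2, 3] and [4, 5, 6, 7] lose their padded last entries, so the flattened rewards are [0, 1, 2, 4, 5, 6], ordered environment by environment.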
