embodichain.agents.rl.models

Overview

Policy-network registration and model construction APIs for RL agents.

Functions

build_mlp_from_cfg(module_cfg, in_dim, out_dim)

Construct an MLP module from a minimal json-like config.

build_policy(policy_block, obs_space, ...[, ...])

Build a policy from config using spaces for extensibility.

get_policy_class(name)

get_registered_policy_names()

register_policy(name, policy_cls)

Classes:

`ActorCritic`	Actor-Critic with learnable log_std for Gaussian policy.
`ActorOnly`	Actor-only policy for algorithms that do not use a value function (e.g., GRPO).
`MLP`	General MLP supporting custom last activation, orthogonal init, and output reshape.
`Policy`	Abstract base class that all RL policies must implement.

class embodichain.agents.rl.models.ActorCritic

Bases: Policy

Actor-Critic with learnable log_std for Gaussian policy.

This is a placeholder implementation of the Policy interface that:

- Encapsulates MLP networks (actor + critic) that need to be trained by RL algorithms
- Handles internal computation: MLP output → mean + learnable log_std → Normal distribution
- Provides a uniform interface for RL algorithms (PPO, SAC, etc.)

This allows seamless swapping with other policy implementations (e.g., VLAPolicy) without modifying RL algorithm code.

Implements TensorDict-native interfaces while preserving get_action() compatibility for evaluation and legacy call-sites.
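The "mean + learnable log_std → Normal distribution" step can be illustrated without any framework. The following is a minimal pure-Python sketch, not the library's implementation: `gaussian_head` is a hypothetical helper, plain lists stand in for tensors, and the assumption is a diagonal Gaussian whose log-probability and entropy are summed over action dimensions.

```python
import math
import random


def gaussian_head(mean, log_std, deterministic=False, rng=None):
    """Return (action, log_prob, entropy) for a diagonal Gaussian policy.

    `mean` plays the role of the actor MLP output and `log_std` the
    learnable, state-independent parameter (both plain lists here).
    """
    rng = rng or random.Random()
    std = [math.exp(ls) for ls in log_std]
    if deterministic:
        # Deterministic mode returns the distribution mean.
        action = list(mean)
    else:
        action = [rng.gauss(m, s) for m, s in zip(mean, std)]
    # Per-dimension log density of a Normal, summed over action dimensions.
    log_prob = sum(
        -0.5 * ((a - m) / s) ** 2 - ls - 0.5 * math.log(2 * math.pi)
        for a, m, s, ls in zip(action, mean, std, log_std)
    )
    # Entropy of a Normal is 0.5 + 0.5 * log(2 * pi) + log_std per dimension.
    entropy = sum(0.5 + 0.5 * math.log(2 * math.pi) + ls for ls in log_std)
    return action, log_prob, entropy


action, log_prob, entropy = gaussian_head([0.0, 1.0], [0.0, 0.0], deterministic=True)
```

Because log_std does not depend on the observation in this sketch, the entropy is the same for every state; only the mean varies with the input.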

Methods:

`__init__`(obs_dim, action_dim, device, actor, ...)	Initialize internal Module state, shared by both nn.Module and ScriptModule.
`evaluate_actions`(tensordict)	Evaluate actions and return current policy outputs.
`forward`(tensordict[, deterministic])	Write sampled actions and value estimates into the TensorDict.
`get_value`(tensordict)	Write value estimate for the given observations into the TensorDict.

__init__(obs_dim, action_dim, device, actor, critic)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

Parameters: tensordict (TensorDict) – TensorDict containing obs and action.
Return type: TensorDict
Returns: A new TensorDict containing sample_log_prob, entropy, and value.

forward(tensordict, deterministic=False)

Write sampled actions and value estimates into the TensorDict.

Return type: TensorDict

get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

Parameters: tensordict (TensorDict) – Input TensorDict containing obs.
Return type: TensorDict
Returns: TensorDict with value populated.

class embodichain.agents.rl.models.ActorOnly

Bases: Policy

Actor-only policy for algorithms that do not use a value function (e.g., GRPO).

Exposes the same interface as ActorCritic: get_action and evaluate_actions populate the same TensorDict entries, including value, but value is always zeros since no critic is used.

Methods:

`__init__`(obs_dim, action_dim, device, actor)	Initialize internal Module state, shared by both nn.Module and ScriptModule.
`evaluate_actions`(tensordict)	Evaluate actions and return current policy outputs.
`forward`(tensordict[, deterministic])	Write sampled actions and value estimates into the TensorDict.
`get_value`(tensordict)	Write value estimate for the given observations into the TensorDict.

__init__(obs_dim, action_dim, device, actor)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

Parameters: tensordict (TensorDict) – TensorDict containing obs and action.
Return type: TensorDict
Returns: A new TensorDict containing sample_log_prob, entropy, and value.

forward(tensordict, deterministic=False)

Write sampled actions and value estimates into the TensorDict.

Return type: TensorDict

get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

Parameters: tensordict (TensorDict) – Input TensorDict containing obs.
Return type: TensorDict
Returns: TensorDict with value populated.

class embodichain.agents.rl.models.MLP

Bases: Sequential

General MLP supporting custom last activation, orthogonal init, and output reshape.

Parameters:

input_dim – input dimension
output_dim – output dimension (int or shape tuple/list)
hidden_dims – hidden layer sizes, e.g. [256, 256]
activation – hidden layer activation name (relu/elu/tanh/gelu/silu)
last_activation – last-layer activation name, or None for linear
use_layernorm – whether to add LayerNorm after each hidden linear layer
dropout_p – dropout probability for hidden layers (0 disables)

Methods:

`__init__`(input_dim, output_dim, hidden_dims)	Initialize internal Module state, shared by both nn.Module and ScriptModule.
`init_orthogonal`([scales])	Orthogonal-initialize linear layers and zero the bias.

__init__(input_dim, output_dim, hidden_dims, activation='elu', last_activation=None, use_layernorm=False, dropout_p=0.0)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

init_orthogonal(scales=1.0)

Orthogonal-initialize linear layers and zero the bias.

Parameters: scales – a single gain value, or a sequence whose length equals the number of linear layers.
Return type: None
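The two accepted forms of scales (one gain for every layer, or one gain per linear layer) can be normalized as in the sketch below. `expand_scales` is a hypothetical helper written for illustration, and raising ValueError on a length mismatch is an assumption about how such input would be rejected, not documented behavior.

```python
def expand_scales(scales, num_linear_layers):
    """Turn a scalar gain or a per-layer sequence into one gain per linear layer."""
    if isinstance(scales, (int, float)):
        # A single gain is shared by every linear layer.
        return [float(scales)] * num_linear_layers
    gains = [float(s) for s in scales]
    if len(gains) != num_linear_layers:
        # Assumption: a mismatched sequence is treated as an error.
        raise ValueError(
            f"expected {num_linear_layers} gains, got {len(gains)}"
        )
    return gains


# Assuming hidden_dims=[256, 256] yields three linear layers (two hidden, one output).
shared = expand_scales(1.0, 3)
per_layer = expand_scales([2 ** 0.5, 2 ** 0.5, 0.01], 3)
```

A small gain on the last layer, as in the second call, is a common choice for policy heads in PPO-style training because it keeps initial action means close to zero.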

class embodichain.agents.rl.models.Policy

Bases: Module, ABC

Abstract base class that all RL policies must implement.

A Policy:

- Encapsulates neural networks that are trained by RL algorithms
- Handles internal computations (e.g., network output → distribution)
- Provides a uniform interface for algorithms (PPO, SAC, etc.)

Methods:

`__init__`()	Initialize internal Module state, shared by both nn.Module and ScriptModule.
`evaluate_actions`(tensordict)	Evaluate actions and return current policy outputs.
`forward`(tensordict[, deterministic])	Write sampled actions and value estimates into the TensorDict.
`get_action`(tensordict[, deterministic])	Sample actions into the provided TensorDict without gradients.
`get_value`(tensordict)	Write value estimate for the given observations into the TensorDict.

Attributes:

`device`	Device where the policy parameters are located.
`training`

__init__()

Initialize internal Module state, shared by both nn.Module and ScriptModule.

device: torch.device

Device where the policy parameters are located.

abstract evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

Parameters: tensordict (TensorDict) – TensorDict containing obs and action.
Return type: TensorDict
Returns: A new TensorDict containing sample_log_prob, entropy, and value.

abstract forward(tensordict, deterministic=False)

Write sampled actions and value estimates into the TensorDict.

Return type: TensorDict

get_action(tensordict, deterministic=False)

Sample actions into the provided TensorDict without gradients.

Parameters:

tensordict (TensorDict) – Input TensorDict containing obs.
deterministic (bool) – If True, return the mean action; otherwise sample from the action distribution.

Return type:

TensorDict

Returns:

TensorDict with action, sample_log_prob, and value populated.

abstract get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

Parameters: tensordict (TensorDict) – Input TensorDict containing obs.
Return type: TensorDict
Returns: TensorDict with value populated.

training: bool
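The division of labor in Policy (three abstract methods plus a concrete get_action) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: `SketchPolicy` and `ConstantPolicy` are hypothetical names, a plain dict stands in for TensorDict, and get_action is assumed to delegate to forward. The real method additionally runs without gradient tracking, which a dict-based stand-in cannot show.

```python
from abc import ABC, abstractmethod


class SketchPolicy(ABC):
    """Stand-in for the Policy contract, with dicts instead of TensorDicts."""

    @abstractmethod
    def forward(self, td, deterministic=False):
        """Write action, sample_log_prob, and value into td."""

    @abstractmethod
    def get_value(self, td):
        """Write value into td."""

    @abstractmethod
    def evaluate_actions(self, td):
        """Return a new mapping with sample_log_prob, entropy, and value."""

    def get_action(self, td, deterministic=False):
        # Assumption: the concrete helper simply delegates to forward().
        return self.forward(td, deterministic=deterministic)


class ConstantPolicy(SketchPolicy):
    """Toy actor-only policy: fixed action, value always zero."""

    def forward(self, td, deterministic=False):
        td["action"] = [0.0]
        td["sample_log_prob"] = 0.0
        td["value"] = 0.0  # no critic, so the value entry is zero
        return td

    def get_value(self, td):
        td["value"] = 0.0
        return td

    def evaluate_actions(self, td):
        return {"sample_log_prob": 0.0, "entropy": 0.0, "value": 0.0}


td = ConstantPolicy().get_action({"obs": [1.0, 2.0]})
```

Because algorithms only call these four methods, a subclass that fills the same entries can be swapped in without touching the training loop.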

embodichain.agents.rl.models.build_mlp_from_cfg(module_cfg, in_dim, out_dim)

Construct an MLP module from a minimal json-like config.

Return type: MLP

Expected schema:

    module_cfg = {
        "type": "mlp",
        "hidden_sizes": [256, 256],
        "activation": "relu",
    }
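One way to read the schema is as a thin translation from config keys to MLP constructor arguments. The sketch below is a guess at that mapping, not the library's code: `mlp_kwargs_from_cfg` is a hypothetical helper, the pairing of hidden_sizes with hidden_dims is inferred from the two signatures on this page, and falling back to 'elu' when activation is omitted is an assumption taken from the MLP default.

```python
def mlp_kwargs_from_cfg(module_cfg, in_dim, out_dim):
    """Translate a json-like config into keyword arguments for an MLP."""
    module_type = module_cfg.get("type")
    if module_type != "mlp":
        raise ValueError(f"unsupported module type: {module_type!r}")
    return {
        "input_dim": in_dim,
        "output_dim": out_dim,
        # Assumption: config key hidden_sizes feeds the hidden_dims argument.
        "hidden_dims": list(module_cfg["hidden_sizes"]),
        # Assumption: fall back to the MLP signature default when omitted.
        "activation": module_cfg.get("activation", "elu"),
    }


module_cfg = {"type": "mlp", "hidden_sizes": [256, 256], "activation": "relu"}
kwargs = mlp_kwargs_from_cfg(module_cfg, in_dim=48, out_dim=12)
```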

embodichain.agents.rl.models.build_policy(policy_block, obs_space, action_space, device, actor=None, critic=None)

Build a policy from config using spaces for extensibility.

Built-in MLP policies still resolve flattened obs_dim / action_dim, while custom policies may accept richer obs_space / action_space inputs.

Return type: Policy
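Resolving a flattened obs_dim or action_dim usually amounts to multiplying out a shape. The following sketch assumes each space exposes a shape tuple (as a Gymnasium Box does) or a dict of such shapes; `flat_dim` is a hypothetical helper and the library may resolve dimensions differently.

```python
import math


def flat_dim(shape_or_shapes):
    """Number of scalars in a shape tuple, or the total over a dict of shapes."""
    if isinstance(shape_or_shapes, dict):
        # Dict-like observations: flatten each entry, then concatenate.
        return sum(flat_dim(v) for v in shape_or_shapes.values())
    return math.prod(shape_or_shapes)


# Hypothetical spaces: a 24-dim proprioceptive vector plus a 4x16 feature map.
obs_dim = flat_dim({"proprio": (24,), "image_features": (4, 16)})
action_dim = flat_dim((7,))
```

A custom policy that consumes structured observations directly would skip this step and take the spaces as they are.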

embodichain.agents.rl.models.get_policy_class(name)

Return type: Optional[Type[Policy]]

embodichain.agents.rl.models.get_registered_policy_names()

Return type: list[str]

embodichain.agents.rl.models.register_policy(name, policy_cls)

Return type: None
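The three registry functions are undocumented beyond their signatures, but the return types suggest a plain name-to-class mapping. The sketch below re-implements that pattern in isolation to show the likely semantics. It is not the library's code: the module-level `_REGISTRY` dict is hypothetical, rejecting duplicate names is an assumption, and returning None for unknown names is inferred from the Optional return type.

```python
from typing import Optional

# Hypothetical module-level store mapping policy names to classes.
_REGISTRY: dict[str, type] = {}


def register_policy(name: str, policy_cls: type) -> None:
    """Make policy_cls available under name."""
    if name in _REGISTRY:
        # Assumption: duplicate names are rejected rather than overwritten.
        raise ValueError(f"policy {name!r} is already registered")
    _REGISTRY[name] = policy_cls


def get_policy_class(name: str) -> Optional[type]:
    """Return the registered class, or None if the name is unknown."""
    return _REGISTRY.get(name)


def get_registered_policy_names() -> list[str]:
    """Return all registered names in registration order."""
    return list(_REGISTRY)


class MyPolicy:
    """Placeholder for a user-defined Policy subclass."""


register_policy("my_policy", MyPolicy)
```

With a registry like this, a config can refer to a policy by its string name, and build_policy can look the class up at construction time.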
