embodichain.agents.rl.models

Overview

Policy-network registration and model construction APIs for RL agents.

Functions:

build_mlp_from_cfg(module_cfg, in_dim, out_dim)

Construct an MLP module from a minimal json-like config.

build_policy(policy_block, obs_space, ...[, ...])

Build a policy from config using spaces for extensibility.

get_policy_class(name)

get_registered_policy_names()

register_policy(name, policy_cls)

Classes:

ActorCritic

Actor-Critic with learnable log_std for Gaussian policy.

ActorOnly

Actor-only policy for algorithms that do not use a value function (e.g., GRPO).

MLP

General MLP supporting custom last activation, orthogonal init, and output reshape.

Policy

Abstract base class that all RL policies must implement.

class embodichain.agents.rl.models.ActorCritic

Bases: Policy

Actor-Critic with learnable log_std for Gaussian policy.

This is a placeholder implementation of the Policy interface that:
  • Encapsulates the MLP networks (actor and critic) that are trained by RL algorithms

  • Handles the internal computation: MLP output → mean + learnable log_std → Normal distribution

  • Provides a uniform interface for RL algorithms (PPO, SAC, etc.)

This allows seamless swapping with other policy implementations (e.g., VLAPolicy) without modifying RL algorithm code.

Implements TensorDict-native interfaces while preserving get_action() compatibility for evaluation and legacy call-sites.
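
The mean + log_std → Normal distribution step described above can be sketched in plain Python. This is an illustrative stand-in, not the ActorCritic implementation: the helper names (gaussian_log_prob, gaussian_entropy, select_action) are hypothetical, lists replace tensors, and a diagonal Gaussian with per-dimension log-probabilities summed over the action dimensions is assumed.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def gaussian_log_prob(action, mean, log_std):
    """Log-density of `action` under a diagonal Gaussian, summed over action dims."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        z = (a - mu) / math.exp(ls)  # standardized residual, std = exp(log_std)
        total += -0.5 * z * z - ls - 0.5 * LOG_2PI
    return total


def gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian; depends only on log_std, not on the mean."""
    return sum(0.5 + 0.5 * LOG_2PI + ls for ls in log_std)


def select_action(mean, log_std, deterministic=False, rng=random):
    """Return the mean when deterministic, otherwise one sample per action dim."""
    if deterministic:
        return list(mean)
    return [rng.gauss(mu, math.exp(ls)) for mu, ls in zip(mean, log_std)]


# Hypothetical actor output for a 2-D action space; log_std = 0 means std = 1.
mean = [0.2, -0.1]
log_std = [0.0, 0.0]

action = select_action(mean, log_std, deterministic=True)
log_prob = gaussian_log_prob(action, mean, log_std)
entropy = gaussian_entropy(log_std)
```

Because log_std is a learnable parameter rather than a network output in this kind of policy, the entropy is the same for every observation at a given training step.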

Methods:

__init__(obs_dim, action_dim, device, actor, ...)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

forward(tensordict[, deterministic])

Write sampled actions and value estimates into the TensorDict.

get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

__init__(obs_dim, action_dim, device, actor, critic)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

Parameters:

tensordict (TensorDict) – TensorDict containing obs and action.

Return type:

TensorDict

Returns:

A new TensorDict containing sample_log_prob, entropy, and value.

forward(tensordict, deterministic=False)

Write sampled actions and value estimates into the TensorDict.

Return type:

TensorDict

get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

Parameters:

tensordict (TensorDict) – Input TensorDict containing obs.

Return type:

TensorDict

Returns:

TensorDict with value populated.

class embodichain.agents.rl.models.ActorOnly

Bases: Policy

Actor-only policy for algorithms that do not use a value function (e.g., GRPO).

Exposes the same interface as ActorCritic: get_action and evaluate_actions return a TensorDict with the same entries, but value is always zeros because no critic is used.

Methods:

__init__(obs_dim, action_dim, device, actor)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

forward(tensordict[, deterministic])

Write sampled actions and value estimates into the TensorDict.

get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

__init__(obs_dim, action_dim, device, actor)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

Parameters:

tensordict (TensorDict) – TensorDict containing obs and action.

Return type:

TensorDict

Returns:

A new TensorDict containing sample_log_prob, entropy, and value.

forward(tensordict, deterministic=False)

Write sampled actions and value estimates into the TensorDict.

Return type:

TensorDict

get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

Parameters:

tensordict (TensorDict) – Input TensorDict containing obs.

Return type:

TensorDict

Returns:

TensorDict with value populated.

class embodichain.agents.rl.models.MLP

Bases: Sequential

General MLP supporting custom last activation, orthogonal init, and output reshape.

Parameters:
  • input_dim – input dimension

  • output_dim – output dimension (int or shape tuple/list)

  • hidden_dims – hidden layer sizes, e.g. [256, 256]

  • activation – hidden layer activation name (relu/elu/tanh/gelu/silu)

  • last_activation – last-layer activation name, or None for a linear output

  • use_layernorm – whether to add LayerNorm after each hidden linear layer

  • dropout_p – dropout probability for hidden layers (0 disables dropout)
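
To make the parameters above concrete, the following sketch lists the layers such an MLP might contain, in order. mlp_layer_plan is a hypothetical helper, not part of this module. The placement of dropout after the activation and the flatten-then-reshape handling of a tuple output_dim are assumptions; only the LayerNorm position (after each hidden linear layer) is stated by the parameter description.

```python
import math


def mlp_layer_plan(input_dim, output_dim, hidden_dims, activation="elu",
                   last_activation=None, use_layernorm=False, dropout_p=0.0):
    """Return an ordered list of layer descriptions for an MLP (illustration only)."""
    # output_dim may be an int or a shape; the last linear layer emits the flat size.
    out_shape = (output_dim,) if isinstance(output_dim, int) else tuple(output_dim)
    flat_out = math.prod(out_shape)

    plan = []
    prev = input_dim
    for width in hidden_dims:
        plan.append(("linear", prev, width))
        if use_layernorm:
            plan.append(("layernorm", width))  # after each hidden linear layer
        plan.append(("activation", activation))
        if dropout_p > 0.0:
            plan.append(("dropout", dropout_p))  # assumed position: after activation
        prev = width

    plan.append(("linear", prev, flat_out))
    if last_activation is not None:
        plan.append(("activation", last_activation))
    if len(out_shape) > 1:
        plan.append(("reshape", out_shape))  # assumed: reshape flat output to the shape
    return plan


# Hypothetical sizes: 8 inputs, two hidden layers of 16, output reshaped to (2, 3).
plan = mlp_layer_plan(8, (2, 3), [16, 16], use_layernorm=True)
```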

Methods:

__init__(input_dim, output_dim, hidden_dims)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

init_orthogonal([scales])

Orthogonal-initialize linear layers and zero the bias.

__init__(input_dim, output_dim, hidden_dims, activation='elu', last_activation=None, use_layernorm=False, dropout_p=0.0)

Initialize internal Module state, shared by both nn.Module and ScriptModule.

init_orthogonal(scales=1.0)

Orthogonal-initialize linear layers and zero the bias.

Parameters:

scales – a single gain value, or a sequence of gains whose length equals the number of linear layers.

Return type:

None
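
The two accepted forms of scales in init_orthogonal can be illustrated by a small normalization step that produces one gain per linear layer. expand_scales is a hypothetical helper, and raising ValueError on a length mismatch is an assumption; the source only states that a sequence must match the number of linear layers.

```python
from collections.abc import Sequence


def expand_scales(scales, num_linear_layers):
    """Turn a scalar gain or a per-layer sequence into a list with one gain per layer."""
    if isinstance(scales, Sequence) and not isinstance(scales, str):
        gains = [float(s) for s in scales]
        if len(gains) != num_linear_layers:
            raise ValueError(
                f"expected {num_linear_layers} gains, got {len(gains)}"
            )
        return gains
    # A single value is shared by every linear layer.
    return [float(scales)] * num_linear_layers


# A common PPO-style choice (not prescribed by this module): sqrt(2) for hidden
# layers and a small gain for the final layer.
hidden_gain = 2 ** 0.5
gains = expand_scales([hidden_gain, hidden_gain, 0.01], 3)
uniform = expand_scales(1.0, 3)
```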

class embodichain.agents.rl.models.Policy

Bases: Module, ABC

Abstract base class that all RL policies must implement.

A Policy:
  • Encapsulates neural networks that are trained by RL algorithms

  • Handles internal computations (e.g., network output → distribution)

  • Provides a uniform interface for algorithms (PPO, SAC, etc.)

Methods:

__init__()

Initialize internal Module state, shared by both nn.Module and ScriptModule.

evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

forward(tensordict[, deterministic])

Write sampled actions and value estimates into the TensorDict.

get_action(tensordict[, deterministic])

Sample actions into the provided TensorDict without gradients.

get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

Attributes:

device

Device where the policy parameters are located.

training

__init__()

Initialize internal Module state, shared by both nn.Module and ScriptModule.

device: torch.device

Device where the policy parameters are located.

abstract evaluate_actions(tensordict)

Evaluate actions and return current policy outputs.

Parameters:

tensordict (TensorDict) – TensorDict containing obs and action.

Return type:

TensorDict

Returns:

A new TensorDict containing sample_log_prob, entropy, and value.

abstract forward(tensordict, deterministic=False)

Write sampled actions and value estimates into the TensorDict.

Return type:

TensorDict

get_action(tensordict, deterministic=False)

Sample actions into the provided TensorDict without gradients.

Parameters:
  • tensordict (TensorDict) – Input TensorDict containing obs.

  • deterministic (bool) – If True, return the mean action; otherwise sample from the policy's action distribution.

Return type:

TensorDict

Returns:

TensorDict with action, sample_log_prob, and value populated.

abstract get_value(tensordict)

Write value estimate for the given observations into the TensorDict.

Parameters:

tensordict (TensorDict) – Input TensorDict containing obs.

Return type:

TensorDict

Returns:

TensorDict with value populated.

training: bool

embodichain.agents.rl.models.build_mlp_from_cfg(module_cfg, in_dim, out_dim)

Construct an MLP module from a minimal json-like config.

Return type:

MLP

Expected schema:

    module_cfg = {
        "type": "mlp",
        "hidden_sizes": [256, 256],
        "activation": "relu",
    }
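
A minimal sketch of how a config with this schema could be validated and translated into MLP constructor arguments. mlp_kwargs_from_cfg is a hypothetical helper, not the body of build_mlp_from_cfg. Mapping hidden_sizes to hidden_dims, rejecting any type other than "mlp", and falling back to "elu" when activation is omitted are all assumptions.

```python
def mlp_kwargs_from_cfg(module_cfg, in_dim, out_dim):
    """Validate a json-like MLP config and return keyword arguments for an MLP."""
    cfg_type = module_cfg.get("type")
    if cfg_type != "mlp":
        raise ValueError(f"unsupported module type: {cfg_type!r}")

    hidden_sizes = [int(h) for h in module_cfg["hidden_sizes"]]
    if any(h <= 0 for h in hidden_sizes):
        raise ValueError("hidden_sizes must contain positive integers")

    return {
        "input_dim": in_dim,
        "output_dim": out_dim,
        "hidden_dims": hidden_sizes,
        # Assumed fallback: the MLP constructor's documented default activation.
        "activation": module_cfg.get("activation", "elu"),
    }


module_cfg = {
    "type": "mlp",
    "hidden_sizes": [256, 256],
    "activation": "relu",
}
# Hypothetical dimensions for an actor network.
kwargs = mlp_kwargs_from_cfg(module_cfg, in_dim=48, out_dim=12)
```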

embodichain.agents.rl.models.build_policy(policy_block, obs_space, action_space, device, actor=None, critic=None)

Build a policy from config using spaces for extensibility.

Built-in MLP policies still resolve flattened obs_dim / action_dim, while custom policies may accept richer obs_space / action_space inputs.

Return type:

Policy
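
The flattened obs_dim / action_dim mentioned above can be illustrated as the product of a space's shape. This sketch assumes the spaces expose a Gymnasium-style shape attribute, which the source does not confirm; flat_dim is a hypothetical helper and SimpleNamespace stands in for a real space object.

```python
import math
from types import SimpleNamespace


def flat_dim(space):
    """Number of scalar entries in a space, assuming it exposes a `shape` tuple."""
    return int(math.prod(space.shape))


# Stand-ins for real observation and action spaces (hypothetical shapes).
obs_space = SimpleNamespace(shape=(3, 4))
action_space = SimpleNamespace(shape=(7,))

obs_dim = flat_dim(obs_space)
action_dim = flat_dim(action_space)
```

A custom policy that consumes images or dictionaries of observations would skip this flattening and read the richer space directly.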

embodichain.agents.rl.models.get_policy_class(name)

Return type:

Optional[Type[Policy]]

embodichain.agents.rl.models.get_registered_policy_names()

Return type:

list[str]

embodichain.agents.rl.models.register_policy(name, policy_cls)

Return type:

None
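
The three registry functions follow a common name-to-class registry pattern, which might look roughly like the sketch below. Everything here is a stand-in written for illustration: the module-level dictionary, the duplicate-name and subclass checks, and the sorted name order are assumptions, and the local Policy class only mimics the real base class. Only the signatures and return types are taken from the entries above.

```python
from typing import Dict, List, Optional, Type


class Policy:
    """Stand-in for the real Policy base class (illustration only)."""


_POLICY_REGISTRY: Dict[str, Type[Policy]] = {}


def register_policy(name: str, policy_cls: Type[Policy]) -> None:
    """Associate `name` with a Policy subclass (assumed to reject duplicates)."""
    if name in _POLICY_REGISTRY:
        raise ValueError(f"policy {name!r} is already registered")
    if not issubclass(policy_cls, Policy):
        raise TypeError("policy_cls must be a subclass of Policy")
    _POLICY_REGISTRY[name] = policy_cls


def get_policy_class(name: str) -> Optional[Type[Policy]]:
    """Return the registered class, or None when the name is unknown."""
    return _POLICY_REGISTRY.get(name)


def get_registered_policy_names() -> List[str]:
    """Return all registered names (sorted here for stable output)."""
    return sorted(_POLICY_REGISTRY)


class MyPolicy(Policy):
    """Hypothetical custom policy."""


register_policy("my_policy", MyPolicy)
found = get_policy_class("my_policy")
missing = get_policy_class("unknown")
names = get_registered_policy_names()
```

With a registry like this, a config can select a policy by its string name while the construction code stays unchanged when new policies are added.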