embodichain.agents.rl.models
Overview
Policy-network registration and model construction APIs for RL agents.
Functions:
build_mlp_from_cfg(module_cfg, in_dim, out_dim) – Construct an MLP module from a minimal JSON-like config.
build_policy(policy_block, obs_space, ...[, ...]) – Build a policy from config using spaces for extensibility.
get_policy_class(name)
register_policy(name, policy_cls)
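The two registration functions above are listed without descriptions. As a rough illustration of the usual name-to-class registry pattern, here is a minimal stand-alone sketch. The functions below are local stand-ins that only share the names; the duplicate-name check and the error messages are assumptions, not the library's documented behavior.

```python
# Hypothetical stand-alone registry; NOT the embodichain implementation.
_POLICY_REGISTRY = {}


def register_policy(name, policy_cls):
    """Map a string name to a policy class (duplicate check is an assumption)."""
    if name in _POLICY_REGISTRY:
        raise ValueError(f"policy '{name}' is already registered")
    _POLICY_REGISTRY[name] = policy_cls


def get_policy_class(name):
    """Look up a previously registered policy class by name."""
    try:
        return _POLICY_REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown policy '{name}'") from None


class MyPolicy:  # placeholder class used only for this example
    pass


register_policy("my_policy", MyPolicy)
resolved = get_policy_class("my_policy")
```

A registry of this kind lets a config file select a policy by string name without the training code importing every policy class directly.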
Classes:
ActorCritic – Actor-Critic with learnable log_std for Gaussian policy.
ActorOnly – Actor-only policy for algorithms that do not use a value function (e.g., GRPO).
MLP – General MLP supporting custom last activation, orthogonal init, and output reshape.
Policy – Abstract base class that all RL policies must implement.
- class embodichain.agents.rl.models.ActorCritic
Bases: Policy
Actor-Critic with learnable log_std for Gaussian policy.
This is a placeholder implementation of the Policy interface that:
- Encapsulates MLP networks (actor + critic) that need to be trained by RL algorithms
- Handles internal computation: MLP output → mean + learnable log_std → Normal distribution
- Provides a uniform interface for RL algorithms (PPO, SAC, etc.)
This allows seamless swapping with other policy implementations (e.g., VLAPolicy) without modifying RL algorithm code.
Implements TensorDict-native interfaces while preserving get_action() compatibility for evaluation and legacy call-sites.
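The "MLP output → mean + learnable log_std → Normal distribution" step can be illustrated numerically. The following is a minimal pure-Python sketch of the log-probability and entropy of a diagonal Gaussian. The helper names are hypothetical, plain lists stand in for tensors, and summing over action dimensions is an assumption about how per-dimension terms are combined.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        z = (a - mu) / math.exp(ls)  # standardized residual
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def gaussian_entropy(log_std):
    """Entropy of a diagonal Gaussian; it depends only on log_std."""
    return sum(0.5 + 0.5 * math.log(2.0 * math.pi) + ls for ls in log_std)


mean = [0.2, -0.1]     # stand-in for the actor MLP output
log_std = [0.0, 0.0]   # stand-in for the learnable parameter (std = 1)
action = [0.2, -0.1]   # deterministic choice: the mean itself

log_prob = gaussian_log_prob(action, mean, log_std)
entropy = gaussian_entropy(log_std)
```

Because log_std is a free parameter rather than a network output, the entropy in this sketch is independent of the observation.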
Methods:
__init__(obs_dim, action_dim, device, actor, ...) – Initialize internal Module state, shared by both nn.Module and ScriptModule.
evaluate_actions(tensordict) – Evaluate actions and return current policy outputs.
forward(tensordict[, deterministic]) – Write sampled actions and value estimates into the TensorDict.
get_value(tensordict) – Write value estimate for the given observations into the TensorDict.
- __init__(obs_dim, action_dim, device, actor, critic)
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- evaluate_actions(tensordict)
Evaluate actions and return current policy outputs.
- Parameters:
tensordict (TensorDict) – TensorDict containing obs and action.
- Return type:
TensorDict
- Returns:
A new TensorDict containing sample_log_prob, entropy, and value.
- class embodichain.agents.rl.models.ActorOnly
Bases: Policy
Actor-only policy for algorithms that do not use a value function (e.g., GRPO).
Same interface as ActorCritic: get_action and evaluate_actions return (action, log_prob, value), but value is always zeros since no critic is used.
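To illustrate why a critic can be dropped, here is a small sketch of a group-relative advantage of the kind GRPO-style methods use in place of a learned value baseline. The function name, the epsilon, and the use of the population standard deviation are assumptions made for the example; the zero-filled values list only mimics the placeholder described above.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


rewards = [1.0, 0.0, 0.0, 1.0]   # returns of several rollouts in one group
advantages = group_relative_advantages(rewards)
values = [0.0] * len(rewards)    # critic-free placeholder, always zeros
```

Since the baseline comes from the group statistics, nothing in this computation reads the value entries.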
Methods:
__init__(obs_dim, action_dim, device, actor) – Initialize internal Module state, shared by both nn.Module and ScriptModule.
evaluate_actions(tensordict) – Evaluate actions and return current policy outputs.
forward(tensordict[, deterministic]) – Write sampled actions and value estimates into the TensorDict.
get_value(tensordict) – Write value estimate for the given observations into the TensorDict.
- __init__(obs_dim, action_dim, device, actor)
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- evaluate_actions(tensordict)
Evaluate actions and return current policy outputs.
- Parameters:
tensordict (TensorDict) – TensorDict containing obs and action.
- Return type:
TensorDict
- Returns:
A new TensorDict containing sample_log_prob, entropy, and value.
- class embodichain.agents.rl.models.MLP
Bases: Sequential
General MLP supporting custom last activation, orthogonal init, and output reshape.
- Parameters:
input_dim – input dimension
output_dim – output dimension (int or shape tuple/list)
hidden_dims – hidden layer sizes, e.g. [256, 256]
activation – hidden layer activation name (relu/elu/tanh/gelu/silu)
last_activation – last-layer activation name or None for linear
use_layernorm – whether to add LayerNorm after each hidden linear layer
dropout_p – dropout probability for hidden layers (0 disables)
Methods:
__init__(input_dim, output_dim, hidden_dims) – Initialize internal Module state, shared by both nn.Module and ScriptModule.
init_orthogonal([scales]) – Orthogonal-initialize linear layers and zero the bias.
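As a sketch of how the input_dim, hidden_dims, and output_dim parameters could translate into linear-layer sizes, the helper below plans the (in, out) pairs and treats a tuple-valued output_dim as a reshape target. The function plan_mlp_layers is hypothetical, and flattening the output shape into the last layer's width is an assumption about what "output reshape" means here.

```python
import math


def plan_mlp_layers(input_dim, output_dim, hidden_dims):
    """Return the (in, out) size of each linear layer and the final output shape."""
    # An int output_dim is treated as a 1-D shape; a tuple/list is kept as given.
    out_shape = (output_dim,) if isinstance(output_dim, int) else tuple(output_dim)
    flat_out = math.prod(out_shape)  # width of the last linear layer
    dims = [input_dim, *hidden_dims, flat_out]
    layers = list(zip(dims[:-1], dims[1:]))
    return layers, out_shape


layers, out_shape = plan_mlp_layers(8, (2, 3), [256, 256])
```

In this example the last layer emits 6 features, which would then be viewed as a (2, 3) block per sample.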
- class embodichain.agents.rl.models.Policy
Bases: Module, ABC
Abstract base class that all RL policies must implement.
A Policy:
- Encapsulates neural networks that are trained by RL algorithms
- Handles internal computations (e.g., network output → distribution)
- Provides a uniform interface for algorithms (PPO, SAC, etc.)
Methods:
__init__() – Initialize internal Module state, shared by both nn.Module and ScriptModule.
evaluate_actions(tensordict) – Evaluate actions and return current policy outputs.
forward(tensordict[, deterministic]) – Write sampled actions and value estimates into the TensorDict.
get_action(tensordict[, deterministic]) – Sample actions into the provided TensorDict without gradients.
get_value(tensordict) – Write value estimate for the given observations into the TensorDict.
Attributes:
- device: torch.device
Device where the policy parameters are located.
- abstract evaluate_actions(tensordict)
Evaluate actions and return current policy outputs.
- Parameters:
tensordict (TensorDict) – TensorDict containing obs and action.
- Return type:
TensorDict
- Returns:
A new TensorDict containing sample_log_prob, entropy, and value.
- abstract forward(tensordict, deterministic=False)
Write sampled actions and value estimates into the TensorDict.
- Return type:
TensorDict
- get_action(tensordict, deterministic=False)
Sample actions into the provided TensorDict without gradients.
- Parameters:
tensordict (TensorDict) – Input TensorDict containing obs.
deterministic (bool) – If True, return the mean action; otherwise sample an action.
- Return type:
TensorDict
- Returns:
TensorDict with action, sample_log_prob, and value populated.
- abstract get_value(tensordict)
Write value estimate for the given observations into the TensorDict.
- Parameters:
tensordict (TensorDict) – Input TensorDict containing obs.
- Return type:
TensorDict
- Returns:
TensorDict with value populated.
- training: bool
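The shape of the Policy contract described above can be sketched without any deep-learning dependency. In this illustration a plain dict stands in for a TensorDict, SketchPolicy and ConstantPolicy are hypothetical names, and having get_action delegate to forward is an assumption; the real get_action additionally runs without gradients, which a dict-based toy cannot show.

```python
from abc import ABC, abstractmethod


class SketchPolicy(ABC):
    """Toy stand-in for the Policy interface; dicts replace TensorDicts."""

    @abstractmethod
    def forward(self, td, deterministic=False): ...

    @abstractmethod
    def get_value(self, td): ...

    @abstractmethod
    def evaluate_actions(self, td): ...

    def get_action(self, td, deterministic=False):
        # Assumption: the concrete helper simply delegates to forward().
        return self.forward(td, deterministic=deterministic)


class ConstantPolicy(SketchPolicy):
    """Always emits a zero action; exists only to exercise the interface."""

    def forward(self, td, deterministic=False):
        td["action"] = [0.0 for _ in td["obs"]]
        td["sample_log_prob"] = 0.0
        td["value"] = 0.0
        return td

    def get_value(self, td):
        td["value"] = 0.0
        return td

    def evaluate_actions(self, td):
        # Returns a new container rather than mutating the input.
        return {"sample_log_prob": 0.0, "entropy": 0.0, "value": 0.0}


td = ConstantPolicy().get_action({"obs": [0.1, 0.2, 0.3]}, deterministic=True)
```

An algorithm written against the three abstract methods plus get_action can then swap one policy for another without changing its own code.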
- embodichain.agents.rl.models.build_mlp_from_cfg(module_cfg, in_dim, out_dim)
Construct an MLP module from a minimal JSON-like config.
- Return type:
- Expected schema:
module_cfg = {
    "type": "mlp",
    "hidden_sizes": [256, 256],
    "activation": "relu",
}
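As a sketch of how a config in this schema might be turned into constructor arguments, the helper below validates the "type" field and maps "hidden_sizes" onto the hidden_dims parameter name used by MLP. The function parse_mlp_cfg is hypothetical, and both the key mapping and the error on unknown types are assumptions rather than documented behavior.

```python
def parse_mlp_cfg(module_cfg, in_dim, out_dim):
    """Translate a schema-style dict into keyword arguments for an MLP."""
    if module_cfg["type"] != "mlp":
        raise ValueError(f"unsupported module type: {module_cfg['type']!r}")
    return {
        "input_dim": in_dim,
        "output_dim": out_dim,
        # Assumed mapping: config key "hidden_sizes" -> MLP parameter "hidden_dims".
        "hidden_dims": list(module_cfg["hidden_sizes"]),
        "activation": module_cfg["activation"],
    }


module_cfg = {"type": "mlp", "hidden_sizes": [256, 256], "activation": "relu"}
mlp_kwargs = parse_mlp_cfg(module_cfg, in_dim=8, out_dim=2)
```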
- embodichain.agents.rl.models.build_policy(policy_block, obs_space, action_space, device, actor=None, critic=None)
Build a policy from config using spaces for extensibility.
Built-in MLP policies still resolve flattened obs_dim / action_dim, while custom policies may accept richer obs_space / action_space inputs.
- Return type:
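The phrase "resolve flattened obs_dim / action_dim" can be illustrated with shapes alone. The sketch below assumes each space exposes a shape tuple and that flattening means taking the product of its entries; flat_dim and the example shapes are hypothetical and do not reflect the library's actual space handling.

```python
import math


def flat_dim(shape):
    """Number of scalars in an array of the given shape."""
    return math.prod(shape)


obs_shape = (3, 4)     # hypothetical observation space shape
action_shape = (7,)    # hypothetical action space shape

obs_dim = flat_dim(obs_shape)
action_dim = flat_dim(action_shape)
```

A custom policy that needs structure (for example image observations) would skip this flattening and read the richer space object directly.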