Config

This module defines configuration classes for RL algorithms. They keep all training hyperparameters in one place, can be loaded automatically from config files, and help make experiments reproducible.

Main Classes and Structure

AlgorithmCfg

  • Base hyperparameter config class for RL algorithms. It is built on Python dataclasses, so fields, defaults, and type hints are declared in one place.

  • Typical fields:

    • device: Training device (e.g., "cuda" or "cpu").

    • learning_rate: Optimizer learning rate.

    • batch_size: Batch size per training epoch.

    • gamma: Discount factor (γ) applied to future rewards.

    • gae_lambda: λ parameter for Generalized Advantage Estimation (GAE).

    • max_grad_norm: Maximum gradient norm used for gradient clipping.

  • Algorithm-specific configs extend it through inheritance. PPOCfg adds clip_coef, ent_coef, and vf_coef. GRPOCfg adds group_size, kl_coef, and truncate_at_first_done.
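
The hierarchy described above can be sketched with Python dataclasses. This is a minimal sketch, not the library's source. The class bodies are assumptions, and the defaults are copied from the JSON examples in this document rather than from AlgorithmCfg itself.

```python
from dataclasses import dataclass


# Hypothetical sketch of the config hierarchy. Defaults mirror the JSON
# examples in this document, not the library's actual defaults.
@dataclass
class AlgorithmCfg:
    device: str = "cpu"           # training device, e.g. "cuda" or "cpu"
    learning_rate: float = 1e-4   # optimizer learning rate
    batch_size: int = 8192        # batch size per training epoch
    gamma: float = 0.99           # discount factor
    gae_lambda: float = 0.95      # GAE lambda
    max_grad_norm: float = 0.5    # gradient clipping threshold


@dataclass
class PPOCfg(AlgorithmCfg):
    clip_coef: float = 0.2        # PPO ratio clipping range
    ent_coef: float = 0.01        # entropy bonus weight
    vf_coef: float = 0.5          # value loss weight


@dataclass
class GRPOCfg(AlgorithmCfg):
    group_size: int = 4           # envs per normalization group
    kl_coef: float = 0.0          # KL penalty weight
    truncate_at_first_done: bool = True
```

Because every field has a default, a subclass only needs to declare its additional fields, and callers override just what they change, e.g. `PPOCfg(learning_rate=1e-4)`.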

Automatic Loading

  • JSON config files are parsed automatically, and the main training script injects the parsed parameters into the algorithm's config class.

  • Keeping configuration separate from code makes batch experiments and hyperparameter tuning easier.
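
The following is a minimal sketch of this loading step. It assumes the {"algorithm": {"name", "cfg"}} layout used in the JSON examples in this document. The registry CFG_REGISTRY and the function load_algorithm_cfg are hypothetical names for illustration. The real training script may resolve and validate configs differently.

```python
import json
from dataclasses import dataclass, fields
from typing import Dict, Type


@dataclass
class AlgorithmCfg:  # stand-in for the real class, reduced to a few fields
    learning_rate: float = 1e-4
    batch_size: int = 8192
    gamma: float = 0.99


@dataclass
class PPOCfg(AlgorithmCfg):
    clip_coef: float = 0.2


# Hypothetical name -> config-class registry.
CFG_REGISTRY: Dict[str, Type[AlgorithmCfg]] = {"ppo": PPOCfg}


def load_algorithm_cfg(text: str) -> AlgorithmCfg:
    """Parse the {"algorithm": {"name", "cfg"}} layout into a config object."""
    algo = json.loads(text)["algorithm"]
    cfg_cls = CFG_REGISTRY[algo["name"]]
    known = {f.name for f in fields(cfg_cls)}
    unknown = set(algo["cfg"]) - known
    if unknown:
        # Fail loudly on misspelled keys instead of silently ignoring them.
        raise ValueError(f"unknown keys for {algo['name']}: {sorted(unknown)}")
    return cfg_cls(**algo["cfg"])
```

Rejecting unknown keys is a design choice. It turns a misspelled hyperparameter into an immediate error instead of a setting that is silently ignored.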

Usage Example

from embodichain.agents.rl.utils import AlgorithmCfg
cfg = AlgorithmCfg(learning_rate=1e-4, batch_size=8192, gamma=0.99)

Or via a JSON config file:

{
    "algorithm": {
        "name": "ppo",
        "cfg": {
            "learning_rate": 0.0001,
            "batch_size": 8192,
            "gamma": 0.99,
            "gae_lambda": 0.95,
            "clip_coef": 0.2,
            "ent_coef": 0.01,
            "vf_coef": 0.5,
            "max_grad_norm": 0.5
        }
    }
}
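
For reference, the PPO fields above appear in the standard clipped-surrogate PPO loss with GAE advantages as follows. This is the textbook formulation, assumed rather than confirmed for this library. Implementations differ in details such as a 0.5 factor on the value loss or value clipping. Here d_t = 1 if the episode ends at step t, γ = gamma, λ = gae_lambda, ε = clip_coef, c_vf = vf_coef, and c_ent = ent_coef.

```latex
\delta_t = r_t + \gamma (1 - d_t)\, V(s_{t+1}) - V(s_t), \qquad
\hat{A}_t = \delta_t + \gamma \lambda (1 - d_t)\, \hat{A}_{t+1}, \qquad
R_t = \hat{A}_t + V(s_t)

\rho_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}

L(\theta) = -\mathbb{E}_t\!\left[\min\!\left(\rho_t(\theta)\hat{A}_t,\;
  \operatorname{clip}\!\left(\rho_t(\theta), 1-\epsilon, 1+\epsilon\right)\hat{A}_t\right)\right]
  + c_{\text{vf}}\, \mathbb{E}_t\!\left[\left(V_\theta(s_t) - R_t\right)^2\right]
  - c_{\text{ent}}\, \mathbb{E}_t\!\left[\mathcal{H}\!\left[\pi_\theta(\cdot \mid s_t)\right]\right]
```

max_grad_norm then rescales the gradient of L(θ) so that its global L2 norm does not exceed that value.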

GRPO example (embodied AI, training from scratch):

{
    "algorithm": {
        "name": "grpo",
        "cfg": {
            "learning_rate": 0.0001,
            "n_epochs": 10,
            "batch_size": 8192,
            "gamma": 0.99,
            "clip_coef": 0.2,
            "ent_coef": 0.001,
            "kl_coef": 0,
            "group_size": 4,
            "eps": 1e-8,
            "reset_every_rollout": true,
            "max_grad_norm": 0.5,
            "truncate_at_first_done": true
        }
    }
}

  • kl_coef: Set to 0 for from-scratch training (e.g., CartPole with dense rewards); use 0.02 for VLA/LLM fine-tuning.

  • group_size: Number of environments per group used for within-group return normalization. num_envs must be divisible by group_size.
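
The within-group normalization can be sketched in plain Python. This illustration rests on two assumptions. First, consecutive environments form a group. Second, the eps field in the GRPO example is the small constant added to the standard deviation. group_normalize is a hypothetical helper, not part of the library.

```python
import statistics
from typing import List


def group_normalize(returns: List[float], group_size: int, eps: float = 1e-8) -> List[float]:
    """Normalize returns within consecutive groups of `group_size` envs.

    Assumes envs 0..group_size-1 form group 0, the next group_size envs
    form group 1, and so on. The library may assign envs to groups differently.
    """
    num_envs = len(returns)
    if num_envs % group_size != 0:
        raise ValueError(f"group_size={group_size} must divide num_envs={num_envs}")
    advantages: List[float] = []
    for start in range(0, num_envs, group_size):
        group = returns[start:start + group_size]
        mean = statistics.fmean(group)
        std = statistics.pstdev(group)  # population std; the unbiased std is also common
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages
```

Take returns [1.0, 3.0, 10.0, 10.0] with group_size=2. The first group maps to roughly [-1.0, 1.0]. The second group has identical members, so it maps to [0.0, 0.0], and eps keeps it from dividing by zero.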

Extension and Customization

  • Custom algorithm parameter classes can be defined for multi-algorithm and multi-task experiments.

  • The main training script consumes config classes directly, so experiments can be automated and reproduced.

  • Config classes support parameter validation, default values, and type hints.
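
As an illustration of validation and defaults, a custom config can check its values in __post_init__. MyAlgoCfg and its num_envs field are hypothetical. It is a standalone dataclass so that it stays self-contained; a real custom config would normally subclass AlgorithmCfg. The group_size check mirrors the divisibility requirement stated for GRPO.

```python
from dataclasses import dataclass


@dataclass
class MyAlgoCfg:  # hypothetical custom config; would normally subclass AlgorithmCfg
    learning_rate: float = 1e-4
    gamma: float = 0.99
    num_envs: int = 64
    group_size: int = 4

    def __post_init__(self) -> None:
        # Reject obviously invalid values as soon as the config is built.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        if self.num_envs % self.group_size != 0:
            raise ValueError("group_size must divide num_envs")
```

Validating at construction time means a bad config fails before any environment is created or any training step runs.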

Practical Tips

  • Manage all experiment parameters in JSON config files. This keeps runs reproducible and makes tuning easier.

  • Configs for multiple algorithms can be kept side by side, which simplifies comparison and automation.
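
One way to follow these tips is to serialize the exact config each run used, in the same layout as the JSON examples above, so that any run can be reproduced later. The stand-in classes and the to_config_json helper below are hypothetical and carry only a subset of the documented fields.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PPOCfg:  # stand-in with a subset of the documented PPO fields
    learning_rate: float = 1e-4
    gamma: float = 0.99
    clip_coef: float = 0.2


@dataclass
class GRPOCfg:  # stand-in with a subset of the documented GRPO fields
    learning_rate: float = 1e-4
    group_size: int = 4
    kl_coef: float = 0.0


def to_config_json(name: str, cfg) -> str:
    """Serialize a config into the {"algorithm": {"name", "cfg"}} layout."""
    return json.dumps({"algorithm": {"name": name, "cfg": asdict(cfg)}}, indent=4, sort_keys=True)


# One config per algorithm for a side-by-side comparison.
runs = {"ppo": PPOCfg(), "grpo": GRPOCfg()}
# In practice each string would be written next to its run's outputs
# before training starts, e.g. as config.json in the run directory.
saved = {name: to_config_json(name, cfg) for name, cfg in runs.items()}
```

Saving the resolved config, rather than only the file a run started from, also captures any defaults and command-line overrides the run actually used.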