# Trainer
This module implements the main RL training loop, logging management, and an event-driven extension mechanism.
## Main Classes and Structure
### Trainer
The RL training coordinator, responsible for mediating the interaction between the algorithm, the environment, and the policy.
Main responsibilities:
- Managing the training loop, evaluation, and model saving.
- Dispatching event-driven extensions (e.g., environment randomization, data logging, evaluation events).
- Logging output (TensorBoard/WandB/console), tracking reward, episode length, loss, etc.
Key fields:
- `policy`: RL policy object.
- `algorithm`: RL algorithm object.
- `env` / `eval_env`: training and evaluation environments.
- `writer`: TensorBoard logger.
- `event_manager` / `eval_event_manager`: event managers for training and evaluation.
- `global_step`, `ret_window`, `len_window`: training statistics (step counter, return window, episode-length window).
### Main Methods
- `train(total_timesteps)`: main training loop; automatically collects data, updates the policy, and logs metrics.
- `_collect_rollout()`: collects one rollout; supports custom callback statistics.
- `_log_train(losses)`: logs training loss, reward, sampling speed, etc.
- `_eval_once()`: runs periodic evaluation and records evaluation metrics.
- `save_checkpoint()`: saves model parameters and training state.
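The method names above come from this module; as a rough mental model, they compose like the sketch below. The loop body, the `algorithm.update` call, and the `eval_interval` / `save_interval` attributes are illustrative assumptions, not the actual implementation.

```python
# Minimal sketch of how train() typically drives the other methods.
# The method names come from this module; everything else is assumed.
def train(self, total_timesteps):
    while self.global_step < total_timesteps:
        rollout = self._collect_rollout()        # gathers experience and is
                                                 # assumed to advance global_step
        losses = self.algorithm.update(rollout)  # assumed algorithm API
        self._log_train(losses)                  # reward, loss, sampling speed

        if self.global_step % self.eval_interval == 0:
            self._eval_once()                    # record evaluation metrics
        if self.global_step % self.save_interval == 0:
            self.save_checkpoint()               # guard against lost progress
```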
### Event Management
Custom events (e.g., environment randomization, data logging) can be injected via EventManager.
Events can fire on a fixed interval, at a specific step, or on a custom trigger, enabling flexible extension.
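As an illustration, registering a custom event might look like the following. `EventManager` is named in this module, but `add_event` and its `interval` keyword are assumed here for illustration:

```python
# Hypothetical sketch: EventManager is part of this module, but
# add_event(fn, interval=...) is an assumed registration API.
def randomize_env(trainer):
    """Example event: re-sample environment parameters (domain randomization)."""
    trainer.env.randomize()  # assumes the env exposes a randomize() hook

trainer.event_manager.add_event(randomize_env, interval=1000)  # fire every 1000 steps
```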
### Logging and Monitoring
Supports TensorBoard and WandB logging; automatically records reward, episode length, loss, sampling speed, etc.
Console output for training progress and statistics.
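The sketch below shows the kind of scalar logging described here, using the standard `torch.utils.tensorboard` API; the tag names and windowed averaging are assumptions, not the module's exact output format:

```python
import numpy as np
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")  # log directory is an example

def log_train(writer, step, losses, ret_window, len_window, steps_per_sec):
    # Windowed averages smooth noisy per-episode statistics.
    writer.add_scalar("train/return_mean", np.mean(ret_window), step)
    writer.add_scalar("train/episode_len_mean", np.mean(len_window), step)
    writer.add_scalar("train/steps_per_sec", steps_per_sec, step)
    for name, value in losses.items():
        writer.add_scalar(f"loss/{name}", value, step)
```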
## Usage Example
```python
trainer = Trainer(policy, env, algorithm, num_steps, batch_size, writer, ...)
trainer.train(total_steps)
trainer.save_checkpoint()
```
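Here `total_steps` is passed as the `total_timesteps` argument of `train()`, and the positional arguments mirror the key fields listed above; the trailing `...` stands for additional constructor options not shown in this example.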
## Extension and Customization
- Custom event modules can be implemented for environment reset, data collection, evaluation, etc.
- Supports multi-environment parallelism and distributed training.
- The training process can be flexibly adjusted via config files (see the sketch below).
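As a sketch of config-driven setup, assuming a YAML file whose keys mirror the constructor arguments from the usage example (the file name and layout are illustrative, not prescribed by this module):

```python
import yaml  # PyYAML

# Assumed config layout; key names mirror the usage example above.
with open("train_config.yaml") as f:
    cfg = yaml.safe_load(f)

trainer = Trainer(
    policy, env, algorithm,          # constructed as in the usage example
    num_steps=cfg["num_steps"],
    batch_size=cfg["batch_size"],
    writer=writer,
)
trainer.train(cfg["total_steps"])
```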
## Practical Tips
- Perform periodic evaluation and checkpointing to avoid losing progress during long training runs.
- The event mechanism can be used for automated experiments, data collection, and environment resets (see the sketch below).
- Logging and monitoring help analyze training progress and tune hyperparameters.
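Combining the first two tips, periodic evaluation and checkpointing can themselves be registered as interval events, reusing the hypothetical `add_event` API sketched under Event Management:

```python
# Hypothetical: reuses the assumed add_event(fn, interval=...) API,
# where each event callback receives the trainer instance.
trainer.event_manager.add_event(lambda t: t._eval_once(), interval=5_000)
trainer.event_manager.add_event(lambda t: t.save_checkpoint(), interval=10_000)
```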