How to use the Logger

Here we explain in detail the usage of the MushroomRL Logger class. This class can be used as a standardized console logger and can also log on disk Numpy arrays or a mushroom agent, using the appropriate logging folder.

Constructing the Logger

To initialize the logger we can simply choose a log directory and an experiment name:

from mushroom_rl.core import Logger

# Create a logger object, creating a log folder
logger = Logger('tutorial', results_dir='/tmp/logs',
                log_console=True)

This will create the experiment folder named ‘tutorial’ inside the base folder ‘/tmp/logs’. The logger creates all the necessary directories if they do not exist. If results_dir is not specified, the log will create a ‘./logs’ base directory. By setting log_console to true, the logger will store the console output in a ‘.log’ text file inside the experiment folder, with the same name. If the file already exists, the logger will append the new logged lines.

If you do not want the logger to create any directory e.g., to only use the log for the console output, you can force the results_dir parameter to None:

# Create a logger object, without creating the log folder
logger_no_folder = Logger('tutorial_no_folder', results_dir=None)

Logging message on the console

The most basic functionality of the Logger is to output text messages on the standard output. Our logger uses the standard Python logger, and it follows a similar set of functionalities:

# Write a line of hashtags, to be used as a separator
logger.strong_line()

# Print a debug message (filtered from the console by default)
logger.debug('This is a debug message')

# Print an info message
logger.info('This is an info message')

# Print a warning
logger.warning('This is a warning message')

# Print an error
logger.error('This is an error message')

# Print a critical error message
logger.critical('This is a critical error')

# Print a line of dashes, to be used as a (weak) separator
logger.weak_line()

By default, the console only shows messages with the info level or higher, while the .log file (if console logging is active) stores everything down to the debug level. You can change these thresholds through the console_log_level and file_log_level arguments of the Logger constructor, using the standard Python logging levels:

import logging
logger = Logger('tutorial', results_dir='/tmp/logs', log_console=True,
                console_log_level=logging.DEBUG, file_log_level=logging.DEBUG)

We can also log to terminal the exceptions. Using this method, instead of a raw print, you can manage correctly the exception output without breaking any tqdm progress bar (see below), and the exception text will be saved in the console log files (if console logging is active).

# Exception logging
try:
    raise RuntimeError('A runtime exception occurred')
except RuntimeError as e:
    logger.error('Exception catched, here\'s the stack trace:')
    logger.exception(e)

logger.weak_line()

Logging a Reinforcement Learning experiment

Our Logger includes some functionalities to log RL experiment data easily. To demonstrate this, we will set up a simple RL experiment, using Q-Learning in the simple chain enviornment.

# Logging learning process
from mushroom_rl.core import Core
from mushroom_rl.environments.generators import generate_simple_chain
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.algorithms.value import QLearning
from mushroom_rl.rl_utils.parameters import Parameter
from tqdm import trange
from time import sleep
import numpy as np


# Setup simple learning environment
mdp = generate_simple_chain(state_n=5, goal_states=[2], prob=.8, rew=1, gamma=.9)
epsilon = Parameter(value=.15)
pi = EpsGreedy(epsilon=epsilon)
agent = QLearning(mdp.info, pi, learning_rate=Parameter(value=.2))
core = Core(agent, mdp)
epochs = 10

We skip the details of this RL experiment, as they are not relevant to the current tutorial. You can have a deeper look at RL experiments with MushroomRL in other tutorials.

It is important to notice that we use tqdm progress bar, as our logger is integrated with this package, and can print log messages while the progress bar is showing progress, without disrupting the progress bar and the terminal.

We first print the learning performances before the learning, using the epoch_info method:

logger.info('Experiment started')
logger.strong_line()

dataset = core.evaluate(n_steps=100)
J = np.mean(dataset.discounted_return)
R = np.mean(dataset.undiscounted_return)  # Undiscounted returns

logger.epoch_info(0, J=J, R=R, any_label='any value')

Notice that this method can print any possible label passed as a function parameter, so it’s not restricted to J, R, or other predefined metrics.

We now consider the learning loop:

for i in trange(epochs):
    # Here some learning
    core.learn(n_steps=100, n_steps_per_fit=1)
    sleep(0.5)
    dataset = core.evaluate(n_steps=100)
    sleep(0.5)
    J = np.mean(dataset.discounted_return)  # Discounted returns
    R = np.mean(dataset.undiscounted_return)  # Undiscounted returns

    # Here logging epoch results to the console
    logger.epoch_info(i+1, J=J, R=R)

    # Logging the data in J.npy and R.npy
    logger.log_numpy(J=J, R=R)

    # Logging the best agent according to the best J
    logger.log_best_agent(agent, J)

Here we make use of both the epoch_info method to log the data in the console output and the methods log_numpy and log_best_agent to log the learning progress.

The log_numpy method can take an arbitrary value (primitive or a NumPy array) and log into a single NumPy array (or matrix). Again a set of arbitrary keywords can be used to save data into different filenames. If the seed parameter of the constructor of the Logger class is specified, the filename will include a postfix with the seed. This is useful when multiple runs of the same experiment are executed.

The log_best_agent saves the current agent, into the ‘agent-best.msh’ file. However, the current agent will be stored on disk only if it improves w.r.t. the previously logged one.

We conclude the learning experiment by logging the final agent and the last dataset:

logger.log_agent(agent)

# Log the last dataset
logger.log_dataset(dataset)

logger.info('Experiment terminated')

Advanced Logger topics

The logger can be also used to continue the learning from a previously existing run, without overwriting the stored results values. This can be done by specifying the append flag in the logger’s constructor.

del logger  # Delete previous logger
new_logger = Logger('tutorial', results_dir='/tmp/logs',
                    log_console=True, append=True)

# add infinite at the end of J.npy
new_logger.log_numpy(J=np.inf)
new_logger.info('Tutorial ends here')

Finally, another functionality of the logger is to activate some specific output from some algorithms. This can be done by calling the set_logger method on the Core object, which forwards the logger to the agent and automatically configures the video recording fps from the environment:

core.set_logger(logger)

Alternatively, the logger can be passed directly as a constructor argument:

core = Core(agent, mdp, logger=logger)

Algorithms use the logger to describe some learning metrics after every fit, both as console output and, if enabled, as Weights & Biases logging, described next.

Logging to Weights & Biases

The Logger can optionally log to Weights & Biases (wandb), in addition to the console and the numpy disk logging described above.

wandb logging is an optional functionality: it is enabled only if the wandb package is installed and a set of init arguments is provided to the Logger. If wandb is not installed, or no init arguments are provided, every wandb logging call is a safe no-op. You can install the optional dependency with:

pip install mushroom_rl[wandb]

To enable wandb logging, we build a dictionary of arguments for wandb.init and pass it to the Logger through the wandb_kwargs argument. The helper static method default_wandb_kwargs returns an editable dictionary with sensible defaults; the config argument should contain the experiment hyperparameters:

from mushroom_rl.core import Logger

# Build the wandb init arguments, including the experiment hyperparameters
hyperparams = dict(gamma=0.99, lr=3e-4, batch_size=64)
wandb_kwargs = Logger.default_wandb_kwargs('tutorial_project',
                                           config=hyperparams,
                                           name='tutorial_run',
                                           mode='offline')

# Create a logger with wandb logging enabled
logger = Logger('wandb_tutorial', results_dir='/tmp/logs', use_timestamp=True,
                wandb_kwargs=wandb_kwargs, force_numpy=False)

The log_training method logs the training metrics: they are grouped under the training/ prefix in wandb (using the number of fits as x-axis), printed on the console with the debug level (so they are not shown by default), and stored on disk as numpy arrays inside a training subfolder only if the Logger was constructed with force_numpy=True. The metric values are passed as keyword arguments, and the optional first positional argument is a group prefix prepended to every name (so log_training('critic', loss=...) logs training/critic/loss); the resulting '/' separator groups the metric in wandb and is replaced by '_' in the numpy file name:

# Log training metrics: the optional first argument is a group prefix, so the
# values below are logged as training/actor/loss and training/critic/loss
logger.log_training('actor', loss=0.5)
logger.log_training('critic', loss=1.2)

The number of fits counter, used as x-axis for the training metrics, is advanced explicitly by calling advance_step once per fit, so that all the values logged during a fit share the same x-axis value:

# Advance the number of fits counter (x-axis) once per fit
logger.advance_step()
logger.log_training('actor', loss=0.4)
logger.log_training('critic', loss=1.0)

The log_evaluation method logs the evaluation metrics: they are grouped under the eval/ prefix in wandb (using the epoch as x-axis), printed on the console through epoch_info, and stored on disk as numpy arrays in the logging directory:

# Log evaluation metrics, grouped under eval/ and using the epoch as x-axis
logger.log_evaluation(0, J=10.0, R=20.0)

When the Logger is created, it automatically sets the wandb group to the experiment name (log_name) unless a group is already specified in wandb_kwargs. This means that all runs from the same experiment (e.g. different seeds) are grouped together in the wandb dashboard. You can override this by passing an explicit group in wandb_kwargs:

wandb_kwargs = Logger.default_wandb_kwargs('my_project', group='custom_group')

When a seed is passed to the Logger, it is automatically added to the wandb config dictionary and, if name is not already set, the run name is set to log_name_seed (e.g. SAC_42). This makes it easy to distinguish individual seed runs within the same group:

The wandb run is finished automatically when the process exits, so there is usually no need to close it explicitly; the finish method is available to close it earlier if needed.

While the numpy evaluation logging is typically driven by the experiment script, the training logging is meant to be driven from inside the algorithms, which log their internal metrics (e.g. losses, entropy, KL divergence) during the fit, once the logger is attached to the agent through the Core (via set_logger or the constructor) as shown above. A complete runnable example with metric logging is available in examples/wandb_logging.py.

When the logger is attached, it is automatically forwarded down the agent’s loggable components, so that their relevant quantities are logged under a hierarchy of grouped metric names without any extra code: the critic approximator logs critic/loss, an exploration parameter logs policy/epsilon, a learning rate logs alpha/value, a distribution logs distribution/entropy, and so on. This forwarding is part of the MushroomObject interface (see the related tutorial): any class declares its loggable children with self._add_logger_attr (analogous to self._add_save_attr), and set_logger propagates the logger and the metric-name prefix recursively.

Video Recording

The Logger includes a VideoLogger mixin that handles video recording. Videos are saved in a videos/ subfolder of the logging directory. The recorder is created lazily on the first frame, so no resources are allocated until recording actually starts.

To record during evaluation or learning, pass record=True (and render=True) to the Core methods. The Core delegates recording to the agent’s logger:

logger = Logger('my_experiment', results_dir='./logs')
core = Core(agent, mdp, logger=logger)

# Record a video during evaluation
core.evaluate(n_episodes=1, render=True, record=True)

The fps is automatically set from the environment when the logger is attached to the Core via set_logger or the constructor. It can also be set explicitly in the Logger constructor:

logger = Logger('my_experiment', results_dir='./logs', fps=30)

By default, the VideoRecorder class from mushroom_rl.utils.record is used, which writes .mp4 files using OpenCV with the VP9 codec. The codec can be changed through the recorder_kwargs argument (e.g. recorder_kwargs=dict(codec='avc1') for H.264). A custom recorder class can be provided through the recorder_class argument. The class must implement __call__(frame) and stop() methods:

logger = Logger('my_experiment', results_dir='./logs',
                recorder_class=MyCustomRecorder)

The underlying recorder instance is accessible through logger.video_recorder after the first frame has been recorded, and the list of recorded (and, with append=True, previously stored) video files is available through logger.recorded_videos.

If wandb logging is active, the last recorded video can be uploaded to wandb through the log_video method. The recording is stopped by the Core, so log_video only handles the upload, using the epoch as x-axis. The video is uploaded under the video/ group with a fixed key (evaluation by default), so that wandb shows a slider to browse videos across epochs:

core.evaluate(n_episodes=1, render=True, record=True)
logger.log_video(epoch)

A specific video file can also be uploaded instead of the last recorded one by passing its path through the video argument:

logger.log_video(epoch, video='./logs/my_experiment/videos/recording.mp4')