skeltorch.Runner

class skeltorch.Runner

Skeltorch runner class.

The runner object stores the logic associated with both default and user-implemented pipelines. It handles the flow of the data from the moment it leaves the loader until the final result is obtained, where the final result depends on which data pipeline is executed.

You are required to extend this class and implement its abstract methods. Check out examples to find real implementations of skeltorch.Runner classes.
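As a rough illustration of what extending the class can look like, here is a minimal sketch. To keep it runnable on its own, `RunnerStandIn` is a hypothetical placeholder for skeltorch.Runner, and treating init_model, init_optimizer, train_step and test as the methods to implement is an assumption based on the method list on this page. The dictionary "model" and "optimizer" are placeholders for a torch.nn.Module and a torch.optim.Optimizer.

```python
class RunnerStandIn:
    """Hypothetical stand-in for skeltorch.Runner so the sketch runs alone."""

    def __init__(self):
        self.model = None
        self.optimizer = None

    def init_model(self, device):
        raise NotImplementedError

    def init_optimizer(self, device):
        raise NotImplementedError

    def train_step(self, it_data, device):
        raise NotImplementedError

    def test(self, epoch, device):
        raise NotImplementedError


class MyRunner(RunnerStandIn):
    def init_model(self, device):
        # Placeholder model: a single weight. A real project would build a
        # torch.nn.Module here and move it to `device`.
        self.model = {"w": 0.0, "device": device}

    def init_optimizer(self, device):
        # Placeholder optimizer: only a learning rate.
        self.optimizer = {"lr": 0.1}

    def train_step(self, it_data, device):
        # it_data is whatever the loader yields; here an (x, y) pair.
        x, y = it_data
        prediction = self.model["w"] * x
        return (prediction - y) ** 2  # squared error as the loss

    def test(self, epoch, device):
        # Project-specific evaluation would go here.
        return None


runner = MyRunner()
runner.init_model("cpu")
runner.init_optimizer("cpu")
loss = runner.train_step((2.0, 4.0), "cpu")
```

In a real project the subclass would inherit from skeltorch.Runner instead of the stand-in.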

experiment

Experiment object.

Type: skeltorch.Experiment
logger

Logger object.

Type: logging.Logger
model

Model object.

Type: torch.nn.Module
optimizer

Optimizer object.

Type: torch.optim.Optimizer
counters

Counters of the training iterations, validation iterations and epochs.

Type: dict
losses_it

Iteration losses of both training and validation splits.

Type: dict
losses_epoch

Epoch losses of both training and validation splits.

Type: dict
init(self, experiment, logger, device)

Lazy-loading of skeltorch.Runner attributes.

Parameters:
  • experiment (skeltorch.Experiment) – Experiment object.
  • logger (logging.Logger) – Logger object.
  • device (str) – --device command argument.
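The lazy-loading idea can be sketched as follows. This is a hypothetical illustration, not the library's source: the class name, the `call_order` list and the order in which the three init_* methods are called are all assumptions made for the example.

```python
import logging


class LazyRunnerSketch:
    """Hypothetical illustration of lazy loading; not the library's source."""

    def __init__(self):
        # Construction is cheap: nothing is built until init() is called.
        self.experiment = None
        self.logger = None
        self.model = None
        self.optimizer = None
        self.call_order = []

    def init(self, experiment, logger, device):
        self.experiment = experiment
        self.logger = logger
        # Assumed order: the optimizer usually needs the model to exist first.
        self.init_model(device)
        self.init_optimizer(device)
        self.init_others(device)

    def init_model(self, device):
        self.call_order.append("init_model")
        self.model = {"device": device}  # placeholder for a torch.nn.Module

    def init_optimizer(self, device):
        self.call_order.append("init_optimizer")
        self.optimizer = {"model": self.model}  # placeholder for an optimizer

    def init_others(self, device):
        self.call_order.append("init_others")


runner = LazyRunnerSketch()
runner.init(experiment=None, logger=logging.getLogger("sketch"), device="cpu")
```

Deferring the work to init() means a runner object can be created before the experiment and device are known.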
init_model(self, device)

Initializes the model used in the project.

Creates and stores inside self.model the model to be used in the project. Use device to move the model to the proper device.

Parameters: device (str) – --device command argument.
init_optimizer(self, device)

Initializes the optimizer used in the project.

Creates and stores inside self.optimizer the optimizer to be used in the project. Use device to move the optimizer to the proper device, if required.

Parameters: device (str) – --device command argument.
init_others(self, device)

Initializes other objects used in the project.

Creates and stores other objects inside class attributes that may be required in the project. Use device to move the objects to the proper device, if required.

Parameters: device (str) – --device command argument.
load_states(self, epoch, device)

Loads the states from the checkpoint associated with epoch.

Parameters:
  • epoch (int) – --epoch command argument.
  • device (str) – --device command argument.
load_states_others(self, checkpoint_data)

Loads the states of other objects from the checkpoint associated with epoch.

Parameters: checkpoint_data (dict) – Dictionary with the states of both default and other objects.
save_states(self)

Saves the states inside a checkpoint associated with epoch.

save_states_others(self)

Saves the states of other objects inside a checkpoint associated with epoch.
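A minimal sketch of how an extra object could travel through a checkpoint. It assumes that save_states_others() returns a dictionary that ends up inside the checkpoint data later passed to load_states_others(); this page does not state the return value, so treat that as an assumption. `OthersSketch` and `best_validation_loss` are hypothetical names.

```python
class OthersSketch:
    """Hypothetical runner fragment that tracks one extra object."""

    def init_others(self, device):
        # Extra state beyond model and optimizer: a best-loss tracker.
        self.best_validation_loss = float("inf")

    def save_states_others(self):
        # Assumption: the returned dict is stored with the checkpoint.
        return {"best_validation_loss": self.best_validation_loss}

    def load_states_others(self, checkpoint_data):
        # checkpoint_data holds the states of both default and other objects.
        self.best_validation_loss = checkpoint_data["best_validation_loss"]


# Save from one runner, restore into a fresh one.
first = OthersSketch()
first.init_others("cpu")
first.best_validation_loss = 0.25
checkpoint_data = first.save_states_others()

second = OthersSketch()
second.init_others("cpu")
second.load_states_others(checkpoint_data)
```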

test(self, epoch, device)

Runs the test pipeline.

Parameters:
  • epoch (int or None) – --epoch command argument.
  • device (str) – --device command argument.
train(self, epoch, max_epochs, log_period, device)

Runs the train pipeline.

Implements a highly-customizable training/validation pipeline. In detail, the pipeline:

  1. Loads a checkpoint, if given. If not, tries to restore the last checkpoint or starts from scratch.
  2. Iterates for a maximum of max_epochs. In each epoch, data is drawn from the loaders to train and validate the model.
  3. Propagates the data of each iteration using the auxiliary method self.train_step().
  4. Saves a checkpoint at the end of the epoch.
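
The checkpoint selection described in step 1 can be sketched as a small function. `resolve_start_epoch` and `saved_epochs` are hypothetical names introduced for illustration; how the library actually discovers existing checkpoints is not covered on this page.

```python
def resolve_start_epoch(requested_epoch, saved_epochs):
    """Return the epoch whose checkpoint should be loaded, or None."""
    if requested_epoch is not None:
        # An explicit --epoch argument takes priority.
        return requested_epoch
    if saved_epochs:
        # Otherwise resume from the most recent checkpoint.
        return max(saved_epochs)
    # Nothing saved yet: start from scratch.
    return None


explicit = resolve_start_epoch(3, [1, 2, 3, 4])
resumed = resolve_start_epoch(None, [1, 2, 5])
fresh = resolve_start_epoch(None, [])
```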

In order to extend or modify the default behavior of the pipeline, several hooks are also provided:

  • self.train_before_epoch_tasks()
  • self.train_iteration_log()
  • self.train_epoch_log()
  • self.validation_iteration_log()
  • self.validation_epoch_log()
  • self.train_after_epoch_tasks()
  • self.train_early_stopping()
Parameters:
  • epoch (int or None) – --epoch command argument.
  • max_epochs (int) – --max-epochs command argument.
  • log_period (int) – --log-period command argument.
  • device (str) – --device command argument.
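
To show how the steps and hooks listed above could fit together, here is a simplified sketch of one possible loop. It is not the library's implementation: checkpoint loading and gradient updates are omitted, the exact position of each hook is an assumption, and validation losses are computed by reusing train_step() for brevity. `train_sketch` and `RecordingRunner` are hypothetical names.

```python
def train_sketch(runner, loaders, max_epochs, log_period, device):
    for epoch in range(1, max_epochs + 1):
        # Documented as running before a new epoch starts.
        if runner.train_early_stopping():
            break
        runner.train_before_epoch_tasks(device)

        e_train_losses = []
        for it, it_data in enumerate(loaders["train"], start=1):
            e_train_losses.append(runner.train_step(it_data, device))
            if it % log_period == 0:
                runner.train_iteration_log(e_train_losses, log_period, device)
        runner.train_epoch_log(e_train_losses, device)

        e_validation_losses = []
        for it, it_data in enumerate(loaders["validation"], start=1):
            e_validation_losses.append(runner.train_step(it_data, device))
            if it % log_period == 0:
                runner.validation_iteration_log(
                    e_validation_losses, log_period, device
                )
        runner.validation_epoch_log(e_validation_losses, device)

        runner.train_after_epoch_tasks(device)
        runner.save_states()


class RecordingRunner:
    """Hypothetical runner that records which hooks were called."""

    def __init__(self):
        self.events = []

    def __getattr__(self, name):
        # Any hook not defined explicitly just records its own name.
        def hook(*args):
            self.events.append(name)
        return hook

    def train_step(self, it_data, device):
        return float(it_data)

    def train_early_stopping(self):
        return False


runner = RecordingRunner()
loaders = {"train": [1, 2], "validation": [3]}
train_sketch(runner, loaders, max_epochs=1, log_period=2, device="cpu")
```

After the call, `runner.events` lists the hooks in the order this sketch invoked them, which is a convenient way to check where an overridden hook would run.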
train_after_epoch_tasks(self, device)

Run at the end of an epoch.

By default, it logs a summary of the epoch using the logger.

Parameters: device (str) – --device command argument.
train_before_epoch_tasks(self, device)

Run at the beginning of an epoch.

By default, it logs an initializing message.

Parameters: device (str) – --device command argument.
train_early_stopping(self)

Run before starting a new epoch. Returns True if the training should stop at the current epoch.

By default, it always returns False.

Returns: Whether the training loop should stop at the current epoch.
Return type: bool
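
One common way to override this hook is patience-based early stopping, sketched below. The layout of losses_epoch used here (a dictionary per split mapping epoch number to loss) is an assumption, since this page only says it is a dict. `EarlyStoppingSketch` and `patience` are hypothetical names.

```python
class EarlyStoppingSketch:
    """Hypothetical runner fragment with patience-based early stopping."""

    patience = 3

    def __init__(self, validation_losses):
        # Assumed layout: {"train": {epoch: loss}, "validation": {epoch: loss}}.
        self.losses_epoch = {"train": {}, "validation": dict(validation_losses)}

    def train_early_stopping(self):
        history = sorted(self.losses_epoch["validation"].items())
        losses = [loss for _, loss in history]
        if len(losses) <= self.patience:
            return False  # not enough epochs to judge
        best_before = min(losses[:-self.patience])
        # Stop if none of the last `patience` epochs beat the earlier best.
        return min(losses[-self.patience:]) >= best_before


improving = EarlyStoppingSketch({1: 1.0, 2: 0.9, 3: 0.8, 4: 0.7})
stalled = EarlyStoppingSketch({1: 1.0, 2: 0.5, 3: 0.6, 4: 0.7, 5: 0.55})
keep_going = improving.train_early_stopping()
should_stop = stalled.train_early_stopping()
```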
train_epoch_log(self, e_train_losses, device)

Run at the end of an epoch.

By default, it logs a small report of the epoch both using the logger and TensorBoard.

Parameters:
  • e_train_losses (list) – List containing all train losses of the epoch.
  • device (str) – --device command argument.
train_iteration_log(self, e_train_losses, log_period, device)

Run every log_period train iterations.

By default, it logs a small report of the last log_period train iterations both using the logger and TensorBoard.

Parameters:
  • e_train_losses (list) – List containing all train losses of the epoch.
  • log_period (int) – --log-period command argument.
  • device (str) – --device command argument.
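
An override of this hook might summarize only the most recent log_period losses, as in the sketch below. `IterationLogSketch` and `last_report` are hypothetical names, and the TensorBoard part of the default behavior is left out.

```python
import logging


class IterationLogSketch:
    """Hypothetical runner fragment overriding train_iteration_log()."""

    def __init__(self):
        self.logger = logging.getLogger("sketch")
        self.last_report = None

    def train_iteration_log(self, e_train_losses, log_period, device):
        # Only the most recent log_period losses enter the report.
        recent = e_train_losses[-log_period:]
        self.last_report = sum(recent) / len(recent)
        self.logger.info(
            "Mean loss over last %d iterations: %.4f",
            log_period,
            self.last_report,
        )


runner = IterationLogSketch()
runner.train_iteration_log([4.0, 2.0, 1.0, 3.0], log_period=2, device="cpu")
```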
train_step(self, it_data, device)

Performs training steps associated with one data iteration.

Parameters:
  • it_data (any) – output of the loader for the current iteration.
  • device (str) – --device command argument.
Returns: Measured value of the loss.
Return type: float

validation_epoch_log(self, e_validation_losses, device)

Run at the end of a validation epoch.

By default, it logs a small report of the epoch both using the logger and TensorBoard.

Parameters:
  • e_validation_losses (list) – List containing all validation losses of the epoch.
  • device (str) – --device command argument.
validation_iteration_log(self, e_validation_losses, log_period, device)

Run every log_period validation iterations.

By default, it logs a small report of the last log_period validation iterations both using the logger and TensorBoard.

Parameters:
  • e_validation_losses (list) – List containing all validation losses of the epoch.
  • log_period (int) – --log-period command argument.
  • device (str) – --device command argument.