mattertune.configs.callbacks

class mattertune.configs.callbacks.EarlyStoppingConfig(*, monitor='val/total_loss', min_delta=0.0, patience=3, verbose=False, mode='min', strict=True, check_finite=True, stopping_threshold=None, divergence_threshold=None, check_on_train_epoch_end=None, log_rank_zero_only=False)
Parameters:
  • monitor (str)

  • min_delta (float)

  • patience (int)

  • verbose (bool)

  • mode (Literal['min', 'max'])

  • strict (bool)

  • check_finite (bool)

  • stopping_threshold (float | None)

  • divergence_threshold (float | None)

  • check_on_train_epoch_end (bool | None)

  • log_rank_zero_only (bool)

monitor: str

Quantity to be monitored. Default: 'val/total_loss'.

min_delta: float

Minimum change in the monitored quantity to qualify as an improvement. Changes of less than or equal to min_delta count as no improvement. Default: 0.0.

patience: int

Number of validation checks with no improvement after which training will be stopped. Default: 3.
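The patience counter can be illustrated with a small pure-Python sketch. It assumes Lightning-style semantics in 'min' mode: a check counts as an improvement only if it beats the best value so far by more than min_delta. The helper stop_index is hypothetical and is not part of mattertune.

```python
from typing import Optional, Sequence


def stop_index(values: Sequence[float], patience: int = 3, min_delta: float = 0.0) -> Optional[int]:
    """Hypothetical helper: index of the validation check at which training stops, or None.

    Assumes 'min' mode (lower is better).
    """
    best = float("inf")
    wait = 0  # consecutive checks without improvement
    for i, value in enumerate(values):
        if value < best - min_delta:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return i
    return None


# The loss stops improving after the check at index 2 (value 0.5).
losses = [1.0, 0.8, 0.5, 0.6, 0.55, 0.7, 0.4]
print(stop_index(losses, patience=3))  # prints 5
```

With patience=3, the three non-improving checks at indices 3, 4 and 5 trigger the stop. The later improvement to 0.4 is never seen.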

verbose: bool

Whether to print messages when improvement is found or early stopping is triggered. Default: False.

mode: Literal['min', 'max']

One of ‘min’ or ‘max’. In ‘min’ mode, training stops when the monitored quantity stops decreasing; in ‘max’ mode, it stops when the quantity stops increasing. Default: 'min'.
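Taken together, min_delta and mode define what counts as an improvement. The following is a minimal sketch, assuming the comparison used by Lightning's EarlyStopping callback; is_improvement is a hypothetical name.

```python
def is_improvement(current: float, best: float, min_delta: float = 0.0, mode: str = "min") -> bool:
    """Hypothetical helper: does `current` beat `best` by more than min_delta?"""
    if mode == "min":
        return current < best - min_delta  # must drop by more than min_delta
    if mode == "max":
        return current > best + min_delta  # must rise by more than min_delta
    raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")


print(is_improvement(0.95, 1.0, min_delta=0.01))               # True: dropped by 0.05
print(is_improvement(0.995, 1.0, min_delta=0.01))              # False: drop of 0.005 <= min_delta
print(is_improvement(0.90, 0.85, min_delta=0.01, mode="max"))  # True
```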

strict: bool

Whether to raise an error if the monitored metric is not found in the validation metrics. Default: True.

check_finite: bool

Whether to stop training when the monitor becomes NaN or infinite. Default: True.

stopping_threshold: float | None

Stop training immediately once the monitored quantity reaches this threshold. Default: None.

divergence_threshold: float | None

Stop training as soon as the monitored quantity becomes worse than this threshold. Default: None.
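The check_finite, stopping_threshold and divergence_threshold options stop training on a single check, regardless of patience. The sketch below assumes Lightning's ordering: the non-finite check first, then the stopping threshold, then the divergence threshold. It also assumes strict comparisons. threshold_stop_reason is a hypothetical helper.

```python
import math
import operator
from typing import Optional


def threshold_stop_reason(
    current: float,
    mode: str = "min",
    check_finite: bool = True,
    stopping_threshold: Optional[float] = None,
    divergence_threshold: Optional[float] = None,
) -> Optional[str]:
    """Hypothetical helper: return why training should stop now, or None to continue."""
    # "Better" means lower in 'min' mode and higher in 'max' mode.
    better = operator.lt if mode == "min" else operator.gt
    if check_finite and not math.isfinite(current):
        return "non-finite"
    if stopping_threshold is not None and better(current, stopping_threshold):
        return "stopping_threshold reached"
    # Diverged: the threshold itself is better than the current value.
    if divergence_threshold is not None and better(divergence_threshold, current):
        return "divergence_threshold exceeded"
    return None


print(threshold_stop_reason(0.009, stopping_threshold=0.01))  # stopping_threshold reached
print(threshold_stop_reason(12.0, divergence_threshold=10.0))  # divergence_threshold exceeded
print(threshold_stop_reason(float("nan")))                     # non-finite
print(threshold_stop_reason(0.5, stopping_threshold=0.01, divergence_threshold=10.0))  # None
```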

check_on_train_epoch_end: bool | None

Whether to run early stopping at the end of the training epoch. If False, the check runs at the end of validation. Default: None.

log_rank_zero_only: bool

Whether to log the status of early stopping only on the rank-0 process. Default: False.

create_callback()

Creates an EarlyStopping callback instance from this config.

class mattertune.configs.callbacks.ModelCheckpointConfig(*, dirpath=None, filename=None, monitor=None, verbose=False, save_last=None, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None, enable_version_counter=True)
Parameters:
  • dirpath (str | None)

  • filename (str | None)

  • monitor (str | None)

  • verbose (bool)

  • save_last (Literal[True, False, 'link'] | None)

  • save_top_k (int)

  • save_weights_only (bool)

  • mode (Literal['min', 'max'])

  • auto_insert_metric_name (bool)

  • every_n_train_steps (int | None)

  • train_time_interval (timedelta | None)

  • every_n_epochs (int | None)

  • save_on_train_epoch_end (bool | None)

  • enable_version_counter (bool)

dirpath: str | None

Directory in which to save checkpoint files. Default: None.

filename: str | None

Checkpoint filename. It can contain named formatting options that are filled in when the checkpoint is saved. Default: None.

monitor: str | None

Quantity to monitor. Default: None.

verbose: bool

Verbosity mode. Default: False.

save_last: Literal[True, False, 'link'] | None

When True or “link”, saves a ‘last.ckpt’ checkpoint whenever a checkpoint is saved. Default: None.

save_top_k: int

If save_top_k=k, the k best models according to the monitored quantity are saved. Default: 1.
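The top-k retention rule can be sketched in plain Python. The helper update_top_k is hypothetical. The sketch assumes k >= 0 and ranks checkpoints by the monitored value according to mode. In Lightning's ModelCheckpoint, save_top_k=-1 keeps every checkpoint and 0 disables saving; the sketch does not model -1.

```python
from typing import Dict, List


def update_top_k(kept: Dict[str, float], name: str, score: float, k: int = 1, mode: str = "min") -> List[str]:
    """Hypothetical helper: register a new checkpoint score, keep the k best, return dropped names.

    Assumes k >= 0 (the special value -1, "keep everything", is not modeled).
    """
    kept[name] = score
    # Best first: ascending scores in 'min' mode, descending in 'max' mode.
    ranked = sorted(kept, key=kept.__getitem__, reverse=(mode == "max"))
    dropped = ranked[k:]
    for stale in dropped:
        del kept[stale]  # a real callback would also delete the file on disk
    return dropped


kept: Dict[str, float] = {}
for epoch, val_loss in enumerate([0.9, 0.7, 0.8, 0.6]):
    update_top_k(kept, f"epoch={epoch}", val_loss, k=2)
print(sorted(kept))  # ['epoch=1', 'epoch=3']
```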

save_weights_only: bool

If True, only save model weights. Default: False.

mode: Literal['min', 'max']

One of {‘min’, ‘max’}. In ‘min’ mode, lower values of the monitored quantity are considered better when ranking checkpoints; in ‘max’ mode, higher values are better. Default: 'min'.

auto_insert_metric_name: bool

Whether to automatically insert the metric name into the checkpoint filename. Default: True.
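With auto_insert_metric_name=True, a filename template such as '{epoch}-{val_loss:.2f}' expands to something like 'epoch=2-val_loss=0.02'. The sketch below reproduces that behavior with a hypothetical standalone helper, format_checkpoint_name. It assumes Lightning-style templating and is not the library's own code. The metric names and values are illustrative.

```python
import re
from typing import Dict


def format_checkpoint_name(template: str, metrics: Dict[str, float], auto_insert_metric_name: bool = True) -> str:
    """Hypothetical helper: fill a checkpoint filename template from a metrics dict."""
    if auto_insert_metric_name:
        # "{epoch}" -> "epoch={epoch}", "{val_loss:.2f}" -> "val_loss={val_loss:.2f}"
        template = re.sub(
            r"\{([^{}:]+)(:[^{}]*)?\}",
            lambda m: f"{m.group(1)}={m.group(0)}",
            template,
        )
    return template.format(**metrics)


metrics = {"epoch": 2, "val_loss": 0.0234}
print(format_checkpoint_name("{epoch}-{val_loss:.2f}", metrics))         # epoch=2-val_loss=0.02
print(format_checkpoint_name("{epoch}-{val_loss:.2f}", metrics, False))  # 2-0.02
```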

every_n_train_steps: int | None

Number of training steps between checkpoints. Default: None.

train_time_interval: timedelta | None

Checkpoints are monitored at the specified time interval. Default: None.

every_n_epochs: int | None

Number of epochs between checkpoints. Default: None.
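The step- and epoch-based triggers can be sketched as follows. The helper checkpoint_due is hypothetical. It assumes Lightning's documented behavior that every_n_train_steps, every_n_epochs and train_time_interval are mutually exclusive. It also assumes that, when none of them is set, a checkpoint is written at the end of every epoch. train_time_interval is not modeled here.

```python
from typing import Optional


def checkpoint_due(
    completed_steps: int,
    completed_epochs: int,
    at_epoch_end: bool,
    every_n_train_steps: Optional[int] = None,
    every_n_epochs: Optional[int] = None,
) -> bool:
    """Hypothetical helper: is a step- or epoch-based checkpoint due right now?

    Assumes positive intervals, that the two options are mutually exclusive,
    and that a checkpoint is saved every epoch when neither is set.
    """
    if every_n_train_steps is not None and every_n_epochs is not None:
        raise ValueError("every_n_train_steps and every_n_epochs are mutually exclusive")
    if every_n_train_steps is not None:
        return completed_steps % every_n_train_steps == 0
    interval = 1 if every_n_epochs is None else every_n_epochs  # fallback: once per epoch
    return at_epoch_end and completed_epochs % interval == 0


print([s for s in range(1, 11) if checkpoint_due(s, 0, False, every_n_train_steps=4)])  # [4, 8]
print([e for e in range(1, 7) if checkpoint_due(0, e, True, every_n_epochs=2)])        # [2, 4, 6]
```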

save_on_train_epoch_end: bool | None

Whether to run checkpointing at the end of the training epoch. Default: None.

enable_version_counter: bool

Whether to append a version suffix to the filename when a checkpoint with the same name already exists. If False, the existing file is overwritten. Default: True.
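The version counter can be sketched as a loop that tries successive suffixes until a free name is found. This assumes Lightning-style '-v1', '-v2', ... suffixes. versioned_name is a hypothetical helper, and the set of existing names stands in for the files already on disk.

```python
from typing import Set


def versioned_name(stem: str, existing: Set[str], ext: str = ".ckpt", enable_version_counter: bool = True) -> str:
    """Hypothetical helper: pick a checkpoint file name, appending -v1, -v2, ... on clashes."""
    candidate = stem + ext
    if not enable_version_counter:
        return candidate  # an existing file with this name would be overwritten
    version = 1
    while candidate in existing:
        candidate = f"{stem}-v{version}{ext}"
        version += 1
    return candidate


existing = {"best.ckpt", "best-v1.ckpt"}
print(versioned_name("best", existing))                                # best-v2.ckpt
print(versioned_name("best", existing, enable_version_counter=False))  # best.ckpt
```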

create_callback()

Creates a ModelCheckpoint callback instance from this config.

Modules

early_stopping

model_checkpoint