mattertune.configs.main
- class mattertune.configs.main.CSVLoggerConfig(*, type='csv', save_dir, name='lightning_logs', version=None, prefix='', flush_logs_every_n_steps=100)[source]
- Parameters:
type (Literal['csv'])
save_dir (str)
name (str)
version (int | str | None)
prefix (str)
flush_logs_every_n_steps (int)
- type: Literal['csv']
- save_dir: str
Save directory for logs.
- name: str
Experiment name. Default: 'lightning_logs'.
- version: int | str | None
Experiment version. If not specified, automatically assigns the next available version. Default: None.
- prefix: str
String to put at the beginning of metric keys. Default: ''.
- flush_logs_every_n_steps: int
How often to flush logs to disk. Default: 100.
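A minimal sketch of building this config, using only the parameters documented above; the directory and experiment name are illustrative.

```python
from mattertune.configs.main import CSVLoggerConfig

# Write metrics CSVs under ./logs/finetune-run/, flushing to disk
# every 50 steps instead of the default 100.
logger = CSVLoggerConfig(
    save_dir="./logs",  # required: where logs are written
    name="finetune-run",
    flush_logs_every_n_steps=50,
)
```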
- class mattertune.configs.main.EarlyStoppingConfig(*, monitor='val/total_loss', min_delta=0.0, patience=3, verbose=False, mode='min', strict=True, check_finite=True, stopping_threshold=None, divergence_threshold=None, check_on_train_epoch_end=None, log_rank_zero_only=False)[source]
- Parameters:
monitor (str)
min_delta (float)
patience (int)
verbose (bool)
mode (Literal['min', 'max'])
strict (bool)
check_finite (bool)
stopping_threshold (float | None)
divergence_threshold (float | None)
check_on_train_epoch_end (bool | None)
log_rank_zero_only (bool)
- monitor: str
Quantity to be monitored.
- min_delta: float
Minimum change in monitored quantity to qualify as an improvement. Changes of less than or equal to min_delta will count as no improvement. Default: 0.0.
- patience: int
Number of validation checks with no improvement after which training will be stopped. Default: 3.
- verbose: bool
Whether to print messages when improvement is found or early stopping is triggered. Default: False.
- mode: Literal['min', 'max']
One of 'min' or 'max'. In 'min' mode, training stops when the monitored quantity stops decreasing; in 'max' mode it stops when the quantity stops increasing. Default: 'min'.
- strict: bool
Whether to raise an error if the monitored metric is not found in validation metrics. Default: True.
- check_finite: bool
Whether to stop training when the monitor becomes NaN or infinite. Default: True.
- stopping_threshold: float | None
Stop training immediately once the monitored quantity reaches this threshold. Default: None.
- divergence_threshold: float | None
Stop training as soon as the monitored quantity becomes worse than this threshold. Default: None.
- check_on_train_epoch_end: bool | None
Whether to run early stopping at the end of the training epoch. If False, the check runs at the end of validation. Default: None.
- log_rank_zero_only: bool
Whether to log the status of early stopping only for the rank 0 process. Default: False.
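A hedged example of configuring early stopping on the default monitored metric; the tolerance and patience values are illustrative, not recommendations.

```python
from mattertune.configs.main import EarlyStoppingConfig

# Stop fine-tuning once val/total_loss has failed to improve by at
# least 1e-4 for 5 consecutive validation checks.
early_stopping = EarlyStoppingConfig(
    monitor="val/total_loss",
    min_delta=1e-4,
    patience=5,
    mode="min",
)
```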
- class mattertune.configs.main.MatterTunerConfig(*, data, model, trainer=TrainerConfig(accelerator='auto', strategy='auto', num_nodes=1, devices='auto', precision='32-true', deterministic=None, max_epochs=None, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, val_check_interval=None, check_val_every_n_epoch=1, log_every_n_steps=None, gradient_clip_val=None, gradient_clip_algorithm=None, checkpoint=None, early_stopping=None, loggers='default', additional_trainer_kwargs={}))[source]
- Parameters:
data (DataModuleConfig)
model (ModelConfig)
trainer (TrainerConfig)
- data: DataModuleConfig
The configuration for the data.
- model: ModelConfig
The configuration for the model.
- trainer: TrainerConfig
The configuration for the trainer.
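A sketch of how the three sub-configs compose. `my_data_config` and `my_model_config` are hypothetical placeholders for DataModuleConfig and ModelConfig instances, which are documented outside this module.

```python
from mattertune.configs.main import MatterTunerConfig, TrainerConfig

# `my_data_config` and `my_model_config` are placeholders: build them
# from the DataModuleConfig / ModelConfig classes documented elsewhere.
config = MatterTunerConfig(
    data=my_data_config,
    model=my_model_config,
    trainer=TrainerConfig(max_epochs=10),  # falls back to defaults if omitted
)
```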
- class mattertune.configs.main.ModelCheckpointConfig(*, dirpath=None, filename=None, monitor=None, verbose=False, save_last=None, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None, enable_version_counter=True)[source]
- Parameters:
dirpath (str | None)
filename (str | None)
monitor (str | None)
verbose (bool)
save_last (Literal[True, False, 'link'] | None)
save_top_k (int)
save_weights_only (bool)
mode (Literal['min', 'max'])
auto_insert_metric_name (bool)
every_n_train_steps (int | None)
train_time_interval (timedelta | None)
every_n_epochs (int | None)
save_on_train_epoch_end (bool | None)
enable_version_counter (bool)
- dirpath: str | None
Directory to save the model file. Default: None.
- filename: str | None
Checkpoint filename. Can contain named formatting options. Default: None.
- monitor: str | None
Quantity to monitor. Default: None.
- verbose: bool
Verbosity mode. Default: False.
- save_last: Literal[True, False, 'link'] | None
When True or "link", saves a 'last.ckpt' checkpoint when a checkpoint is saved. Default: None.
- save_top_k: int
If save_top_k=k, save the k models with the best monitored quantity. Default: 1.
- save_weights_only: bool
If True, only save model weights. Default: False.
- mode: Literal['min', 'max']
One of {'min', 'max'}. Determines whether a lower ('min') or higher ('max') value of the monitored quantity counts as better when ranking checkpoints. Default: 'min'.
- auto_insert_metric_name: bool
Whether to automatically insert the metric name in the checkpoint filename. Default: True.
- every_n_train_steps: int | None
Number of training steps between checkpoints. Default: None.
- train_time_interval: timedelta | None
Checkpoints are monitored at the specified time interval. Default: None.
- every_n_epochs: int | None
Number of epochs between checkpoints. Default: None.
- save_on_train_epoch_end: bool | None
Whether to run checkpointing at the end of the training epoch. Default: None.
- enable_version_counter: bool
Whether to append a version to existing filenames. Default: True.
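A sketch of a typical checkpointing setup using only the fields above; the directory is illustrative, and the filename uses Lightning-style named formatting.

```python
from mattertune.configs.main import ModelCheckpointConfig

# Keep the 3 checkpoints with the lowest validation loss and maintain
# a 'last.ckpt' link to the most recent one.
checkpoint = ModelCheckpointConfig(
    dirpath="./checkpoints",        # illustrative output directory
    filename="{epoch:02d}-{step}",  # Lightning-style named formatting
    monitor="val/total_loss",
    mode="min",
    save_top_k=3,
    save_last="link",
)
```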
- class mattertune.configs.main.TrainerConfig(*, accelerator='auto', strategy='auto', num_nodes=1, devices='auto', precision='32-true', deterministic=None, max_epochs=None, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, val_check_interval=None, check_val_every_n_epoch=1, log_every_n_steps=None, gradient_clip_val=None, gradient_clip_algorithm=None, checkpoint=None, early_stopping=None, loggers='default', additional_trainer_kwargs={})[source]
- Parameters:
accelerator (str)
strategy (str | Strategy)
num_nodes (int)
devices (list[int] | str | int)
precision (Literal[64, 32, 16] | Literal['transformer-engine', 'transformer-engine-float16', '16-true', '16-mixed', 'bf16-true', 'bf16-mixed', '32-true', '64-true'] | Literal['64', '32', '16', 'bf16'] | None)
deterministic (bool | Literal['warn'] | None)
max_epochs (int | None)
min_epochs (int | None)
max_steps (int)
min_steps (int | None)
max_time (str | timedelta | dict[str, int] | None)
val_check_interval (int | float | None)
check_val_every_n_epoch (int | None)
log_every_n_steps (int | None)
gradient_clip_val (int | float | None)
gradient_clip_algorithm (str | None)
checkpoint (ModelCheckpointConfig | None)
early_stopping (EarlyStoppingConfig | None)
loggers (Sequence[LoggerConfig] | Literal['default'])
additional_trainer_kwargs (dict[str, Any])
- accelerator: str
Supports passing different accelerator types ("cpu", "gpu", "tpu", "ipu", "hpu", "mps", "auto") as well as custom accelerator instances.
- strategy: str | Strategy
Supports different training strategies with aliases, as well as custom strategies. Default: "auto".
- num_nodes: int
Number of GPU nodes for distributed training. Default: 1.
- devices: list[int] | str | int
The devices to use. Can be set to a sequence of device indices, "all" to indicate all available devices should be used, or "auto" for automatic selection based on the chosen accelerator. Default: "auto".
- precision: _PRECISION_INPUT | None
Double precision (64, '64' or '64-true'), full precision (32, '32' or '32-true'), 16-bit mixed precision (16, '16', '16-mixed') or bfloat16 mixed precision ('bf16', 'bf16-mixed'). Can be used on CPU, GPU, TPUs, HPUs or IPUs. Default: '32-true'.
- deterministic: bool | Literal['warn'] | None
If True, sets whether PyTorch operations must use deterministic algorithms. Set to "warn" to use deterministic algorithms whenever possible, throwing warnings on operations that don't support deterministic mode. If not set, defaults to False. Default: None.
- max_epochs: int | None
Stop training once this number of epochs is reached. Disabled by default (None). If both max_epochs and max_steps are not specified, defaults to max_epochs = 1000. To enable infinite training, set max_epochs = -1.
- min_epochs: int | None
Force training for at least this many epochs. Disabled by default (None).
- max_steps: int
Stop training after this number of steps. Disabled by default (-1). If max_steps = -1 and max_epochs = None, will default to max_epochs = 1000. To enable infinite training, set max_epochs to -1.
- min_steps: int | None
Force training for at least this number of steps. Disabled by default (None).
- max_time: str | timedelta | dict[str, int] | None
Stop training after this amount of time has passed. Disabled by default (None). The time duration can be specified in the format DD:HH:MM:SS (days, hours, minutes, seconds), as a datetime.timedelta, or a dictionary with keys that will be passed to datetime.timedelta.
- val_check_interval: int | float | None
How often to check the validation set. Pass a float in the range [0.0, 1.0] to check after a fraction of the training epoch. Pass an int to check after a fixed number of training batches. An int value can only be higher than the number of training batches when check_val_every_n_epoch=None, which validates after every N training batches across epochs or during iteration-based training. Default: 1.0.
- check_val_every_n_epoch: int | None
Perform a validation loop after every N training epochs. If None, validation will be done solely based on the number of training batches, requiring val_check_interval to be an integer value. Default: 1.
- log_every_n_steps: int | None
How often to log within steps. Default: 50.
- gradient_clip_val: int | float | None
The value at which to clip gradients. Passing gradient_clip_val=None disables gradient clipping. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before clipping. Default: None.
- gradient_clip_algorithm: str | None
The gradient clipping algorithm to use. Pass gradient_clip_algorithm="value" to clip by value, and gradient_clip_algorithm="norm" to clip by norm. By default it will be set to "norm".
- checkpoint: ModelCheckpointConfig | None
The configuration for the model checkpoint.
- early_stopping: EarlyStoppingConfig | None
The configuration for early stopping.
- loggers: Sequence[LoggerConfig] | Literal['default']
The loggers to use for logging training metrics.
If "default", will use the CSV logger + the W&B logger if available. Default: "default".
- additional_trainer_kwargs: dict[str, Any]
Additional keyword arguments for the Lightning Trainer.
This is for advanced users who want to customize the Lightning Trainer, and is not recommended for beginners.
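Putting it together, a hedged single-GPU fine-tuning setup that exercises the fields above; all hyperparameter values are illustrative.

```python
from mattertune.configs.main import (
    EarlyStoppingConfig,
    ModelCheckpointConfig,
    TrainerConfig,
)

# Validate twice per training epoch, clip gradient norms at 1.0, and
# wire in checkpointing and early stopping on the same metric.
trainer = TrainerConfig(
    accelerator="gpu",
    devices=[0],
    precision="bf16-mixed",
    max_epochs=100,
    val_check_interval=0.5,  # float: fraction of the training epoch
    gradient_clip_val=1.0,
    gradient_clip_algorithm="norm",
    checkpoint=ModelCheckpointConfig(monitor="val/total_loss", save_top_k=1),
    early_stopping=EarlyStoppingConfig(monitor="val/total_loss", patience=10),
)
```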