mattertune.configs.main
- class mattertune.configs.main.CSVLoggerConfig(*, type='csv', save_dir, name='lightning_logs', version=None, prefix='', flush_logs_every_n_steps=100)[source]
- Parameters:
type (Literal['csv'])
save_dir (str)
name (str)
version (int | str | None)
prefix (str)
flush_logs_every_n_steps (int)
- type: Literal['csv']
- save_dir: str
Save directory for logs.
- name: str
Experiment name. Default: 'lightning_logs'.
- version: int | str | None
Experiment version. If not specified, automatically assigns the next available version. Default: None.
- prefix: str
String to put at the beginning of metric keys. Default: ''.
- flush_logs_every_n_steps: int
How often to flush logs to disk. Default: 100.
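A minimal sketch of building this config, using only the parameters documented above; the directory and experiment name are illustrative.

```python
from mattertune.configs.main import CSVLoggerConfig

# Write metrics CSVs under ./logs/finetune-run/, flushing to disk
# every 50 steps instead of the default 100.
logger = CSVLoggerConfig(
    save_dir="./logs",  # required: where logs are written
    name="finetune-run",
    flush_logs_every_n_steps=50,
)
```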
- class mattertune.configs.main.EarlyStoppingConfig(*, monitor='val/total_loss', min_delta=0.0, patience=3, verbose=False, mode='min', strict=True, check_finite=True, stopping_threshold=None, divergence_threshold=None, check_on_train_epoch_end=None, log_rank_zero_only=False)[source]
- Parameters:
monitor (str)
min_delta (float)
patience (int)
verbose (bool)
mode (Literal['min', 'max'])
strict (bool)
check_finite (bool)
stopping_threshold (float | None)
divergence_threshold (float | None)
check_on_train_epoch_end (bool | None)
log_rank_zero_only (bool)
- monitor: str
Quantity to be monitored.
- min_delta: float
Minimum change in monitored quantity to qualify as an improvement. Changes of less than or equal to min_delta will count as no improvement. Default: 0.0.
- patience: int
Number of validation checks with no improvement after which training will be stopped. Default: 3.
- verbose: bool
Whether to print messages when improvement is found or early stopping is triggered. Default: False.
- mode: Literal['min', 'max']
One of 'min' or 'max'. In 'min' mode, training stops when the monitored quantity stops decreasing; in 'max' mode it stops when the quantity stops increasing. Default: 'min'.
- strict: bool
Whether to raise an error if the monitored metric is not found in validation metrics. Default: True.
- check_finite: bool
Whether to stop training when the monitor becomes NaN or infinite. Default: True.
- stopping_threshold: float | None
Stop training immediately once the monitored quantity reaches this threshold. Default: None.
- divergence_threshold: float | None
Stop training as soon as the monitored quantity becomes worse than this threshold. Default: None.
- check_on_train_epoch_end: bool | None
Whether to run early stopping at the end of the training epoch. If False, the check runs at the end of validation. Default: None.
- log_rank_zero_only: bool
Whether to log the status of early stopping only for the rank 0 process. Default: False.
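A hedged example of configuring early stopping on the default monitored metric; the tolerance and patience values are illustrative, not recommendations.

```python
from mattertune.configs.main import EarlyStoppingConfig

# Stop fine-tuning once val/total_loss has failed to improve by at
# least 1e-4 for 5 consecutive validation checks.
early_stopping = EarlyStoppingConfig(
    monitor="val/total_loss",
    min_delta=1e-4,
    patience=5,
    mode="min",
)
```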
- class mattertune.configs.main.MatterTunerConfig(*, data, model, trainer=TrainerConfig(accelerator='auto', strategy='auto', num_nodes=1, devices='auto', precision='32-true', deterministic=None, max_epochs=None, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, val_check_interval=None, check_val_every_n_epoch=1, log_every_n_steps=None, gradient_clip_val=None, gradient_clip_algorithm=None, checkpoint=None, early_stopping=None, loggers='default', additional_trainer_kwargs={}))[source]
- Parameters:
data (DataModuleConfig)
model (ModelConfig)
trainer (TrainerConfig)
- data: DataModuleConfig
The configuration for the data.
- model: ModelConfig
The configuration for the model.
- trainer: TrainerConfig
The configuration for the trainer.
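A sketch of how the three sub-configs compose. `my_data_config` and `my_model_config` are hypothetical placeholders for DataModuleConfig and ModelConfig instances, which are documented outside this module.

```python
from mattertune.configs.main import MatterTunerConfig, TrainerConfig

# `my_data_config` and `my_model_config` are placeholders: build them
# from the DataModuleConfig / ModelConfig classes documented elsewhere.
config = MatterTunerConfig(
    data=my_data_config,
    model=my_model_config,
    trainer=TrainerConfig(max_epochs=10),  # falls back to defaults if omitted
)
```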
- class mattertune.configs.main.ModelCheckpointConfig(*, dirpath=None, filename=None, monitor=None, verbose=False, save_last=None, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None, enable_version_counter=True)[source]
- Parameters:
dirpath (str | None)
filename (str | None)
monitor (str | None)
verbose (bool)
save_last (Literal[True, False, 'link'] | None)
save_top_k (int)
save_weights_only (bool)
mode (Literal['min', 'max'])
auto_insert_metric_name (bool)
every_n_train_steps (int | None)
train_time_interval (timedelta | None)
every_n_epochs (int | None)
save_on_train_epoch_end (bool | None)
enable_version_counter (bool)
- dirpath: str | None
Directory to save the model file. Default: None.
- filename: str | None
Checkpoint filename. Can contain named formatting options. Default: None.
- monitor: str | None
Quantity to monitor. Default: None.
- verbose: bool
Verbosity mode. Default: False.
- save_last: Literal[True, False, 'link'] | None
When True or "link", saves a 'last.ckpt' checkpoint when a checkpoint is saved. Default: None.
- save_top_k: int
If save_top_k=k, save the k models with the best monitored quantity. Default: 1.
- save_weights_only: bool
If True, only save model weights. Default: False.
- mode: Literal['min', 'max']
One of {'min', 'max'}. Determines whether a lower ('min') or higher ('max') value of the monitored quantity counts as better when ranking checkpoints. Default: 'min'.
- auto_insert_metric_name: bool
Whether to automatically insert the metric name in the checkpoint filename. Default: True.
- every_n_train_steps: int | None
Number of training steps between checkpoints. Default: None.
- train_time_interval: timedelta | None
Checkpoints are monitored at the specified time interval. Default: None.
- every_n_epochs: int | None
Number of epochs between checkpoints. Default: None.
- save_on_train_epoch_end: bool | None
Whether to run checkpointing at the end of the training epoch. Default: None.
- enable_version_counter: bool
Whether to append a version to existing filenames. Default: True.
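A sketch of a typical checkpointing setup using only the fields above; the directory is illustrative, and the filename uses Lightning-style named formatting.

```python
from mattertune.configs.main import ModelCheckpointConfig

# Keep the 3 checkpoints with the lowest validation loss and maintain
# a 'last.ckpt' link to the most recent one.
checkpoint = ModelCheckpointConfig(
    dirpath="./checkpoints",        # illustrative output directory
    filename="{epoch:02d}-{step}",  # Lightning-style named formatting
    monitor="val/total_loss",
    mode="min",
    save_top_k=3,
    save_last="link",
)
```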
- class mattertune.configs.main.TrainerConfig(*, accelerator='auto', strategy='auto', num_nodes=1, devices='auto', precision='32-true', deterministic=None, max_epochs=None, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, val_check_interval=None, check_val_every_n_epoch=1, log_every_n_steps=None, gradient_clip_val=None, gradient_clip_algorithm=None, checkpoint=None, early_stopping=None, loggers='default', additional_trainer_kwargs={})[source]
- Parameters:
accelerator (str)
strategy (str | Strategy)
num_nodes (int)
devices (list[int] | str | int)
precision (Literal[64, 32, 16] | Literal['transformer-engine', 'transformer-engine-float16', '16-true', '16-mixed', 'bf16-true', 'bf16-mixed', '32-true', '64-true'] | Literal['64', '32', '16', 'bf16'] | None)
deterministic (bool | Literal['warn'] | None)
max_epochs (int | None)
min_epochs (int | None)
max_steps (int)
min_steps (int | None)
max_time (str | timedelta | dict[str, int] | None)
val_check_interval (int | float | None)
check_val_every_n_epoch (int | None)
log_every_n_steps (int | None)
gradient_clip_val (int | float | None)
gradient_clip_algorithm (str | None)
checkpoint (ModelCheckpointConfig | None)
early_stopping (EarlyStoppingConfig | None)
loggers (Sequence[LoggerConfig] | Literal['default'])
additional_trainer_kwargs (dict[str, Any])
- accelerator: str
Supports passing different accelerator types ("cpu", "gpu", "tpu", "ipu", "hpu", "mps", "auto") as well as custom accelerator instances.
- strategy: str | Strategy
Supports different training strategies with aliases, as well as custom strategies. Default: "auto".
- num_nodes: int
Number of GPU nodes for distributed training. Default: 1.
- devices: list[int] | str | int
The devices to use. Can be set to a sequence of device indices, "all" to indicate all available devices should be used, or "auto" for automatic selection based on the chosen accelerator. Default: "auto".
- precision: _PRECISION_INPUT | None
Double precision (64, '64' or '64-true'), full precision (32, '32' or '32-true'), 16-bit mixed precision (16, '16', '16-mixed') or bfloat16 mixed precision ('bf16', 'bf16-mixed'). Can be used on CPU, GPU, TPUs, HPUs or IPUs. Default: '32-true'.
- deterministic: bool | Literal['warn'] | None
If True, sets whether PyTorch operations must use deterministic algorithms. Set to "warn" to use deterministic algorithms whenever possible, throwing warnings on operations that don't support deterministic mode. If not set, defaults to False. Default: None.
- max_epochs: int | None
Stop training once this number of epochs is reached. Disabled by default (None). If both max_epochs and max_steps are not specified, defaults to max_epochs = 1000. To enable infinite training, set max_epochs = -1.
- min_epochs: int | None
Force training for at least this many epochs. Disabled by default (None).
- max_steps: int
Stop training after this number of steps. Disabled by default (-1). If max_steps = -1 and max_epochs = None, will default to max_epochs = 1000. To enable infinite training, set max_epochs to -1.
- min_steps: int | None
Force training for at least this number of steps. Disabled by default (None).
- max_time: str | timedelta | dict[str, int] | None
Stop training after this amount of time has passed. Disabled by default (None). The time duration can be specified in the format DD:HH:MM:SS (days, hours, minutes, seconds), as a datetime.timedelta, or a dictionary with keys that will be passed to datetime.timedelta.
- val_check_interval: int | float | None
How often to check the validation set. Pass a float in the range [0.0, 1.0] to check after a fraction of the training epoch. Pass an int to check after a fixed number of training batches. An int value can only be higher than the number of training batches when check_val_every_n_epoch=None, which validates after every N training batches across epochs or during iteration-based training. Default: 1.0.
- check_val_every_n_epoch: int | None
Perform a validation loop after every N training epochs. If None, validation will be done solely based on the number of training batches, requiring val_check_interval to be an integer value. Default: 1.
- log_every_n_steps: int | None
How often to log within steps. Default: 50.
- gradient_clip_val: int | float | None
The value at which to clip gradients. Passing gradient_clip_val=None disables gradient clipping. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before clipping. Default: None.
- gradient_clip_algorithm: str | None
The gradient clipping algorithm to use. Pass gradient_clip_algorithm="value" to clip by value, and gradient_clip_algorithm="norm" to clip by norm. By default it will be set to "norm".
- checkpoint: ModelCheckpointConfig | None
The configuration for the model checkpoint.
- early_stopping: EarlyStoppingConfig | None
The configuration for early stopping.
- loggers: Sequence[LoggerConfig] | Literal['default']
The loggers to use for logging training metrics.
If "default", will use the CSV logger + the W&B logger if available. Default: "default".
- additional_trainer_kwargs: dict[str, Any]
Additional keyword arguments for the Lightning Trainer.
This is for advanced users who want to customize the Lightning Trainer, and is not recommended for beginners.
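Putting it together, a hedged single-GPU fine-tuning setup that exercises the fields above; all hyperparameter values are illustrative.

```python
from mattertune.configs.main import (
    EarlyStoppingConfig,
    ModelCheckpointConfig,
    TrainerConfig,
)

# Validate twice per training epoch, clip gradient norms at 1.0, and
# wire in checkpointing and early stopping on the same metric.
trainer = TrainerConfig(
    accelerator="gpu",
    devices=[0],
    precision="bf16-mixed",
    max_epochs=100,
    val_check_interval=0.5,  # float: fraction of the training epoch
    gradient_clip_val=1.0,
    gradient_clip_algorithm="norm",
    checkpoint=ModelCheckpointConfig(monitor="val/total_loss", save_top_k=1),
    early_stopping=EarlyStoppingConfig(monitor="val/total_loss", patience=10),
)
```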