mattertune.main
Classes
- TuneOutput: The output of the MatterTuner.tune method.
- TrainerConfig: Configuration for the underlying Lightning Trainer.
- MatterTunerConfig: Top-level configuration holding the data, model, and trainer configurations.
- MatterTuner: Runs fine-tuning according to a MatterTunerConfig.
- class mattertune.main.TuneOutput(model, trainer)[source]
The output of the MatterTuner.tune method.
- Parameters:
model (FinetuneModuleBase)
trainer (Trainer)
- model: FinetuneModuleBase
The trained model.
- trainer: Trainer
The trainer used to train the model.
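For illustration, a minimal sketch of consuming a TuneOutput follows. It assumes output was returned by a MatterTuner.tune call (see the end-to-end sketch at the bottom of this page); save_checkpoint is the standard Lightning Trainer method for persisting weights.

```python
# `output` is assumed to be a TuneOutput returned by MatterTuner.tune.
model = output.model      # the trained FinetuneModuleBase
trainer = output.trainer  # the Lightning Trainer used during fine-tuning

# The Lightning Trainer can be reused, e.g. to persist the trained weights:
trainer.save_checkpoint("finetuned_model.ckpt")
```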
- class mattertune.main.TrainerConfig(*, accelerator='auto', strategy='auto', num_nodes=1, devices='auto', precision='32-true', deterministic=None, max_epochs=None, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, val_check_interval=None, check_val_every_n_epoch=1, log_every_n_steps=None, gradient_clip_val=None, gradient_clip_algorithm=None, checkpoint=None, early_stopping=None, loggers='default', additional_trainer_kwargs={})[source]
- Parameters:
accelerator (str)
strategy (str | Strategy)
num_nodes (int)
devices (list[int] | str | int)
precision (Literal[64, 32, 16] | Literal['transformer-engine', 'transformer-engine-float16', '16-true', '16-mixed', 'bf16-true', 'bf16-mixed', '32-true', '64-true'] | Literal['64', '32', '16', 'bf16'] | None)
deterministic (bool | Literal['warn'] | None)
max_epochs (int | None)
min_epochs (int | None)
max_steps (int)
min_steps (int | None)
max_time (str | timedelta | dict[str, int] | None)
val_check_interval (int | float | None)
check_val_every_n_epoch (int | None)
log_every_n_steps (int | None)
gradient_clip_val (int | float | None)
gradient_clip_algorithm (str | None)
checkpoint (ModelCheckpointConfig | None)
early_stopping (EarlyStoppingConfig | None)
loggers (Sequence[LoggerConfig] | Literal['default'])
additional_trainer_kwargs (dict[str, Any])
- accelerator: str
Supports passing different accelerator types (“cpu”, “gpu”, “tpu”, “ipu”, “hpu”, “mps”, “auto”) as well as custom accelerator instances.
- strategy: str | Strategy
Supports different training strategies with aliases as well as custom strategies. Default: "auto".
- num_nodes: int
Number of GPU nodes for distributed training. Default: 1.
- devices: list[int] | str | int
The devices to use. Can be set to a sequence of device indices, "all" to indicate all available devices should be used, or "auto" for automatic selection based on the chosen accelerator. Default: "auto".
- precision: _PRECISION_INPUT | None
Double precision (64, '64' or '64-true'), full precision (32, '32' or '32-true'), 16-bit mixed precision (16, '16', '16-mixed') or bfloat16 mixed precision ('bf16', 'bf16-mixed'). Can be used on CPU, GPU, TPUs, HPUs or IPUs. Default: '32-true'.
- deterministic: bool | Literal['warn'] | None
If True, sets whether PyTorch operations must use deterministic algorithms. Set to "warn" to use deterministic algorithms whenever possible, throwing warnings on operations that don't support deterministic mode. If not set, defaults to False. Default: None.
- max_epochs: int | None
Stop training once this number of epochs is reached. Disabled by default (None). If both max_epochs and max_steps are not specified, defaults to max_epochs = 1000. To enable infinite training, set max_epochs = -1.
- min_epochs: int | None
Force training for at least this many epochs. Disabled by default (None).
- max_steps: int
Stop training after this number of steps. Disabled by default (-1). If max_steps = -1 and max_epochs = None, will default to max_epochs = 1000. To enable infinite training, set max_epochs to -1.
- min_steps: int | None
Force training for at least this number of steps. Disabled by default (None).
- max_time: str | timedelta | dict[str, int] | None
Stop training after this amount of time has passed. Disabled by default (None). The time duration can be specified in the format DD:HH:MM:SS (days, hours, minutes, seconds), as a datetime.timedelta, or as a dictionary with keys that will be passed to datetime.timedelta.
- val_check_interval: int | float | None
How often to check the validation set. Pass a float in the range [0.0, 1.0] to check after a fraction of the training epoch. Pass an int to check after a fixed number of training batches. An int value can only be higher than the number of training batches when check_val_every_n_epoch=None, which validates after every N training batches across epochs or during iteration-based training. Default: 1.0.
- check_val_every_n_epoch: int | None
Perform a validation loop after every N training epochs. If None, validation will be done solely based on the number of training batches, requiring val_check_interval to be an integer value. Default: 1.
- log_every_n_steps: int | None
How often to log within steps. Default: 50.
- gradient_clip_val: int | float | None
The value at which to clip gradients. Passing gradient_clip_val=None disables gradient clipping. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before clipping. Default: None.
- gradient_clip_algorithm: str | None
The gradient clipping algorithm to use. Pass gradient_clip_algorithm="value" to clip by value, and gradient_clip_algorithm="norm" to clip by norm. By default it will be set to "norm".
- checkpoint: ModelCheckpointConfig | None
The configuration for the model checkpoint.
- early_stopping: EarlyStoppingConfig | None
The configuration for early stopping.
- loggers: Sequence[LoggerConfig] | Literal['default']
The loggers to use for logging training metrics.
If "default", will use the CSV logger + the W&B logger if available. Default: "default".
- additional_trainer_kwargs: dict[str, Any]
Additional keyword arguments for the Lightning Trainer.
This is for advanced users who want to customize the Lightning Trainer, and is not recommended for beginners.
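As an illustrative sketch (all values below are arbitrary choices, not recommendations), a TrainerConfig can be built with the keyword arguments listed above; enable_progress_bar is an example of a standard Lightning Trainer argument forwarded via additional_trainer_kwargs:

```python
from mattertune.main import TrainerConfig

trainer_config = TrainerConfig(
    accelerator="gpu",               # train on GPU(s)
    devices=[0, 1],                  # use GPU indices 0 and 1
    precision="bf16-mixed",          # bfloat16 mixed precision
    max_epochs=100,                  # stop after 100 epochs...
    max_time={"hours": 12},          # ...or after 12 hours, whichever comes first
    val_check_interval=0.25,         # validate four times per training epoch
    gradient_clip_val=1.0,           # clip gradients...
    gradient_clip_algorithm="norm",  # ...by norm
    additional_trainer_kwargs={"enable_progress_bar": False},
)
```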
- class mattertune.main.MatterTunerConfig(*, data, model, trainer=TrainerConfig(accelerator='auto', strategy='auto', num_nodes=1, devices='auto', precision='32-true', deterministic=None, max_epochs=None, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, val_check_interval=None, check_val_every_n_epoch=1, log_every_n_steps=None, gradient_clip_val=None, gradient_clip_algorithm=None, checkpoint=None, early_stopping=None, loggers='default', additional_trainer_kwargs={}))[source]
- Parameters:
data (DataModuleConfig)
model (ModelConfig)
trainer (TrainerConfig)
- data: DataModuleConfig
The configuration for the data.
- model: ModelConfig
The configuration for the model.
- trainer: TrainerConfig
The configuration for the trainer.
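For illustration, a composition sketch follows; data_config and model_config are hypothetical placeholders for concrete DataModuleConfig and ModelConfig instances, whose construction depends on the chosen dataset and backbone:

```python
from mattertune.main import MatterTunerConfig, TrainerConfig

# `data_config` and `model_config` are hypothetical placeholders for
# concrete DataModuleConfig / ModelConfig instances built elsewhere.
config = MatterTunerConfig(
    data=data_config,
    model=model_config,
    trainer=TrainerConfig(max_epochs=10),  # optional; defaults are shown above
)
```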
- class mattertune.main.MatterTuner(config)[source]
- Parameters:
config (MatterTunerConfig)
- __init__(config)[source]
- Parameters:
config (MatterTunerConfig)
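Putting the pieces together, a minimal end-to-end sketch (reusing the config object from the previous example, and assuming tune() requires no additional arguments):

```python
from mattertune.main import MatterTuner

tuner = MatterTuner(config)
output = tuner.tune()  # runs fine-tuning and returns a TuneOutput

trained_model = output.model
```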