mattertune.finetune.optimizer

Functions

create_optimizer(config, named_parameters)

Classes

AdamConfig(*[, per_parameter_hparams, name, ...])

AdamWConfig(*[, per_parameter_hparams, ...])

OptimizerConfigBase(*[, per_parameter_hparams])

PerParamHparamsDict

SGDConfig(*[, per_parameter_hparams, name, ...])

class mattertune.finetune.optimizer.PerParamHparamsDict[source]
patterns: Sequence[str]

Patterns to match parameter names.

hparams: dict[str, Any]

Hyperparameters for the matched parameters.

optimize: NotRequired[bool]

Whether to optimize this parameter. Default is True.
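Since PerParamHparamsDict is a TypedDict, an instance is an ordinary Python dict with the fields above. A minimal sketch (the parameter-name pattern `output_head.*` and the override values are illustrative, not taken from mattertune):

```python
# A dict satisfying the PerParamHparamsDict shape; keys mirror the
# fields documented above (patterns / hparams / optional optimize).
head_overrides = {
    "patterns": ["output_head.*"],  # patterns matched against parameter names
    "hparams": {"lr": 1e-3},        # hyperparameter overrides for matched params
    "optimize": True,               # optional; defaults to True
}
```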

class mattertune.finetune.optimizer.OptimizerConfigBase(*, per_parameter_hparams=None)[source]
Parameters:

per_parameter_hparams (Sequence[PerParamHparamsDict] | None)

per_parameter_hparams: Sequence[PerParamHparamsDict] | None

Per-parameter hyperparameters.

This should be a list of dictionaries, each of which has the following keys:

  • patterns: a list of patterns to match parameter names.

  • hparams: a dictionary of hyperparameters for the matched parameters.

  • optimize: whether to optimize this parameter. Default is True.

This allows you to, for example, set different learning rates for different parameters.
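To make the matching semantics concrete, here is a hypothetical sketch of how entries like these could be turned into optimizer parameter groups. This is not mattertune's actual implementation; it assumes glob-style patterns (via `fnmatch`) and first-match-wins ordering, both of which are assumptions:

```python
from fnmatch import fnmatch


def group_parameters(named_parameters, per_parameter_hparams, base_hparams):
    """Split (name, param) pairs into optimizer param groups.

    Each entry in per_parameter_hparams has 'patterns', 'hparams', and an
    optional 'optimize' key (default True). The first matching entry wins;
    unmatched parameters fall into a default group using base_hparams.
    """
    # One group per entry, with entry hparams overriding the base values.
    groups = [
        {"params": [], **base_hparams, **entry["hparams"]}
        for entry in per_parameter_hparams
    ]
    default = {"params": [], **base_hparams}
    for name, param in named_parameters:
        for entry, group in zip(per_parameter_hparams, groups):
            if any(fnmatch(name, pat) for pat in entry["patterns"]):
                if entry.get("optimize", True):  # optimize=False drops the param
                    group["params"].append(param)
                break
        else:  # no pattern matched: use the default group
            default["params"].append(param)
    return [default, *groups]
```

For example, an entry with `patterns=["backbone.*"]` and `hparams={"lr": 1e-5}` would place all backbone parameters in a group with a lower learning rate, while an entry with `optimize=False` would exclude its matches from the optimizer entirely.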

class mattertune.finetune.optimizer.AdamConfig(*, per_parameter_hparams=None, name='Adam', lr, eps=1e-08, betas=(0.9, 0.999), weight_decay=0.0, amsgrad=False)[source]
Parameters:
  • per_parameter_hparams (Sequence[PerParamHparamsDict] | None)

  • name (Literal['Adam'])

  • lr (Annotated[float, Gt(gt=0)])

  • eps (Annotated[float, Ge(ge=0)])

  • betas (tuple[Annotated[float, Gt(gt=0)], Annotated[float, Gt(gt=0)]])

  • weight_decay (Annotated[float, Ge(ge=0)])

  • amsgrad (bool)

name: Literal['Adam']

Name of the optimizer.

lr: C.PositiveFloat

Learning rate.

eps: C.NonNegativeFloat

Epsilon term added to the denominator for numerical stability.

betas: tuple[C.PositiveFloat, C.PositiveFloat]

Coefficients used for computing running averages of the gradient and its square.

weight_decay: C.NonNegativeFloat

Weight decay.

amsgrad: bool

Whether to use the AMSGrad variant of Adam.

per_parameter_hparams: Sequence[PerParamHparamsDict] | None

Per-parameter hyperparameters.

This should be a list of dictionaries, each of which has the following keys:

  • patterns: a list of patterns to match parameter names.

  • hparams: a dictionary of hyperparameters for the matched parameters.

  • optimize: whether to optimize this parameter. Default is True.

This allows you to, for example, set different learning rates for different parameters.
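A usage sketch following the signature above (the learning rate and per-parameter pattern are illustrative values, not mattertune defaults; `lr` is the only required field):

```python
from mattertune.finetune.optimizer import AdamConfig

config = AdamConfig(
    lr=1e-4,             # required; must be > 0
    betas=(0.9, 0.999),  # defaults shown for clarity
    eps=1e-8,
    weight_decay=0.0,
    amsgrad=False,
    per_parameter_hparams=[
        # Hypothetical override: lower learning rate for backbone params.
        {"patterns": ["backbone.*"], "hparams": {"lr": 1e-5}},
    ],
)
```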

class mattertune.finetune.optimizer.AdamWConfig(*, per_parameter_hparams=None, name='AdamW', lr, eps=1e-08, betas=(0.9, 0.999), weight_decay=0.01, amsgrad=False)[source]
Parameters:
  • per_parameter_hparams (Sequence[PerParamHparamsDict] | None)

  • name (Literal['AdamW'])

  • lr (Annotated[float, Gt(gt=0)])

  • eps (Annotated[float, Ge(ge=0)])

  • betas (tuple[Annotated[float, Gt(gt=0)], Annotated[float, Gt(gt=0)]])

  • weight_decay (Annotated[float, Ge(ge=0)])

  • amsgrad (bool)

name: Literal['AdamW']

Name of the optimizer.

lr: C.PositiveFloat

Learning rate.

eps: C.NonNegativeFloat

Epsilon term added to the denominator for numerical stability.

per_parameter_hparams: Sequence[PerParamHparamsDict] | None

Per-parameter hyperparameters.

This should be a list of dictionaries, each of which has the following keys:

  • patterns: a list of patterns to match parameter names.

  • hparams: a dictionary of hyperparameters for the matched parameters.

  • optimize: whether to optimize this parameter. Default is True.

This allows you to, for example, set different learning rates for different parameters.

betas: tuple[C.PositiveFloat, C.PositiveFloat]

Coefficients used for computing running averages of the gradient and its square.

weight_decay: C.NonNegativeFloat

Weight decay.

amsgrad: bool

Whether to use the AMSGrad variant of the algorithm.
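AdamWConfig mirrors AdamConfig except for the name and the decoupled weight-decay default (0.01 rather than 0.0, per the signature above). A minimal sketch with an illustrative learning rate:

```python
from mattertune.finetune.optimizer import AdamWConfig

# Only lr is required; weight_decay defaults to 0.01 for AdamW.
config = AdamWConfig(lr=1e-4, weight_decay=0.01)
```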

class mattertune.finetune.optimizer.SGDConfig(*, per_parameter_hparams=None, name='SGD', lr, momentum=0.0, weight_decay=0.0, nestrov=False)[source]
Parameters:
  • per_parameter_hparams (Sequence[PerParamHparamsDict] | None)

  • name (Literal['SGD'])

  • lr (Annotated[float, Gt(gt=0)])

  • momentum (Annotated[float, Ge(ge=0)])

  • weight_decay (Annotated[float, Ge(ge=0)])

  • nestrov (bool)

per_parameter_hparams: Sequence[PerParamHparamsDict] | None

Per-parameter hyperparameters.

This should be a list of dictionaries, each of which has the following keys:

  • patterns: a list of patterns to match parameter names.

  • hparams: a dictionary of hyperparameters for the matched parameters.

  • optimize: whether to optimize this parameter. Default is True.

This allows you to, for example, set different learning rates for different parameters.

name: Literal['SGD']

Name of the optimizer.

lr: C.PositiveFloat

Learning rate.

momentum: C.NonNegativeFloat

Momentum factor.

weight_decay: C.NonNegativeFloat

Weight decay.

nestrov: bool

Whether to use Nesterov momentum (note the field is spelled `nestrov` in the signature).
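A usage sketch following the signature above (the values are illustrative; mind the `nestrov` spelling of the keyword):

```python
from mattertune.finetune.optimizer import SGDConfig

config = SGDConfig(
    lr=1e-2,          # required; must be > 0
    momentum=0.9,
    weight_decay=0.0,
    nestrov=True,     # field name as spelled in the signature
)
```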

mattertune.finetune.optimizer.create_optimizer(config, named_parameters)[source]
Parameters:
  • config (OptimizerConfig)

  • named_parameters (Iterable[tuple[str, Parameter]])

Return type:

Optimizer
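Putting the pieces together, a sketch of typical usage based on the signatures above; the `nn.Linear` model is a stand-in for whatever module is being fine-tuned, and the learning rate is an illustrative value:

```python
import torch.nn as nn

from mattertune.finetune.optimizer import AdamWConfig, create_optimizer

model = nn.Linear(8, 1)  # placeholder for the model being fine-tuned
config = AdamWConfig(lr=1e-4)

# create_optimizer consumes (name, Parameter) pairs and returns a
# torch Optimizer configured from the config.
optimizer = create_optimizer(config, model.named_parameters())
```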