mattertune.finetune.optimizer
- class mattertune.finetune.optimizer.PerParamHparamsDict[source]
- patterns: Sequence[str]
Patterns to match parameter names.
- hparams: dict[str, Any]
Hyperparameters for the matched parameters.
- optimize: NotRequired[bool]
Whether to optimize the matched parameters. Default is True.
- class mattertune.finetune.optimizer.OptimizerConfigBase(*, per_parameter_hparams=None)[source]
- Parameters:
per_parameter_hparams (Sequence[PerParamHparamsDict] | None)
- per_parameter_hparams: Sequence[PerParamHparamsDict] | None
Per-parameter hyperparameters. This should be a list of dictionaries, each with the following keys:
  - patterns: a list of patterns to match parameter names.
  - hparams: a dictionary of hyperparameters for the matched parameters.
  - optimize: whether to optimize the matched parameters. Default is True.
This allows you to, for example, set different learning rates for different parameters.
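As an illustration of how such a list of overrides might be resolved into optimizer parameter groups, here is a small sketch. It is not mattertune's actual implementation: the glob-style matching via `fnmatch` and the helper name `resolve_param_groups` are assumptions made for this example.

```python
from fnmatch import fnmatch

def resolve_param_groups(param_names, per_parameter_hparams, base_hparams):
    """Assign each parameter name to the first matching override, else the base group."""
    matched = [[] for _ in per_parameter_hparams]
    base_group = []
    for name in param_names:
        for i, entry in enumerate(per_parameter_hparams):
            if any(fnmatch(name, pat) for pat in entry["patterns"]):
                if entry.get("optimize", True):   # optimize defaults to True
                    matched[i].append(name)       # optimize=False drops the parameter
                break                             # first matching entry wins
        else:
            base_group.append(name)               # no pattern matched: base hparams
    groups = [{"params": base_group, **base_hparams}]
    for entry, names in zip(per_parameter_hparams, matched):
        # entry hparams override the base hparams for that group
        groups.append({"params": names, **{**base_hparams, **entry["hparams"]}})
    return groups

groups = resolve_param_groups(
    ["backbone.layer1.weight", "head.weight"],
    [{"patterns": ["backbone.*"], "hparams": {"lr": 1e-5}}],
    {"lr": 1e-3},
)
# backbone parameters get lr=1e-5, everything else keeps the base lr=1e-3
```

First match wins in this sketch; whether mattertune applies the same precedence between overlapping patterns is not specified on this page.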
- class mattertune.finetune.optimizer.AdamConfig(*, per_parameter_hparams=None, name='Adam', lr, eps=1e-08, betas=(0.9, 0.999), weight_decay=0.0, amsgrad=False)[source]
- Parameters:
per_parameter_hparams (Sequence[PerParamHparamsDict] | None)
name (Literal['Adam'])
lr (Annotated[float, Gt(gt=0)])
eps (Annotated[float, Ge(ge=0)])
betas (tuple[Annotated[float, Gt(gt=0)], Annotated[float, Gt(gt=0)]])
weight_decay (Annotated[float, Ge(ge=0)])
amsgrad (bool)
- name: Literal['Adam']
Name of the optimizer.
- lr: C.PositiveFloat
Learning rate.
- eps: C.NonNegativeFloat
Epsilon term added to the denominator for numerical stability.
- betas: tuple[C.PositiveFloat, C.PositiveFloat]
Coefficients used for computing running averages of the gradient and its square.
- weight_decay: C.NonNegativeFloat
Weight decay (L2 penalty).
- amsgrad: bool
Whether to use the AMSGrad variant of Adam.
- per_parameter_hparams: Sequence[PerParamHparamsDict] | None
Per-parameter hyperparameters. This should be a list of dictionaries, each with the following keys:
  - patterns: a list of patterns to match parameter names.
  - hparams: a dictionary of hyperparameters for the matched parameters.
  - optimize: whether to optimize the matched parameters. Default is True.
This allows you to, for example, set different learning rates for different parameters.
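To make the roles of lr, eps, betas, and weight_decay concrete, here is a minimal scalar sketch of the standard Adam update (the textbook torch.optim-style formulation, not mattertune internals; the amsgrad branch is omitted for brevity):

```python
import math

def adam_step(p, g, m, v, t, lr, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    """One scalar Adam step at timestep t (1-indexed)."""
    g = g + weight_decay * p                   # classic Adam folds L2 penalty into the gradient
    m = betas[0] * m + (1 - betas[0]) * g      # first-moment (mean) estimate
    v = betas[1] * v + (1 - betas[1]) * g * g  # second-moment (uncentered variance) estimate
    m_hat = m / (1 - betas[0] ** t)            # bias correction for zero initialization
    v_hat = v / (1 - betas[1] ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

p, m, v = adam_step(1.0, g=0.5, m=0.0, v=0.0, t=1, lr=0.1)
# after bias correction, the first step moves p by approximately lr (p is about 0.9)
```

Because of the bias correction, the very first update has magnitude close to lr regardless of the gradient scale; eps only matters when the second-moment estimate is near zero.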
- class mattertune.finetune.optimizer.AdamWConfig(*, per_parameter_hparams=None, name='AdamW', lr, eps=1e-08, betas=(0.9, 0.999), weight_decay=0.01, amsgrad=False)[source]
- Parameters:
per_parameter_hparams (Sequence[PerParamHparamsDict] | None)
name (Literal['AdamW'])
lr (Annotated[float, Gt(gt=0)])
eps (Annotated[float, Ge(ge=0)])
betas (tuple[Annotated[float, Gt(gt=0)], Annotated[float, Gt(gt=0)]])
weight_decay (Annotated[float, Ge(ge=0)])
amsgrad (bool)
- name: Literal['AdamW']
Name of the optimizer.
- lr: C.PositiveFloat
Learning rate.
- eps: C.NonNegativeFloat
Epsilon term added to the denominator for numerical stability.
- per_parameter_hparams: Sequence[PerParamHparamsDict] | None
Per-parameter hyperparameters. This should be a list of dictionaries, each with the following keys:
  - patterns: a list of patterns to match parameter names.
  - hparams: a dictionary of hyperparameters for the matched parameters.
  - optimize: whether to optimize the matched parameters. Default is True.
This allows you to, for example, set different learning rates for different parameters.
- betas: tuple[C.PositiveFloat, C.PositiveFloat]
Coefficients used for computing running averages of the gradient and its square.
- weight_decay: C.NonNegativeFloat
Decoupled weight decay coefficient.
- amsgrad: bool
Whether to use the AMSGrad variant of Adam.
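The practical difference from Adam is the treatment of weight_decay: AdamW applies it directly to the parameter, decoupled from the gradient-based update. A minimal scalar sketch of the standard AdamW step (again the conventional formulation, not mattertune internals):

```python
import math

def adamw_step(p, g, m, v, t, lr, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One scalar AdamW step at timestep t (1-indexed)."""
    p = p - lr * weight_decay * p              # decoupled decay: applied to p directly
    m = betas[0] * m + (1 - betas[0]) * g      # moment estimates see the raw gradient only
    v = betas[1] * v + (1 - betas[1]) * g * g
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

p, m, v = adamw_step(1.0, g=0.0, m=0.0, v=0.0, t=1, lr=0.1, weight_decay=0.1)
# with a zero gradient, the parameter still shrinks by lr * weight_decay * p
```

The decay is thus independent of the adaptive rescaling, whereas in classic Adam the weight-decay term passes through the moment estimates like any other gradient contribution.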
- class mattertune.finetune.optimizer.SGDConfig(*, per_parameter_hparams=None, name='SGD', lr, momentum=0.0, weight_decay=0.0, nestrov=False)[source]
- Parameters:
per_parameter_hparams (Sequence[PerParamHparamsDict] | None)
name (Literal['SGD'])
lr (Annotated[float, Gt(gt=0)])
momentum (Annotated[float, Ge(ge=0)])
weight_decay (Annotated[float, Ge(ge=0)])
nestrov (bool)
- per_parameter_hparams: Sequence[PerParamHparamsDict] | None
Per-parameter hyperparameters. This should be a list of dictionaries, each with the following keys:
  - patterns: a list of patterns to match parameter names.
  - hparams: a dictionary of hyperparameters for the matched parameters.
  - optimize: whether to optimize the matched parameters. Default is True.
This allows you to, for example, set different learning rates for different parameters.
- name: Literal['SGD']
Name of the optimizer.
- lr: C.PositiveFloat
Learning rate.
- momentum: C.NonNegativeFloat
Momentum factor.
- weight_decay: C.NonNegativeFloat
Weight decay (L2 penalty).
- nestrov: bool
Whether to use Nesterov momentum (the field name is spelled nestrov).
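A minimal scalar sketch of the standard SGD update, showing what momentum, weight_decay, and the Nesterov flag each do. This mirrors the common torch.optim.SGD-style formulation, not mattertune internals (the sketch uses the conventional spelling nesterov; the config field above is spelled nestrov):

```python
def sgd_step(p, g, buf, lr, momentum=0.0, weight_decay=0.0, nesterov=False):
    """One scalar SGD step; `buf` is the running velocity buffer."""
    g = g + weight_decay * p                         # fold L2 penalty into the gradient
    if momentum != 0.0:
        buf = momentum * buf + g                     # update velocity
        g = g + momentum * buf if nesterov else buf  # Nesterov looks ahead along the velocity
    return p - lr * g, buf

p, buf = sgd_step(1.0, g=0.5, buf=0.0, lr=0.1)                               # plain SGD
p_nag, buf_nag = sgd_step(1.0, g=0.5, buf=0.0, lr=0.1, momentum=0.9,
                          nesterov=True)                                      # Nesterov step
```

With momentum=0.0 (the config default) the buffer is unused and the Nesterov flag has no effect, matching the usual requirement that Nesterov momentum only makes sense with a nonzero momentum factor.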