Normalization
MatterTune provides flexible property normalization through the mattertune.normalization module. Normalization improves training stability and convergence when fine-tuning models, especially for properties whose values vary widely in scale.
Overview
The normalization system consists of:
A NormalizationContext that provides per-batch information needed for normalization
Multiple normalizer types that can be composed together
CLI tools for computing normalization parameters from datasets
Supported Normalizers
Mean-Standard Deviation Normalization
Normalizes values using mean and standard deviation: (x - mean) / std
import mattertune as mt

config = mt.configs.MatterTunerConfig(
    model=mt.configs.JMPBackboneConfig(
        # ... other configs ...
        normalizers={
            "energy": [
                mt.configs.MeanStdNormalizerConfig(
                    mean=-13.6,  # mean of your property
                    std=2.4,  # standard deviation of your property
                )
            ]
        }
    ),
    # ... other configs ...
)
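The mean and std values above are normally computed from the training labels. The following is a minimal sketch using only the Python standard library, assuming the labels are available as a plain list of floats. The `energies` values are made up for illustration, and the choice of the population standard deviation is an assumption, not something this page specifies.

```python
import statistics

# Hypothetical training labels (illustrative values only).
energies = [-16.0, -14.0, -12.0, -10.0]

mean = statistics.fmean(energies)
std = statistics.pstdev(energies)  # population standard deviation (assumed convention)


def normalize(x, mean, std):
    """Apply (x - mean) / std."""
    return (x - mean) / std


def denormalize(z, mean, std):
    """Invert the transform: z * std + mean."""
    return z * std + mean


normalized = [normalize(x, mean, std) for x in energies]
restored = [denormalize(z, mean, std) for z in normalized]
```

After this transform the normalized labels have zero mean and unit standard deviation, and `denormalize` recovers the original values.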
RMS Normalization
Normalizes values by dividing by the root mean square value: x / rms
config = mt.configs.MatterTunerConfig(
    model=mt.configs.JMPBackboneConfig(
        # ... other configs ...
        normalizers={
            "forces": [
                mt.configs.RMSNormalizerConfig(
                    rms=2.5  # RMS value of your property
                )
            ]
        }
    ),
    # ... other configs ...
)
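The rms value can be estimated from the training labels as the square root of the mean squared value. A minimal sketch, assuming the force components have been flattened into one list (the numbers are illustrative only):

```python
import math

# Hypothetical flattened force components (illustrative values only).
forces = [1.0, -7.0, 5.0, -5.0]

# Root mean square: sqrt(mean(x^2)).
rms = math.sqrt(sum(f * f for f in forces) / len(forces))

normalized = [f / rms for f in forces]
```

Because RMS normalization only rescales and does not shift, a zero value stays zero and signs are preserved.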
Per-Atom Reference Normalization
Subtracts composition-weighted atomic reference values. This is particularly useful for energy predictions where you want to remove the baseline atomic contributions.
config = mt.configs.MatterTunerConfig(
    model=mt.configs.JMPBackboneConfig(
        # ... other configs ...
        normalizers={
            "energy": [
                mt.configs.PerAtomReferencingNormalizerConfig(
                    # Option 1: Direct dictionary mapping
                    per_atom_references={
                        1: -13.6,  # H
                        8: -2000.0,  # O
                    },
                    # Option 2: List indexed by atomic number
                    # per_atom_references=[0.0, -13.6, 0.0, ..., -2000.0]
                    # Option 3: Path to JSON file
                    # per_atom_references="path/to/references.json"
                )
            ]
        }
    ),
    # ... other configs ...
)
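To make "composition-weighted" concrete: the reference for a structure is the sum of each element's reference value multiplied by the number of atoms of that element. The sketch below works through a water molecule with the dictionary from the example. The helper name `reference_energy` and the total energy value are hypothetical and only illustrate the arithmetic.

```python
# Reference values from the configuration example (atomic number -> reference).
per_atom_references = {1: -13.6, 8: -2000.0}


def reference_energy(counts, refs):
    """Sum of (atom count * per-atom reference) over the elements present."""
    return sum(n * refs[z] for z, n in counts.items())


h2o_counts = {1: 2, 8: 1}  # two H atoms, one O atom
total_energy = -2030.0  # hypothetical raw label

ref = reference_energy(h2o_counts, per_atom_references)  # 2 * -13.6 + 1 * -2000.0
normalized = total_energy - ref  # what remains after removing the atomic baseline
denormalized = normalized + ref  # adding the reference back recovers the label
```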
Computing Normalization Parameters
Per-Atom References
MatterTune provides a CLI tool to compute per-atom reference values using either linear regression or ridge regression:
python -m mattertune.normalization \
    config.json \
    energy \
    references.json \
    --reference-model linear
Arguments:
config.json: Path to your MatterTune configuration file
energy: Name of the property to compute references for
references.json: Output path for the computed references
--reference-model: Model type (linear or ridge)
--reference-model-kwargs: Optional JSON string of kwargs for the regression model
The tool will:
Load your dataset from the config
Fit the selected regression model (linear or ridge) to predict property values from atomic compositions
Save the computed per-atom references to the specified JSON file
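The fitting step can be pictured as a least-squares problem: find one value per element so that the composition-weighted sum best matches each structure's property. The sketch below is a dependency-free illustration of that idea, not the CLI's actual implementation. The names `solve`, `fit_references`, and `alpha` are hypothetical, the toy data is made up, and fitting without an intercept is an assumption.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_references(compositions, targets, alpha=0.0):
    """Fit per-element values; alpha=0 is ordinary least squares, alpha>0 adds a ridge penalty."""
    k = len(compositions[0])
    # Normal equations: (A^T A + alpha * I) e = A^T y
    ata = [
        [sum(row[i] * row[j] for row in compositions) + (alpha if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    aty = [sum(row[i] * y for row, y in zip(compositions, targets)) for i in range(k)]
    return solve(ata, aty)


# Columns hold counts of H and O; rows are H2, O2, and H2O (toy data).
elements = [1, 8]
compositions = [[2, 0], [0, 2], [2, 1]]
# Targets generated from H = -1.0 and O = -5.0, so the unpenalized fit is exact.
targets = [-2.0, -10.0, -7.0]

references = dict(zip(elements, fit_references(compositions, targets)))
ridge_references = dict(zip(elements, fit_references(compositions, targets, alpha=1.0)))
```

With `alpha=0.0` the fit recovers the values used to generate the toy targets. A positive `alpha` pulls the overall size of the solution toward zero, which can help when some elements appear in only a few structures.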
Composing Multiple Normalizers
You can combine multiple normalizers for a single property. They will be applied in sequence:
config = mt.configs.MatterTunerConfig(
    model=mt.configs.JMPBackboneConfig(
        # ... other configs ...
        normalizers={
            "energy": [
                # First subtract atomic references
                mt.configs.PerAtomReferencingNormalizerConfig(
                    per_atom_references="references.json"
                ),
                # Then apply mean-std normalization
                mt.configs.MeanStdNormalizerConfig(
                    mean=0.0,
                    std=1.0
                )
            ]
        }
    ),
    # ... other configs ...
)
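The effect of applying normalizers in sequence can be sketched with two toy classes. These are simplified stand-ins, not MatterTune's classes: `Offset`, `MeanStd`, `normalize_all`, and `denormalize_all` are hypothetical names, the numbers are illustrative, and the NormalizationContext argument is omitted. Undoing the steps in reverse order is what inverting a chain requires mathematically; it is not a statement about MatterTune's internals.

```python
class Offset:
    """Toy stand-in for reference subtraction with a fixed per-structure offset."""

    def __init__(self, offset):
        self.offset = offset

    def normalize(self, x):
        return x - self.offset

    def denormalize(self, x):
        return x + self.offset


class MeanStd:
    """Toy stand-in for mean-std normalization."""

    def __init__(self, mean, std):
        self.mean = mean
        self.std = std

    def normalize(self, x):
        return (x - self.mean) / self.std

    def denormalize(self, x):
        return x * self.std + self.mean


def normalize_all(x, normalizers):
    # Apply each normalizer in list order.
    for n in normalizers:
        x = n.normalize(x)
    return x


def denormalize_all(x, normalizers):
    # Undo the last step first.
    for n in reversed(normalizers):
        x = n.denormalize(x)
    return x


chain = [Offset(-2027.2), MeanStd(mean=0.5, std=2.0)]
z = normalize_all(-2030.0, chain)  # (-2030.0 + 2027.2 - 0.5) / 2.0
x_back = denormalize_all(z, chain)
```

In a chain like this, the mean and std of the second normalizer describe the values that remain after reference subtraction, not the raw property.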
Technical Details
All normalizers implement the NormalizerModule protocol, which requires:
normalize(value: torch.Tensor, ctx: NormalizationContext) -> torch.Tensor
denormalize(value: torch.Tensor, ctx: NormalizationContext) -> torch.Tensor
The NormalizationContext provides composition information needed for per-atom normalization:
from dataclasses import dataclass
import torch

@dataclass(frozen=True)
class NormalizationContext:
    compositions: torch.Tensor  # shape: (batch_size, num_elements)
Each row in compositions represents the element counts for one structure, where the index corresponds to the atomic number (e.g., index 1 for hydrogen).
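The row layout can be illustrated with plain Python lists in place of a tensor. In this sketch, `composition_row` is a hypothetical helper and `num_elements=10` is an arbitrary size chosen so the example stays small.

```python
from collections import Counter


def composition_row(atomic_numbers, num_elements):
    """Count atoms per element; position z holds the count for atomic number z."""
    row = [0] * num_elements
    for z, n in Counter(atomic_numbers).items():
        row[z] = n
    return row


# Two structures: H2O (atomic numbers 1, 1, 8) and O2 (8, 8).
batch = [
    composition_row([1, 1, 8], num_elements=10),
    composition_row([8, 8], num_elements=10),
]
# batch has shape (batch_size=2, num_elements=10)
```

Index 0 is unused because no element has atomic number 0, which matches the leading 0.0 in the list form of per_atom_references shown earlier.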
Implementation Notes
Normalization is applied automatically during training
Loss is computed on normalized values for numerical stability
Predictions are automatically denormalized before metric computation and output
The property predictor and ASE calculator interfaces return denormalized values
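The split between normalized loss and denormalized metrics can be shown with a single value. This is a sketch under stated assumptions: the mean, std, target, and model output are made-up numbers, and absolute error is used only as an example loss.

```python
mean, std = -13.0, 2.0  # hypothetical normalization parameters
target = -15.0  # hypothetical label in original units
prediction_normalized = -0.9  # hypothetical model output in normalized space

# Training: the target is normalized and the loss is computed in normalized space.
target_normalized = (target - mean) / std
loss = abs(prediction_normalized - target_normalized)

# Evaluation: the prediction is denormalized, so the metric is in original units.
prediction = prediction_normalized * std + mean
mae = abs(prediction - target)
```

For mean-std normalization the absolute error in original units equals the normalized error multiplied by std, so loss values and reported metrics are on different scales.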