Normalization

MatterTune provides flexible property normalization through the mattertune.normalization module. Normalization is crucial for training stability and convergence when fine-tuning models, especially for properties whose values vary widely in scale.

Overview

The normalization system consists of:

  • A NormalizationContext that provides per-batch information needed for normalization

  • Multiple normalizer types that can be composed together

  • CLI tools for computing normalization parameters from datasets

Supported Normalizers

Mean-Standard Deviation Normalization

Normalizes values using mean and standard deviation: (x - mean) / std

config = mt.configs.MatterTunerConfig(
    model=mt.configs.JMPBackboneConfig(
        # ... other configs ...
        normalizers={
            "energy": [
                mt.configs.MeanStdNormalizerConfig(
                    mean=-13.6,  # mean of your property
                    std=2.4      # standard deviation
                )
            ]
        }
    ),
    # ... other configs ...
)
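
The mean and standard deviation are typically computed from your training labels. A minimal sketch, assuming the target energies are already collected into a NumPy array (the array name and values below are illustrative):

import numpy as np

# Hypothetical training energies (eV); replace with the labels from your dataset.
train_energies = np.array([-13.2, -15.8, -11.9, -14.4])

mean = float(train_energies.mean())
std = float(train_energies.std())
print(f"mean={mean:.3f}, std={std:.3f}")  # paste these into MeanStdNormalizerConfig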

RMS Normalization

Normalizes values by dividing by the root mean square (RMS) of the property: x / rms

config = mt.configs.MatterTunerConfig(
    model=mt.configs.JMPBackboneConfig(
        # ... other configs ...
        normalizers={
            "forces": [
                mt.configs.RMSNormalizerConfig(
                    rms=2.5  # RMS value of your property
                )
            ]
        }
    ),
    # ... other configs ...
)
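
The RMS value can likewise be estimated from your training data. A minimal sketch for forces, assuming the force components are stacked into a single NumPy array (names and values below are illustrative):

import numpy as np

# Hypothetical force components (eV/Å), stacked over all atoms in the training set.
train_forces = np.array([[0.10, -0.30, 2.40],
                         [-1.20, 0.70, 0.05]])

rms = float(np.sqrt(np.mean(train_forces ** 2)))
print(f"rms={rms:.3f}")  # paste this into RMSNormalizerConfig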

Per-Atom Reference Normalization

Subtracts composition-weighted atomic reference values. This is particularly useful for energy predictions where you want to remove the baseline atomic contributions.

config = mt.configs.MatterTunerConfig(
    model=mt.configs.JMPBackboneConfig(
        # ... other configs ...
        normalizers={
            "energy": [
                mt.configs.PerAtomReferencingNormalizerConfig(
                    # Option 1: Direct dictionary mapping
                    per_atom_references={
                        1: -13.6,  # H
                        8: -2000.0  # O
                    }
                    # Option 2: List indexed by atomic number
                    # per_atom_references=[0.0, -13.6, 0.0, ..., -2000.0]
                    # Option 3: Path to JSON file
                    # per_atom_references="path/to/references.json"
                )
            ]
        }
    ),
    # ... other configs ...
)

Computing Normalization Parameters

Per-Atom References

MatterTune provides a CLI tool to compute per-atom reference values using either linear regression or ridge regression:

python -m mattertune.normalization \
    config.json \
    energy \
    references.json \
    --reference-model linear

Arguments:

  • config.json: Path to your MatterTune configuration file

  • energy: Name of the property to compute references for

  • references.json: Output path for the computed references

  • --reference-model: Model type (linear or ridge)

  • --reference-model-kwargs: Optional JSON string of kwargs for the regression model

The tool will:

  1. Load your dataset from the config

  2. Fit the chosen regression model (linear or ridge) to predict property values from atomic compositions (see the sketch after this list)

  3. Save the computed per-atom references to the specified JSON file
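
For reference, the underlying idea is a least-squares fit from per-structure element counts to the target property. The following is a minimal sketch of that idea, not the CLI's actual implementation; the compositions, energies, and output format are illustrative and assume scikit-learn is available:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical composition matrix: each row counts atoms per atomic number
# (columns 0..8 here), and each target is the structure's total energy (eV).
compositions = np.array([
    [0, 2, 0, 0, 0, 0, 0, 0, 1],  # H2O:    2 H, 1 O
    [0, 2, 0, 0, 0, 0, 0, 0, 2],  # H2O2:   2 H, 2 O
    [0, 4, 0, 0, 0, 0, 0, 0, 2],  # 2x H2O: 4 H, 2 O
])
energies = np.array([-2027.2, -4027.2, -4054.4])  # illustrative values only

# Fit without an intercept so each coefficient acts as a per-atom reference energy.
model = LinearRegression(fit_intercept=False)
model.fit(compositions, energies)

per_atom_references = {z: float(c) for z, c in enumerate(model.coef_) if c != 0.0}
print(per_atom_references)  # e.g. {1: -13.6, 8: -2000.0}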

Composing Multiple Normalizers

You can combine multiple normalizers for a single property. They are applied in the order listed:

config = mt.configs.MatterTunerConfig(
    model=mt.configs.JMPBackboneConfig(
        # ... other configs ...
        normalizers={
            "energy": [
                # First subtract atomic references
                mt.configs.PerAtomReferencingNormalizerConfig(
                    per_atom_references="references.json"
                ),
                # Then apply mean-std normalization
                mt.configs.MeanStdNormalizerConfig(
                    mean=0.0,
                    std=1.0
                )
            ]
        }
    ),
    # ... other configs ...
)
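
Conceptually, composing normalizers chains their transforms. The sketch below illustrates the sequential application independently of MatterTune's internals; it assumes, rather than confirms, that denormalization undoes the chain in reverse order:

import torch

def compose_normalize(value: torch.Tensor, normalizers, ctx) -> torch.Tensor:
    # Apply each normalizer in the order it appears in the config.
    for normalizer in normalizers:
        value = normalizer.normalize(value, ctx)
    return value

def compose_denormalize(value: torch.Tensor, normalizers, ctx) -> torch.Tensor:
    # Undo the chain by calling denormalize in reverse order (assumption:
    # a composed transform must be inverted back-to-front).
    for normalizer in reversed(normalizers):
        value = normalizer.denormalize(value, ctx)
    return value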

Technical Details

All normalizers implement the NormalizerModule protocol, which requires:

  • normalize(value: torch.Tensor, ctx: NormalizationContext) -> torch.Tensor

  • denormalize(value: torch.Tensor, ctx: NormalizationContext) -> torch.Tensor

The NormalizationContext provides composition information needed for per-atom normalization:

@dataclass(frozen=True)
class NormalizationContext:
    compositions: torch.Tensor  # shape: (batch_size, num_elements)

Each row in compositions contains the element counts for one structure, where the column index corresponds to the atomic number (e.g., column 1 for hydrogen).
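
As an illustration of the protocol, a custom normalizer that scales each value by its structure's atom count could look like the sketch below. The class name is hypothetical and this is not one of the built-in normalizers; it only shows how normalize and denormalize use the context:

import torch

from mattertune.normalization import NormalizationContext  # as described above

class PerAtomScaleNormalizer:
    """Illustrative normalizer: divides each value by its structure's atom count."""

    def normalize(self, value: torch.Tensor, ctx: NormalizationContext) -> torch.Tensor:
        num_atoms = ctx.compositions.sum(dim=-1)  # (batch_size,)
        return value / num_atoms

    def denormalize(self, value: torch.Tensor, ctx: NormalizationContext) -> torch.Tensor:
        num_atoms = ctx.compositions.sum(dim=-1)
        return value * num_atoms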

Implementation Notes

  • Normalization is applied automatically during training

  • Loss is computed on normalized values for numerical stability

  • Predictions are automatically denormalized before metric computation and output

  • The property predictor and ASE calculator interfaces return denormalized values
