mattertune.finetune.loader
- class mattertune.finetune.loader.DataLoaderKwargs[source]
Keyword arguments for creating a DataLoader.
- Parameters:
  - batch_size – How many samples per batch to load (default: 1).
  - shuffle – Set to True to have the data reshuffled at every epoch (default: False).
  - sampler – Defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.
  - batch_sampler – Like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
  - num_workers – How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process (default: 0).
  - pin_memory – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them.
  - drop_last – Set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size (default: False).
  - timeout – If positive, the timeout value for collecting a batch from workers. Should always be non-negative (default: 0).
  - worker_init_fn – If not None, this will be called on each worker subprocess with the worker id as input, after seeding and before data loading.
  - multiprocessing_context – If None, the default multiprocessing context of your operating system will be used.
  - generator – If not None, this RNG will be used by RandomSampler to generate random indexes and multiprocessing to generate base_seed for workers.
  - prefetch_factor – Number of batches loaded in advance by each worker.
  - persistent_workers – If True, the data loader will not shut down the worker processes after a dataset has been consumed once.
  - pin_memory_device – The device to pin_memory to if pin_memory is True.
- batch_size: int | None
- shuffle: bool | None
- sampler: Sampler | Iterable | None
- batch_sampler: Sampler[list[int]] | Iterable[list[int]] | None
- num_workers: int
- pin_memory: bool
- drop_last: bool
- timeout: float
- worker_init_fn: _worker_init_fn_t | None
- multiprocessing_context: Any
- generator: Any
- prefetch_factor: int | None
- persistent_workers: bool
- pin_memory_device: str
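A minimal construction sketch: because create_dataloader below accepts these options via Unpack[DataLoaderKwargs], the class can be populated like a typed dictionary whose keys mirror the keyword arguments of torch.utils.data.DataLoader. All values shown are illustrative, not required defaults.

```python
from mattertune.finetune.loader import DataLoaderKwargs

# Illustrative values only; every key is optional and mirrors the
# corresponding torch.utils.data.DataLoader keyword argument.
loader_kwargs: DataLoaderKwargs = {
    "batch_size": 32,            # samples per batch
    "shuffle": True,             # reshuffle at every epoch
    "num_workers": 4,            # subprocesses used for data loading
    "pin_memory": True,          # copy tensors into pinned (page-locked) memory
    "persistent_workers": True,  # keep workers alive between epochs
}
```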
- mattertune.finetune.loader.create_dataloader(dataset, has_labels, *, lightning_module, **kwargs)[source]
Create a DataLoader for the given dataset of ase.Atoms objects.
- Parameters:
  - dataset (Dataset[ase.Atoms])
  - has_labels (bool)
  - lightning_module (FinetuneModuleBase[TData, TBatch, TFinetuneModuleConfig])
  - kwargs (Unpack[DataLoaderKwargs])
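A usage sketch, under assumptions not stated above: model stands for an already-configured FinetuneModuleBase instance (construction omitted), structures.xyz is a hypothetical input file, and a plain list of ase.Atoms is used as a stand-in for Dataset[ase.Atoms]. The extra keyword arguments are forwarded as DataLoaderKwargs.

```python
from ase.io import read

from mattertune.finetune.loader import create_dataloader

# Hypothetical input file; read(..., index=":") returns a list of ase.Atoms,
# used here as a stand-in for Dataset[ase.Atoms].
dataset = read("structures.xyz", index=":")

# Assumption: a configured FinetuneModuleBase instance (construction is
# backend-specific and omitted here).
model = ...

train_loader = create_dataloader(
    dataset,
    has_labels=True,          # the structures carry target labels for fine-tuning
    lightning_module=model,
    # Remaining keyword arguments are forwarded as DataLoaderKwargs:
    batch_size=32,
    shuffle=True,
    num_workers=4,
)
```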