mattertune.finetune.loader

Functions

create_dataloader(dataset, has_labels, *, ...)

Classes

DataLoaderKwargs

Keyword arguments for creating a DataLoader.

class mattertune.finetune.loader.DataLoaderKwargs

Keyword arguments for creating a DataLoader.

Parameters:
  • batch_size – How many samples per batch to load (default: 1).

  • shuffle – Set to True to have the data reshuffled at every epoch (default: False).

  • sampler – Defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.

  • batch_sampler – Like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.

  • num_workers – How many subprocesses to use for data loading. 0 means that the data will be loaded in the main process (default: 0).

  • pin_memory – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them.

  • drop_last – Set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size (default: False).

  • timeout – If positive, the timeout value, in seconds, for collecting a batch from workers. Should always be non-negative (default: 0).

  • worker_init_fn – If not None, this will be called on each worker subprocess with the worker id as input, after seeding and before data loading.

  • multiprocessing_context – If None, the default multiprocessing context of your operating system will be used.

  • generator – If not None, this RNG will be used by RandomSampler to generate random indices and by multiprocessing to generate the base_seed for workers.

  • prefetch_factor – Number of batches loaded in advance by each worker. A value of 2 means a total of 2 * num_workers batches are prefetched across all workers.

  • persistent_workers – If True, the data loader will not shut down the worker processes after a dataset has been consumed once.

  • pin_memory_device – The device on which to pin memory when pin_memory is True.
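
Several of the parameters above constrain one another (sampler versus shuffle, batch_sampler versus batch_size, shuffle, sampler, and drop_last, and a non-negative timeout). The sketch below shows one way those documented rules could be checked before any loader is built. The helper name check_loader_kwargs is hypothetical and is not part of mattertune or PyTorch; it also assumes that any non-None batch_size counts as "specified", which may be stricter than what the underlying DataLoader enforces.

```python
from typing import Any, Mapping


def check_loader_kwargs(kwargs: Mapping[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of conflicts, empty if none are found."""
    problems: list[str] = []

    # Decide which options the caller actually set.
    # Assumption of this sketch: a non-None batch_size counts as "specified".
    specified = {
        "batch_size": kwargs.get("batch_size") is not None,
        "shuffle": bool(kwargs.get("shuffle")),
        "sampler": kwargs.get("sampler") is not None,
        "drop_last": bool(kwargs.get("drop_last")),
    }

    # Rule from the sampler entry: sampler and shuffle cannot be combined.
    if specified["sampler"] and specified["shuffle"]:
        problems.append("sampler is mutually exclusive with shuffle")

    # Rule from the batch_sampler entry: it excludes four other options.
    if kwargs.get("batch_sampler") is not None:
        for name, is_set in specified.items():
            if is_set:
                problems.append(f"batch_sampler is mutually exclusive with {name}")

    # Rule from the timeout entry: the value must be non-negative.
    if kwargs.get("timeout", 0) < 0:
        problems.append("timeout must be non-negative")

    return problems


if __name__ == "__main__":
    for problem in check_loader_kwargs({"sampler": [0, 1, 2], "shuffle": True}):
        print(problem)
```

Returning a list rather than raising on the first conflict lets a caller report every problem at once; a stricter variant could raise ValueError instead.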

batch_size: int | None
shuffle: bool | None
sampler: Sampler | Iterable | None
batch_sampler: Sampler[list[int]] | Iterable[list[int]] | None
num_workers: int
pin_memory: bool
drop_last: bool
timeout: float
worker_init_fn: _worker_init_fn_t | None
multiprocessing_context: Any
generator: Any
prefetch_factor: int | None
persistent_workers: bool
pin_memory_device: str
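
As a rough illustration of how the typed fields above interact, the sketch below models a few of them as an all-optional TypedDict and computes how many batches one pass over a dataset would yield. LoaderOptions and num_batches are hypothetical names introduced here; the sketch assumes batch_size is an int (it does not handle None) and uses the documented defaults of batch_size=1 and drop_last=False.

```python
from typing import TypedDict


class LoaderOptions(TypedDict, total=False):
    """Hypothetical subset of the fields listed above; every key is optional."""

    batch_size: int
    drop_last: bool


def num_batches(dataset_size: int, options: LoaderOptions) -> int:
    """Number of batches in one epoch for the given options."""
    batch_size = options.get("batch_size", 1)  # documented default: 1
    if options.get("drop_last", False):        # documented default: False
        # The incomplete final batch is discarded.
        return dataset_size // batch_size
    # The incomplete final batch is kept, so round up.
    return (dataset_size + batch_size - 1) // batch_size


options: LoaderOptions = {"batch_size": 32, "drop_last": False}
print(num_batches(100, options))                         # 4 (the last batch holds 4 samples)
print(num_batches(100, {**options, "drop_last": True}))  # 3
```

Because every key is optional, a dictionary like this can be unpacked with ** into any function that accepts the same keyword arguments.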
mattertune.finetune.loader.create_dataloader(dataset, has_labels, *, lightning_module, **kwargs)
Parameters: