Data

PyGrad offers basic data processing APIs that allow for easy creation of Dataset and DataLoader objects.

Dataset

PyGrad's Dataset class adds a generic yet useful layer of abstraction that lets it consume datasets of many shapes and forms. All a subclass has to specify are the data and label attributes of each Dataset object. For unsupervised learning, label can be set to None.

Below, we provide an example of creating a custom dataset using sklearn’s make_moons() function.

from sklearn import datasets as skds
from pygrad import data

class MoonDataset(data.Dataset):
    def __init__(self, num_samples, noise=0.1, *args, **kwargs):
        super(MoonDataset, self).__init__(*args, **kwargs)
        # Generate a two-class "moons" toy problem: X has shape (num_samples, 2), y holds 0/1 labels
        X, y = skds.make_moons(num_samples, noise=noise, shuffle=True)
        self.data = X
        self.label = y
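To make the data/label contract concrete, here is a minimal sketch in plain Python. SimpleDataset is a hypothetical stand-in written for illustration and is not PyGrad's Dataset class. It assumes that a dataset pairs each sample with its label by index and returns None as the label when label is None, as in the unsupervised case. PyGrad's own indexing behaviour may differ.

```python
from typing import Any, Optional, Sequence, Tuple


class SimpleDataset:
    """Hypothetical stand-in illustrating the data/label contract (not PyGrad's class)."""

    def __init__(self, data: Sequence[Any], label: Optional[Sequence[Any]] = None):
        if label is not None and len(label) != len(data):
            raise ValueError("data and label must have the same length")
        self.data = data
        self.label = label  # None for unsupervised learning

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> Tuple[Any, Optional[Any]]:
        # Pair each sample with its label; the label is None when there are no labels.
        lbl = None if self.label is None else self.label[idx]
        return self.data[idx], lbl


supervised = SimpleDataset([[0.0, 1.0], [1.0, 0.0]], [0, 1])
unsupervised = SimpleDataset([[0.0, 1.0], [1.0, 0.0]])
print(supervised[1])    # ([1.0, 0.0], 1)
print(unsupervised[0])  # ([0.0, 1.0], None)
```

Keeping data and label as two plain attributes means any source (NumPy arrays, lists, generated toy data) can be wrapped without a format-specific loader.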

Dataset objects can be split with the ratio_split() function, which is particularly useful for creating training, validation, and test sets.

from pygrad.data import ratio_split

dataset = MoonDataset(num_samples=1000)  # num_samples has no default in MoonDataset.__init__
train_ds, test_ds = ratio_split(dataset, 0.8, 0.2)
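One plausible way a ratio-based split can work is to shuffle the sample indices once and then cut them into consecutive chunks whose sizes follow the given ratios. The sketch below assumes exactly that. ratio_split_sketch is a hypothetical helper operating on a plain sequence, not PyGrad's ratio_split, whose shuffling and rounding behaviour is not documented here.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def ratio_split_sketch(items: Sequence[T], *ratios: float, seed: int = 0) -> List[List[T]]:
    """Hypothetical helper: shuffle indices once, then cut them into chunks by ratio."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible

    splits: List[List[T]] = []
    start = 0
    for i, r in enumerate(ratios):
        # The last split takes every remaining index, so rounding never drops samples.
        end = len(items) if i == len(ratios) - 1 else start + round(r * len(items))
        splits.append([items[j] for j in indices[start:end]])
        start = end
    return splits


train, val, test = ratio_split_sketch(list(range(100)), 0.7, 0.15, 0.15)
print(len(train), len(val), len(test))  # 70 15 15
```

Because the chunks are cut from a single permutation, the resulting sets are disjoint and together cover every sample exactly once.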

DataLoader

PyGrad's DataLoader class adds basic batching and shuffling on top of pygrad.data.Dataset instances.

from pygrad.data import DataLoader

BATCH_SIZE = 8
train_loader = DataLoader(train_ds, BATCH_SIZE)
test_loader = DataLoader(test_ds, BATCH_SIZE)
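The batching and shuffling idea can be sketched as follows. SimpleLoader is a hypothetical class written for illustration, not PyGrad's DataLoader. It assumes that a shuffled loader draws a fresh permutation of indices at the start of each pass and then yields consecutive slices of batch_size samples, with a smaller final batch when the dataset size is not a multiple of batch_size. Whether PyGrad shuffles by default, or keeps the last partial batch, is not stated here.

```python
import random
from typing import Iterator, Optional, Sequence, Tuple


class SimpleLoader:
    """Hypothetical stand-in for batching/shuffling (not PyGrad's DataLoader)."""

    def __init__(self, data: Sequence, label: Optional[Sequence], batch_size: int,
                 shuffle: bool = True, seed: Optional[int] = None):
        self.data, self.label = data, label
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        # Number of batches per pass; the final batch may be smaller than batch_size.
        return (len(self.data) + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[Tuple[list, Optional[list]]]:
        order = list(range(len(self.data)))
        if self.shuffle:
            self._rng.shuffle(order)  # a new order on every pass over the data
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            x = [self.data[i] for i in idx]
            y = None if self.label is None else [self.label[i] for i in idx]
            yield x, y


loader = SimpleLoader(list(range(20)), [i % 2 for i in range(20)], batch_size=8, seed=0)
for x, y in loader:
    print(len(x), len(y))  # prints batch sizes 8, 8, then 4
```

Shuffling indices rather than the underlying data keeps each sample aligned with its label and avoids copying the dataset on every pass.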

DataLoader instances can be iterated as follows:

for inputs, labels in train_loader:
    # training logic here (e.g. forward pass, loss, backward pass, optimizer step)
    pass