ML Katas

Building a Custom `Dataset` and `DataLoader`

medium (<30 mins) dataset dataloader custom data pipeline
this month by E

Create a custom `torch.utils.data.Dataset` subclass that loads a simple, non-image dataset (e.g., from a CSV file). The `__init__` method should read the data, `__len__` should return the total number of samples, and `__getitem__` should return one sample and its label as PyTorch tensors. Then wrap the dataset in a `torch.utils.data.DataLoader` to iterate over it in shuffled batches.
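One possible shape for a solution is sketched below. It is a minimal sketch, not a reference answer: the class name `CSVDataset`, the inline `CSV_TEXT` data, and the column layout (three float features followed by an integer label, with a header row) are all assumptions made for illustration. The CSV is read from an in-memory string so the example runs without a file on disk.

```python
import csv
import io

import torch
from torch.utils.data import Dataset, DataLoader

# Hypothetical CSV contents: three feature columns followed by a label column.
CSV_TEXT = """f1,f2,f3,label
0.1,0.2,0.3,0
0.4,0.5,0.6,1
0.7,0.8,0.9,0
1.0,1.1,1.2,1
1.3,1.4,1.5,0
1.6,1.7,1.8,1
1.9,2.0,2.1,0
2.2,2.3,2.4,1
2.5,2.6,2.7,0
2.8,2.9,3.0,1
"""


class CSVDataset(Dataset):
    """Loads every row eagerly; assumes the last column is the label."""

    def __init__(self, csv_file):
        reader = csv.reader(csv_file)
        next(reader)  # skip the header row
        rows = [[float(v) for v in row] for row in reader if row]
        data = torch.tensor(rows, dtype=torch.float32)
        self.features = data[:, :-1]      # all columns except the last
        self.labels = data[:, -1].long()  # last column as integer class ids

    def __len__(self):
        return self.features.shape[0]

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]


dataset = CSVDataset(io.StringIO(CSV_TEXT))
loader = DataLoader(dataset, batch_size=4, shuffle=True)
```

To read from disk instead, an open file handle can be passed in place of the `io.StringIO` object, for example `CSVDataset(open("data.csv", newline=""))`, where `data.csv` is a placeholder path.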

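Verification: Iterate through the `DataLoader` for a few batches and print the shapes of the returned tensors (data and labels). Each data batch should have shape `(batch_size, num_features)` and each label batch shape `(batch_size,)`, confirming correct batching. The final batch may be smaller when the dataset size is not a multiple of the batch size, unless `drop_last=True` is set.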
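The verification step might look like the following sketch. To keep it self-contained, a `TensorDataset` of random values stands in for the custom dataset; the sizes (10 samples, 3 features, batch size 4) are arbitrary assumptions, and any `Dataset` returning `(sample, label)` pairs could be substituted.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (assumption): 10 samples, 3 features, binary labels.
features = torch.randn(10, 3)
labels = torch.randint(0, 2, (10,))
dataset = TensorDataset(features, labels)

batch_size = 4
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

shapes = []
for i, (x, y) in enumerate(loader):
    print(f"batch {i}: data {tuple(x.shape)}, labels {tuple(y.shape)}")
    shapes.append((tuple(x.shape), tuple(y.shape)))
    if i == 2:  # inspect only the first three batches
        break
```

With these sizes the first two batches are full and the third holds the remaining two samples; passing `drop_last=True` to the `DataLoader` would discard that partial batch.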