Tensor Manipulation: Dropout Layer
Description
Implement the dropout layer from scratch. During training, dropout randomly zeroes each element of the input tensor with probability p. The surviving elements are scaled up by 1 / (1 - p) so that the expected value of each element is unchanged. For example, with p = 0.5, roughly half the elements are zeroed and the survivors are doubled. Dropout is only active during training; in evaluation mode it is a no-op.
Guidance
- Check if the model is in training mode.
- Create a random binary mask with the same shape as the input tensor, where the probability of a 1 is 1 - p.
- Apply the mask to the input.
- Scale the result.
Starter Code
import torch

def custom_dropout(x, p=0.5, training=True):
    # Dropout is a no-op in eval mode or when nothing is dropped.
    if not training or p == 0:
        return x
    if p == 1:
        # Every element is dropped; avoid dividing by zero below.
        return torch.zeros_like(x)
    # Bernoulli keep-mask: each element survives with probability 1 - p.
    # rand_like draws on the same device as x, so no .to(device) is needed.
    mask = (torch.rand_like(x) > p).to(x.dtype)
    # Apply the mask, then rescale so the expected value matches the input.
    return (x * mask) / (1.0 - p)
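As a quick illustration, the function can be exercised like this (the exact zero pattern varies from run to run, since the mask is random):

x = torch.ones(2, 4)
out = custom_dropout(x, p=0.5, training=True)
print(out)  # each element is either 0.0 or 2.0, chosen at random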
Verification
Create a tensor of ones. After applying your dropout function in training mode, some elements should be zero and the others should be 1 / (1 - p). The mean of the output tensor should be close to 1. In eval mode (training=False), the output should be identical to the input.
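One possible way to script these checks, assuming custom_dropout from the starter code is in scope (the tensor size and tolerance here are arbitrary choices; a large tensor keeps the sample mean close to 1):

x = torch.ones(1000, 1000)
p = 0.5

# Training mode: every element is either 0 or 1 / (1 - p), mean stays near 1.
out = custom_dropout(x, p=p, training=True)
vals = torch.unique(out)
assert all(v.item() in (0.0, 1.0 / (1.0 - p)) for v in vals)
assert abs(out.mean().item() - 1.0) < 0.01

# Eval mode: the input passes through untouched.
out_eval = custom_dropout(x, p=p, training=False)
assert torch.equal(out_eval, x)

print("All checks passed.")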