ML Katas

Implementing a Masked Language Model

hard (>1 hr) nlp model masked language bert

Implement a Masked Language Model (MLM), the self-supervised pre-training objective at the heart of models like BERT. Given a sentence, you'll randomly mask some of the tokens and then train a model to predict the original tokens at the masked positions. A simple TransformerEncoder or an RNN will do as the model; the output should be a vector of logits over the vocabulary for each masked position. Use a simple text dataset, such as a paragraph of text from a book.
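One possible starting point is the minimal PyTorch sketch below: word-level tokenization over a short hard-coded paragraph, a tiny TransformerEncoder, and a BERT-style masking function. The names (MaskedLM, mask_tokens, MLM_PROB), the placeholder paragraph, and the hyperparameters are assumptions for illustration, not part of the kata.

```python
import torch
import torch.nn as nn

# Toy corpus: any paragraph of text works; this one is just a placeholder.
text = ("the quick brown fox jumps over the lazy dog while the lazy dog "
        "sleeps under the old oak tree and the quick fox watches")
words = text.split()
vocab = ["[PAD]", "[MASK]"] + sorted(set(words))
stoi = {w: i for i, w in enumerate(vocab)}
PAD_ID, MASK_ID = 0, 1
MLM_PROB = 0.15  # fraction of tokens to hide, as in BERT

class MaskedLM(nn.Module):
    """Tiny TransformerEncoder that outputs logits over the vocabulary per position."""
    def __init__(self, vocab_size, d_model=64, nhead=4, num_layers=2, max_len=128):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model, padding_idx=PAD_ID)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, ids):                      # ids: (batch, seq_len)
        pos = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        h = self.encoder(self.tok(ids) + self.pos(pos))
        return self.out(h)                       # (batch, seq_len, vocab_size)

def mask_tokens(ids, mlm_prob=MLM_PROB):
    """Replace ~mlm_prob of the tokens with [MASK]; label non-masked positions -100."""
    labels = ids.clone()
    mask = (torch.rand(ids.shape) < mlm_prob) & (ids != PAD_ID)
    labels[~mask] = -100          # -100 is ignored by nn.CrossEntropyLoss
    corrupted = ids.clone()
    corrupted[mask] = MASK_ID
    return corrupted, labels
```

The -100 label convention means the loss is computed only on the masked positions, which is exactly the MLM objective.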

Verification: During training, the loss on the masked tokens should decrease. After training, take a test sentence, mask a few words, and have the model predict them. The model's predictions for the masked positions should be contextually appropriate, showing that it has picked up on the syntax and semantics of the language.
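A rough verification loop, assuming the MaskedLM and mask_tokens sketch above: train on the corrupted sequence, watch the masked-token loss, then mask a word in a test sentence and inspect the prediction. The step count, learning rate, and test sentence are placeholders.

```python
# Reuses MaskedLM, mask_tokens, vocab, stoi, words, PAD_ID, MASK_ID from the sketch above.
import torch
import torch.nn as nn

ids = torch.tensor([[stoi[w] for w in words]])        # one long training sequence
model = MaskedLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)      # loss computed only on masked slots

for step in range(500):
    corrupted, labels = mask_tokens(ids)
    if (labels == -100).all():                        # nothing got masked this round
        continue
    logits = model(corrupted)
    loss = loss_fn(logits.reshape(-1, len(vocab)), labels.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        print(step, round(loss.item(), 3))            # should trend downward

# Mask a word in a test sentence and see whether the prediction is plausible.
test = "the quick brown fox jumps over the lazy dog".split()
masked = torch.tensor([[stoi.get(w, PAD_ID) for w in test]])
masked[0, 3] = MASK_ID                                # hide "fox"
with torch.no_grad():
    pred = model(masked)[0, 3].argmax().item()
print("prediction for the masked slot:", vocab[pred])
```

With a corpus this small the model can only memorize local co-occurrences, so expect plausible rather than perfect fills; a longer paragraph or a real dataset makes the verification more convincing.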