[NLP] How does BERT work?
Feb 26, 2021 · There are two methods for pretraining BERT: 1) Masked Language Model (MLM) and 2) Next Sentence Prediction (NSP). 1) Masked Language Model (MLM): For pre-training, BERT randomly masks 15% of the input tokens fed into the network, and the network is then trained to predict these masked words.
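As a rough illustration of that masking step (not the author's code), here is a minimal Python sketch: it takes a tokenized sentence, replaces roughly 15% of the tokens with a [MASK] placeholder, and keeps the originals as the targets the network must predict. The function name and the toy whitespace tokenization are illustrative assumptions; real BERT works on WordPiece subword tokens and uses a more refined masking scheme (some selected tokens are kept or replaced with random words instead of [MASK]).

```python
import random

MASK_TOKEN = "[MASK]"
MASK_PROB = 0.15  # BERT masks about 15% of the input tokens

def mask_tokens(tokens, mask_prob=MASK_PROB):
    """Randomly replace ~15% of tokens with [MASK] and record the
    original tokens as the MLM prediction targets (labels)."""
    masked = list(tokens)
    labels = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok
            masked[i] = MASK_TOKEN
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens)
print(masked)  # the same sentence with some tokens replaced by [MASK]
print(labels)  # e.g. {2: 'brown', ...} -> targets for the MLM prediction head
```

Training then amounts to feeding the masked sequence through the network and computing a loss only at the masked positions, using the recorded labels as the correct answers.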